Opened on 01/08/2018 at 04:56:55 PM

Closed on 08/29/2019 at 05:43:52 PM

Last modified on 01/06/2020 at 03:47:53 PM

#6234 closed defect (rejected)

Filter in which pattern contains hash parsed as blocking filter

Reported by: hfiguiere Assignee:
Priority: Unknown Milestone:
Module: Core Keywords: closed-in-favor-of-gitlab
Cc: kzar, mapx, sergz, erikvold, mjethani, sebastian, greiner Blocked By:
Blocking: Platform: Unknown / Cross platform
Ready: no Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

Description (last modified by kzar)

Environment

Adblock Plus incorrectly parse some malformed filter as Blocking Filters.

How to reproduce

  1. Add filter planet.fr#@?#:-abp-properties(cursor:pointer;)

Observed behaviour

Filter is treated as a Blocking Filter.

Expected behaviour

It should be an Invalid Filter.

Note

  • There is no formal way to test this in the UI though.
  • This comes from issue #6193.
  • Fixing this should include proper test case.

Attachments (0)

Change History (20)

comment:1 Changed on 01/08/2018 at 05:26:28 PM by mapx

  • Cc kzar mapx added

comment:2 Changed on 01/09/2018 at 12:06:12 PM by kzar

  • Cc sergz added
  • Description modified (diff)

comment:3 Changed on 01/10/2018 at 06:40:09 PM by hfiguiere

  • Owner set to hfiguiere

comment:4 Changed on 01/15/2018 at 02:20:25 PM by hfiguiere

  • Owner hfiguiere deleted

comment:5 Changed on 11/15/2018 at 05:26:04 AM by erikvold

  • Cc erikvold added

comment:6 Changed on 03/17/2019 at 08:18:17 AM by mjethani

planet.fr#@?#:-abp-properties(cursor:pointer;)

Why is this not a blocking filter? It does not match the regular expression for content filters, therefore it is a blocking filer by definition.

comment:7 Changed on 03/17/2019 at 08:18:35 AM by mjethani

  • Cc mjethani added

comment:8 Changed on 03/19/2019 at 10:08:19 PM by hfiguiere

This exactly what's wrong. It's not even a blocking filter because of the # there.

The design of the parser doesn't make this easy to fix though.

comment:9 Changed on 03/20/2019 at 01:13:31 AM by mjethani

  • Cc sebastian greiner added

comment:10 Changed on 03/21/2019 at 12:45:54 PM by greiner

If the suggestion is to not consider any filters that contain # as a blocking/exception filter, I'd tend to agree with it. However, we should then also double-check how/if we currently handle the hash part of the URL to make sure we're not taking away/limiting functionality by prohibiting the use of # in filter patterns.

comment:11 follow-up: Changed on 03/21/2019 at 09:01:50 PM by sebastian

Request filters are always matched against the URL without the hash/fragment part.

comment:12 Changed on 03/22/2019 at 08:21:22 AM by mjethani

  • Summary changed from Incorrect filter syntax parsed as blocking filter to Filter in which pattern contains hash parsed as blocking filter

comment:13 Changed on 03/22/2019 at 08:31:22 AM by mjethani

I see at least the following URL request filters in EasyList containing a hash:

/gelbooru\.com.*[a-zA-Z0-9?!=@%#]{40,}/$image,other
||linkshrink.net^*#^$popup,domain=linkshrink.net
/https?:\/\/.*[=|&|%|#|+].*/$popup,domain=0123movies.com|1337x.to|...
/http://[a-zA-Z0-9]+\.[a-z]+\/.*(?:[!"#$%&'()*+,:;<=>?@/\^_`{|}~-]).*[a-zA-Z0-9]+/$script,third-party,domain=keezmovies.com|redtube.com|...
||h2porn.com/*.js#spot=
/http*.:\/\/.*[?|=|&|%|#|+].*/$popup,domain=femefun.com
@@.gif#$domain=budget101.com|cbox.ws|...
@@.ico#$domain=xup.in|xup.to
@@.jpg#$domain=300mbfilms.co|apkone.net|...
@@.png#$domain=300mblink.com|adlipay.com|...
@@.png?*#$domain=mypapercraft.net|xlocker.net
@@||blogspot.com^*#-$image,domain=cricket-365.pw|cricpower.com|...
@@||free.smsmarkaz.urdupoint.com^*#-$image
@@||i.imgur.com^*#.$image,domain=newmusicforpeople.org
@@||linkcrypt.ws/image/*#$image
@@||media.eventhubs.com/images/*#$image
@@||mzstatic.com^*.jpg#$image,domain=newmusicforpeople.org
@@||seekingalpha.com/adsframe.html#que=$subdocument
@@||teenidols4you.com^*#-$image
@@||youwatch.org^*#$image
@@.jpg#$domain=hellojav.com|hentaienespañol.net|palaotog.net
@@.png#$domain=indiangilma.com|javcen.me|...

Not all are regular expressions.

If we are sure that at least plain URL patterns do not need a hash, then we could do one of the following:

  1. Disallow the hash entirely
  2. Allow a hash if it's a regular expression

Either way, we need to inform filter list authors if we make this change.

Last edited on 03/22/2019 at 08:32:22 AM by mjethani

comment:14 in reply to: ↑ 11 ; follow-up: Changed on 03/22/2019 at 01:32:35 PM by greiner

Replying to sebastian:

Request filters are always matched against the URL without the hash/fragment part.

I've now tried the following:

  1. Add custom filter #foo
  2. Go to a http://example.com/
  3. Run the following code in the JavaScript console fetch("bar.json#foo")"

As it turns out, the request is being blocked by the filter based on its URL hash. Whether or not that's intentional is not clear but at least I'm not seeing anything in the code that's supposed to strip it away.

That being said, I don't know what value it would have to match a request based on its hash so we should carefully look into that before deciding to remove this feature. For example, while the hash may not lead to a different request being made or response, it may help filter authors as it could contain distinct information.

comment:15 in reply to: ↑ 14 Changed on 03/22/2019 at 01:43:21 PM by mjethani

Replying to greiner:

For example, while the hash may not lead to a different request being made or response, it may help filter authors as it could contain distinct information.

The hash may be valuable for subframes that load different content based on the hash, for example. It could be valuable for popups. If we have been using the hash part then it makes sense to leave it working as it does, in my opinion.

comment:16 Changed on 03/22/2019 at 11:36:04 PM by sebastian

Thomas is right. I didn't think that we were stripping the hash/fragement part but assumed the webRequest API would report URLs as sent to the server (i.e. without hash/fragement part). But that is apparently not true. Just tested as well with images and iframes, and any given hash part is given in the URL when matched against our filters.

comment:17 Changed on 03/24/2019 at 12:50:59 PM by mjethani

Currently the following filters parse as element hiding filters:

example.com^##hash
example.com?##hash

But not this one:

example.com/##hash

As part of this work we need to fix our parsing of content filters (element hiding filters, element hiding emulation filters, snippet filters, and exceptions).

Here then is a possible solution:

  1. If the text to the left of the first # contains only characters that can occur in a hostname and it also matches the regular expression for content filters, consider it a content filter; example.com#@?#hash should parse as invalid in this case
  2. If the text to the left of the first # contains any characters that cannot occur in a hostname, parse it as a request filter

This should cover most cases except the following kind:

ad.png#@?#hash

Note that this would still work for exceptions, because an @ cannot occur in a hostname.

@@ad.png#@?#hash

Looking at the EasyList filters I listed above, it seems that all would parse as request filters or exceptions while the example in the issue description would still be parsed as an invalid content filter.

comment:18 Changed on 04/09/2019 at 11:54:10 AM by greiner

As suggested in the ticket description, we could also expose more information about a filter in the UI. Because no matter how good our parsing ends up being, it's not obvious to everyone that example.com##foo hides elements whereas example.com#foo or ||example.com##foo blocks requests. I've created ui#397 so that we can work on that in parallel.

comment:19 Changed on 08/29/2019 at 05:43:52 PM by sebastian

  • Keywords closed-in-favor-of-gitlab added
  • Resolution set to rejected
  • Status changed from new to closed

Sorry, but we switched to GitLab. If this issue is still relevant, please file it again in the new issue tracker.

comment:20 Changed on 01/06/2020 at 03:47:53 PM by hfiguiere

Add Comment

Modify Ticket

Change Properties
Action
as closed .
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from (none).
 
Note: See TracTickets for help on using tickets.