Opened on 01/08/2018 at 04:56:55 PM
Closed on 08/29/2019 at 05:43:52 PM
Last modified on 01/06/2020 at 03:47:53 PM
#6234 closed defect (rejected)
Filter in which pattern contains hash parsed as blocking filter
Reported by: | hfiguiere | Assignee: | |
---|---|---|---|
Priority: | Unknown | Milestone: | |
Module: | Core | Keywords: | closed-in-favor-of-gitlab |
Cc: | kzar, mapx, sergz, erikvold, mjethani, sebastian, greiner | Blocked By: | |
Blocking: | Platform: | Unknown / Cross platform | |
Ready: | no | Confidential: | no |
Tester: | Unknown | Verified working: | no |
Review URL(s): |
Description (last modified by kzar)
Environment
Adblock Plus incorrectly parse some malformed filter as Blocking Filters.
How to reproduce
- Add filter planet.fr#@?#:-abp-properties(cursor:pointer;)
Observed behaviour
Filter is treated as a Blocking Filter.
Expected behaviour
It should be an Invalid Filter.
Note
- There is no formal way to test this in the UI though.
- This comes from issue #6193.
- Fixing this should include proper test case.
Attachments (0)
Change History (20)
comment:1 Changed on 01/08/2018 at 05:26:28 PM by mapx
- Cc kzar mapx added
comment:3 Changed on 01/10/2018 at 06:40:09 PM by hfiguiere
- Owner set to hfiguiere
comment:4 Changed on 01/15/2018 at 02:20:25 PM by hfiguiere
- Owner hfiguiere deleted
comment:5 Changed on 11/15/2018 at 05:26:04 AM by erikvold
- Cc erikvold added
comment:6 Changed on 03/17/2019 at 08:18:17 AM by mjethani
comment:7 Changed on 03/17/2019 at 08:18:35 AM by mjethani
- Cc mjethani added
comment:8 Changed on 03/19/2019 at 10:08:19 PM by hfiguiere
This exactly what's wrong. It's not even a blocking filter because of the # there.
The design of the parser doesn't make this easy to fix though.
comment:9 Changed on 03/20/2019 at 01:13:31 AM by mjethani
- Cc sebastian greiner added
comment:10 Changed on 03/21/2019 at 12:45:54 PM by greiner
If the suggestion is to not consider any filters that contain # as a blocking/exception filter, I'd tend to agree with it. However, we should then also double-check how/if we currently handle the hash part of the URL to make sure we're not taking away/limiting functionality by prohibiting the use of # in filter patterns.
comment:11 follow-up: ↓ 14 Changed on 03/21/2019 at 09:01:50 PM by sebastian
Request filters are always matched against the URL without the hash/fragment part.
comment:12 Changed on 03/22/2019 at 08:21:22 AM by mjethani
- Summary changed from Incorrect filter syntax parsed as blocking filter to Filter in which pattern contains hash parsed as blocking filter
comment:13 Changed on 03/22/2019 at 08:31:22 AM by mjethani
I see at least the following URL request filters in EasyList containing a hash:
/gelbooru\.com.*[a-zA-Z0-9?!=@%#]{40,}/$image,other ||linkshrink.net^*#^$popup,domain=linkshrink.net /https?:\/\/.*[=|&|%|#|+].*/$popup,domain=0123movies.com|1337x.to|... /http://[a-zA-Z0-9]+\.[a-z]+\/.*(?:[!"#$%&'()*+,:;<=>?@/\^_`{|}~-]).*[a-zA-Z0-9]+/$script,third-party,domain=keezmovies.com|redtube.com|... ||h2porn.com/*.js#spot= /http*.:\/\/.*[?|=|&|%|#|+].*/$popup,domain=femefun.com @@.gif#$domain=budget101.com|cbox.ws|... @@.ico#$domain=xup.in|xup.to @@.jpg#$domain=300mbfilms.co|apkone.net|... @@.png#$domain=300mblink.com|adlipay.com|... @@.png?*#$domain=mypapercraft.net|xlocker.net @@||blogspot.com^*#-$image,domain=cricket-365.pw|cricpower.com|... @@||free.smsmarkaz.urdupoint.com^*#-$image @@||i.imgur.com^*#.$image,domain=newmusicforpeople.org @@||linkcrypt.ws/image/*#$image @@||media.eventhubs.com/images/*#$image @@||mzstatic.com^*.jpg#$image,domain=newmusicforpeople.org @@||seekingalpha.com/adsframe.html#que=$subdocument @@||teenidols4you.com^*#-$image @@||youwatch.org^*#$image @@.jpg#$domain=hellojav.com|hentaienespañol.net|palaotog.net @@.png#$domain=indiangilma.com|javcen.me|...
Not all are regular expressions.
If we are sure that at least plain URL patterns do not need a hash, then we could do one of the following:
- Disallow the hash entirely
- Allow a hash if it's a regular expression
Either way, we need to inform filter list authors if we make this change.
comment:14 in reply to: ↑ 11 ; follow-up: ↓ 15 Changed on 03/22/2019 at 01:32:35 PM by greiner
Replying to sebastian:
Request filters are always matched against the URL without the hash/fragment part.
I've now tried the following:
- Add custom filter #foo
- Go to a http://example.com/
- Run the following code in the JavaScript console fetch("bar.json#foo")"
As it turns out, the request is being blocked by the filter based on its URL hash. Whether or not that's intentional is not clear but at least I'm not seeing anything in the code that's supposed to strip it away.
That being said, I don't know what value it would have to match a request based on its hash so we should carefully look into that before deciding to remove this feature. For example, while the hash may not lead to a different request being made or response, it may help filter authors as it could contain distinct information.
comment:15 in reply to: ↑ 14 Changed on 03/22/2019 at 01:43:21 PM by mjethani
Replying to greiner:
For example, while the hash may not lead to a different request being made or response, it may help filter authors as it could contain distinct information.
The hash may be valuable for subframes that load different content based on the hash, for example. It could be valuable for popups. If we have been using the hash part then it makes sense to leave it working as it does, in my opinion.
comment:16 Changed on 03/22/2019 at 11:36:04 PM by sebastian
Thomas is right. I didn't think that we were stripping the hash/fragement part but assumed the webRequest API would report URLs as sent to the server (i.e. without hash/fragement part). But that is apparently not true. Just tested as well with images and iframes, and any given hash part is given in the URL when matched against our filters.
comment:17 Changed on 03/24/2019 at 12:50:59 PM by mjethani
Currently the following filters parse as element hiding filters:
example.com^##hash example.com?##hash
But not this one:
example.com/##hash
As part of this work we need to fix our parsing of content filters (element hiding filters, element hiding emulation filters, snippet filters, and exceptions).
Here then is a possible solution:
- If the text to the left of the first # contains only characters that can occur in a hostname and it also matches the regular expression for content filters, consider it a content filter; example.com#@?#hash should parse as invalid in this case
- If the text to the left of the first # contains any characters that cannot occur in a hostname, parse it as a request filter
This should cover most cases except the following kind:
ad.png#@?#hash
Note that this would still work for exceptions, because an @ cannot occur in a hostname.
@@ad.png#@?#hash
Looking at the EasyList filters I listed above, it seems that all would parse as request filters or exceptions while the example in the issue description would still be parsed as an invalid content filter.
comment:18 Changed on 04/09/2019 at 11:54:10 AM by greiner
As suggested in the ticket description, we could also expose more information about a filter in the UI. Because no matter how good our parsing ends up being, it's not obvious to everyone that example.com##foo hides elements whereas example.com#foo or ||example.com##foo blocks requests. I've created ui#397 so that we can work on that in parallel.
comment:19 Changed on 08/29/2019 at 05:43:52 PM by sebastian
- Keywords closed-in-favor-of-gitlab added
- Resolution set to rejected
- Status changed from new to closed
Sorry, but we switched to GitLab. If this issue is still relevant, please file it again in the new issue tracker.
Why is this not a blocking filter? It does not match the regular expression for content filters, therefore it is a blocking filer by definition.