Opened on 08/13/2016 at 04:35:56 PM
Closed on 07/12/2017 at 09:17:41 AM
Last modified on 08/02/2017 at 06:30:08 AM
#4327 closed change (fixed)
[abp2blocklist] Prevent $subdocument filters from blocking top-level documents
Reported by: | kzar | Assignee: | mjethani |
---|---|---|---|
Priority: | P3 | Milestone: | |
Module: | Platform | Keywords: | abp2blocklist, AcceptableAds |
Cc: | sebastian, mario, philll, mjethani, BrentM | Blocked By: | |
Blocking: | #4321, #5464 | Platform: | Unknown / Cross platform |
Ready: | yes | Confidential: | no |
Tester: | Unknown | Verified working: | no |
Review URL(s): |
https://codereview.adblockplus.org/29349989/ |
Description (last modified by mjethani)
Background
Content Blocking rules cannot distinguish between top-level documents and sub-documents in the way that Adblock Plus filters can. This means that filters designed only to block sub-documents are converted into Content Blocking rules that end up blocking both sub-documents and top-level documents. These rules can cause false positives and break whitelisted acceptable adverts. (For example see #4280.)
What to change
For any blocking rule without a domain part remove the "document" resource-type. If the rule only has the "document" resource-type then it should simply be removed.
Where the URL filter does contain the domain part, add an exception for top-level URLs using Safari 11's unless-top-url property, but only if the rule has no if-domain or unless-domain property.
Notes
This will unfortunately reduce the effectiveness of the Content Blocking rules generated, but until Apple provides us the power to distinguish between top-level documents and sub-documents we are left with no choice. The new unless-top-url property, which only works in Safari 11 and onwards, is close even though not exactly what is needed. Once Safari 11 becomes the minimum supported version, unless-top-url should be used in all such cases to prevent the blocking of top-level URLs.
Attachments (3)
Change History (25)
comment:1 Changed on 08/13/2016 at 07:05:16 PM by kzar
comment:2 Changed on 08/15/2016 at 12:07:21 PM by kzar
- Owner kzar deleted
comment:3 Changed on 08/19/2016 at 05:59:57 PM by kzar
- Owner set to kzar
- Review URL(s) modified (diff)
In case we do want to go ahead with this I've got the change ready.
I'll also send Mario an example easylist+exceptionrules_content_blocker.json file so he can test if these changes would help.
comment:4 Changed on 08/19/2016 at 06:34:05 PM by kzar
- Cc philll added
- Status changed from new to reviewing
I still fear even allowing blocking rules with a hostname part to block document requests might be too much. For example take this filter that's in EasyList:
||com/banners/$image,object,subdocument
That will block document requests for any subdomain of .com, even though technically a hostname is included "com":
[ { "trigger": { "url-filter": "^https?://([^/]+\\.)?com/banners/", "resource-type": [ "image", "media", "document" ] }, "action": { "type": "block" } } ]
Anyway I'll let you guys do some testing, attaching two files:
- easylist+exceptionrules_content_blocker_issue_4327_specific_only.json.zip
- easylist+exceptionrules_content_blocker_issue_4327_none_at_all.json.zip
The first has been generated as this issue describes, rules generated without a hostname part will not block "document" requests.
The second allows no rules at all to block "document" requests.
It would be really useful to see how much they help to reduce Acceptable Ads false positives. Then we'll have enough information to decide how to proceed with this change.
Changed on 08/19/2016 at 06:34:42 PM by kzar
Changed on 08/19/2016 at 06:34:52 PM by kzar
comment:5 Changed on 08/23/2016 at 07:37:13 AM by mario
@kzar, it seems like you attached the file easylist+exceptionrules_content_blocker_issue_4327_specific_only.json.zip twice. Would you mind adding easylist+exceptionrules_content_blocker_issue_4327_none_at_all.json.zip as well? Thanks!
Changed on 08/23/2016 at 08:35:40 AM by kzar
comment:6 Changed on 08/23/2016 at 08:36:19 AM by kzar
Whoops so I did, there you go.
comment:7 Changed on 01/07/2017 at 10:35:27 AM by sebastian
- Milestone set to Adblock-Plus-for-Chrome-Opera-next
comment:8 Changed on 01/19/2017 at 09:26:03 AM by sebastian
- Milestone Adblock-Plus-for-Chrome-Opera-next deleted
comment:9 Changed on 05/22/2017 at 09:24:55 AM by kzar
- Cc mjethani added
- Owner kzar deleted
Unassigning myself from this and adding Manish to Cc since he's been doing a lot of good work with abp2blocklist recently.
For reference Manish we weren't sure if this change was a good idea, we just hoped it would help reduce false positives. The aim was to be able to trust the generated rules enough that we could refresh them automatically.
comment:10 Changed on 05/31/2017 at 07:05:31 AM by mjethani
I think this is a good idea, for what it's worth.
comment:11 Changed on 06/05/2017 at 09:09:21 PM by kzar
Sweet it sounds like Apple are finally adding what we need to Safari 11
Added if-top-url and unless-top-url, new triggers that execute when a regular expression is matched against the entirety of the main document URL.
I guess we should use those instead of removing the document type from generic request blocking rules?
comment:12 Changed on 06/06/2017 at 04:26:32 AM by mjethani
Yes, we could simply repeat the URL pattern in the unless-top-url list to restrict the blocking to subdocuments only:
{ "trigger": { "url-filter": "^https?://.*/foo", "resource-type": ["document"], "unless-top-url": ["^https?://.*/foo"] }, "action": { "type": "block" } }
This will not work unfortunately if there's also a domain specified, since if-domain and unless-top-url are mutually exclusive, but in that case we can simply remove the document resource type from the list.
comment:13 Changed on 06/06/2017 at 12:52:09 PM by mjethani
- Owner set to mjethani
- Review URL(s) modified (diff)
comment:14 Changed on 06/06/2017 at 01:34:03 PM by sebastian
Let's say the top-level document has the URL https://example.com/foo and there is a sub-frame with the URL https://example.com/bar. Now the filter ||example.com$subdocument should block only the frame. Currently, the top-level document would be blocked as well. But if we duplicate the pattern in url-filter and unless-top-url nothing would be blocked there. So either way the behavior is wrong, not to mention the issue mentioned above when there is a $domain filter option. On the other hand, the false positives we had here were quite bad, so if we can get rid of them I guess we have to accept some false negatives.
comment:15 Changed on 06/09/2017 at 07:31:51 AM by abpbot
A commit referencing this issue has landed:
Issue 4327 - Do not apply $subdocument filters to top-level documents
comment:16 Changed on 06/16/2017 at 03:10:10 PM by mjethani
But if we duplicate the pattern in url-filter and unless-top-url nothing would be blocked there.
I think it's OK that subdocuments don't get blocked because of this, but this will also unblock other resource types. If we want to be proper, we should generate two rules here, a separate one for the $subdocument, and apply this only to that rule.
comment:17 Changed on 06/16/2017 at 04:12:00 PM by mjethani
- Review URL(s) modified (diff)
comment:18 Changed on 06/28/2017 at 02:42:22 PM by sebastian
- Cc BrentM added
comment:19 Changed on 07/11/2017 at 05:17:31 PM by abpbot
A commit referencing this issue has landed:
Issue 4327 - Prevent filters with no hostname from blocking documents
comment:20 Changed on 07/12/2017 at 09:11:01 AM by mjethani
- Description modified (diff)
- Summary changed from [abp2blocklist] Remove "document" type from generic request blocking rules to [abp2blocklist] Prevent $subdocument filters from blocking top-level documents
comment:21 Changed on 07/12/2017 at 09:17:41 AM by kzar
- Priority changed from Unknown to P3
- Ready set
- Resolution set to fixed
- Status changed from reviewing to closed
comment:22 Changed on 08/02/2017 at 06:30:08 AM by mjethani
- Blocking 5464 added
Thinking about it I don't know how good an idea this is, since specific blocking rules have the same problem. What's to stop a blocking rule that matches a domain wrongly matching a request for a top-level document?