Opened on 08/13/2016 at 04:35:56 PM

Closed on 07/12/2017 at 09:17:41 AM

Last modified on 08/02/2017 at 06:30:08 AM

#4327 closed change (fixed)

[abp2blocklist] Prevent $subdocument filters from blocking top-level documents

Reported by: kzar Assignee: mjethani
Priority: P3 Milestone:
Module: Platform Keywords: abp2blocklist, AcceptableAds
Cc: sebastian, mario, philll, mjethani, BrentM Blocked By:
Blocking: #4321, #5464 Platform: Unknown / Cross platform
Ready: yes Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

https://codereview.adblockplus.org/29349989/
https://codereview.adblockplus.org/29457578/
https://codereview.adblockplus.org/29467585/

Description (last modified by mjethani)

Background

Content Blocking rules cannot distinguish between top-level documents and sub-documents in the way that Adblock Plus filters can. This means that filters designed only to block sub-documents are converted into Content Blocking rules that end up blocking both sub-documents and top-level documents. These rules can cause false positives and break whitelisted acceptable adverts. (For example see #4280.)

What to change

For any blocking rule without a domain part remove the "document" resource-type. If the rule only has the "document" resource-type then it should simply be removed.

Where the URL filter does contain the domain part, add an exception for top-level URLs using Safari 11's unless-top-url property, but only if the rule has no if-domain or unless-domain property.

Notes

This will unfortunately reduce the effectiveness of the Content Blocking rules generated, but until Apple provides us the power to distinguish between top-level documents and sub-documents we are left with no choice. The new unless-top-url property, which only works in Safari 11 and onwards, is close even though not exactly what is needed. Once Safari 11 becomes the minimum supported version, unless-top-url should be used in all such cases to prevent the blocking of top-level URLs.

Attachments (3)

easylist+exceptionrules_content_blocker_issue_4327_specific_only.json.zip (963.4 KB) - added by kzar on 08/19/2016 at 06:34:42 PM.
easylist+exceptionrules_content_blocker_issue_4327_specific_only.json.2.zip (963.4 KB) - added by kzar on 08/19/2016 at 06:34:52 PM.
easylist+exceptionrules_content_blocker_issue_4327_none_at_all.json.zip (958.4 KB) - added by kzar on 08/23/2016 at 08:35:40 AM.

Download all attachments as: .zip

Change History (25)

comment:1 Changed on 08/13/2016 at 07:05:16 PM by kzar

Thinking about it I don't know how good an idea this is, since specific blocking rules have the same problem. What's to stop a blocking rule that matches a domain wrongly matching a request for a top-level document?

comment:2 Changed on 08/15/2016 at 12:07:21 PM by kzar

  • Owner kzar deleted

comment:3 Changed on 08/19/2016 at 05:59:57 PM by kzar

  • Owner set to kzar
  • Review URL(s) modified (diff)

In case we do want to go ahead with this I've got the change ready.

I'll also send Mario an example easylist+exceptionrules_content_blocker.json file so he can test if these changes would help.

comment:4 Changed on 08/19/2016 at 06:34:05 PM by kzar

  • Cc philll added
  • Status changed from new to reviewing

I still fear even allowing blocking rules with a hostname part to block document requests might be too much. For example take this filter that's in EasyList:

||com/banners/$image,object,subdocument

That will block document requests for any subdomain of .com, even though technically a hostname is included "com":

[
	{
		"trigger": {
			"url-filter": "^https?://([^/]+\\.)?com/banners/",
			"resource-type": [
				"image",
				"media",
				"document"
			]
		},
		"action": {
			"type": "block"
		}
	}
]

Anyway I'll let you guys do some testing, attaching two files:

  1. easylist+exceptionrules_content_blocker_issue_4327_specific_only.json.zip
  2. easylist+exceptionrules_content_blocker_issue_4327_none_at_all.json.zip

The first has been generated as this issue describes, rules generated without a hostname part will not block "document" requests.

The second allows no rules at all to block "document" requests.

It would be really useful to see how much they help to reduce Acceptable Ads false positives. Then we'll have enough information to decide how to proceed with this change.

Changed on 08/19/2016 at 06:34:42 PM by kzar

Changed on 08/19/2016 at 06:34:52 PM by kzar

comment:5 Changed on 08/23/2016 at 07:37:13 AM by mario

@kzar, it seems like you attached the file easylist+exceptionrules_content_blocker_issue_4327_specific_only.json.zip twice. Would you mind adding easylist+exceptionrules_content_blocker_issue_4327_none_at_all.json.zip as well? Thanks!

Changed on 08/23/2016 at 08:35:40 AM by kzar

comment:6 Changed on 08/23/2016 at 08:36:19 AM by kzar

Whoops so I did, there you go.

comment:7 Changed on 01/07/2017 at 10:35:27 AM by sebastian

  • Milestone set to Adblock-Plus-for-Chrome-Opera-next

comment:8 Changed on 01/19/2017 at 09:26:03 AM by sebastian

  • Milestone Adblock-Plus-for-Chrome-Opera-next deleted

comment:9 Changed on 05/22/2017 at 09:24:55 AM by kzar

  • Cc mjethani added
  • Owner kzar deleted

Unassigning myself from this and adding Manish to Cc since he's been doing a lot of good work with abp2blocklist recently.

For reference Manish we weren't sure if this change was a good idea, we just hoped it would help reduce false positives. The aim was to be able to trust the generated rules enough that we could refresh them automatically.

comment:10 Changed on 05/31/2017 at 07:05:31 AM by mjethani

I think this is a good idea, for what it's worth.

comment:11 Changed on 06/05/2017 at 09:09:21 PM by kzar

Sweet it sounds like Apple are finally adding what we need to Safari 11

Added if-top-url and unless-top-url, new triggers that execute when a regular expression is matched against the entirety of the main document URL.

I guess we should use those instead of removing the document type from generic request blocking rules?

comment:12 Changed on 06/06/2017 at 04:26:32 AM by mjethani

Yes, we could simply repeat the URL pattern in the unless-top-url list to restrict the blocking to subdocuments only:

{
  "trigger": {
    "url-filter": "^https?://.*/foo",
    "resource-type": ["document"],
    "unless-top-url": ["^https?://.*/foo"]
  },
  "action": {
    "type": "block"
  }
}

This will not work unfortunately if there's also a domain specified, since if-domain and unless-top-url are mutually exclusive, but in that case we can simply remove the document resource type from the list.

comment:13 Changed on 06/06/2017 at 12:52:09 PM by mjethani

  • Owner set to mjethani
  • Review URL(s) modified (diff)

comment:14 Changed on 06/06/2017 at 01:34:03 PM by sebastian

Let's say the top-level document has the URL https://example.com/foo and there is a sub-frame with the URL https://example.com/bar. Now the filter ||example.com$subdocument should block only the frame. Currently, the top-level document would be blocked as well. But if we duplicate the pattern in url-filter and unless-top-url nothing would be blocked there. So either way the behavior is wrong, not to mention the issue mentioned above when there is a $domain filter option. On the other hand, the false positives we had here were quite bad, so if we can get rid of them I guess we have to accept some false negatives.

comment:15 Changed on 06/09/2017 at 07:31:51 AM by abpbot

comment:16 Changed on 06/16/2017 at 03:10:10 PM by mjethani

But if we duplicate the pattern in url-filter and unless-top-url nothing would be blocked there.

I think it's OK that subdocuments don't get blocked because of this, but this will also unblock other resource types. If we want to be proper, we should generate two rules here, a separate one for the $subdocument, and apply this only to that rule.

comment:17 Changed on 06/16/2017 at 04:12:00 PM by mjethani

  • Review URL(s) modified (diff)

comment:18 Changed on 06/28/2017 at 02:42:22 PM by sebastian

  • Cc BrentM added

comment:19 Changed on 07/11/2017 at 05:17:31 PM by abpbot

comment:20 Changed on 07/12/2017 at 09:11:01 AM by mjethani

  • Description modified (diff)
  • Summary changed from [abp2blocklist] Remove "document" type from generic request blocking rules to [abp2blocklist] Prevent $subdocument filters from blocking top-level documents

comment:21 Changed on 07/12/2017 at 09:17:41 AM by kzar

  • Priority changed from Unknown to P3
  • Ready set
  • Resolution set to fixed
  • Status changed from reviewing to closed

comment:22 Changed on 08/02/2017 at 06:30:08 AM by mjethani

  • Blocking 5464 added

Add Comment

Modify Ticket

Change Properties
Action
as closed .
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from mjethani.
 
Note: See TracTickets for help on using tickets.