Opened 16 months ago

Closed 4 months ago

Last modified 4 months ago

#4327 closed change (fixed)

[abp2blocklist] Prevent $subdocument filters from blocking top-level documents

Reported by: kzar Assignee: mjethani
Priority: P3 Milestone:
Module: Platform Keywords: abp2blocklist, AcceptableAds
Cc: sebastian, mario, philll, mjethani, BrentM Blocked By:
Blocking: #4321, #5464 Platform: Unknown / Cross platform
Ready: yes Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

https://codereview.adblockplus.org/29349989/
https://codereview.adblockplus.org/29457578/
https://codereview.adblockplus.org/29467585/

Description (last modified by mjethani)

Background

Content Blocking rules cannot distinguish between top-level documents and sub-documents in the way that Adblock Plus filters can. This means that filters designed only to block sub-documents are converted into Content Blocking rules that end up blocking both sub-documents and top-level documents. These rules can cause false positives and break whitelisted acceptable adverts. (For example see #4280.)

What to change

For any blocking rule without a domain part remove the "document" resource-type. If the rule only has the "document" resource-type then it should simply be removed.

Where the URL filter does contain the domain part, add an exception for top-level URLs using Safari 11's unless-top-url property, but only if the rule has no if-domain or unless-domain property.

Notes

This will unfortunately reduce the effectiveness of the Content Blocking rules generated, but until Apple provides us the power to distinguish between top-level documents and sub-documents we are left with no choice. The new unless-top-url property, which only works in Safari 11 and onwards, is close even though not exactly what is needed. Once Safari 11 becomes the minimum supported version, unless-top-url should be used in all such cases to prevent the blocking of top-level URLs.

Change History (25)

comment:1 Changed 16 months ago by kzar

Thinking about it I don't know how good an idea this is, since specific blocking rules have the same problem. What's to stop a blocking rule that matches a domain wrongly matching a request for a top-level document?

comment:2 Changed 16 months ago by kzar

  • Owner kzar deleted

comment:3 Changed 15 months ago by kzar

  • Owner set to kzar
  • Review URL(s) modified (diff)

In case we do want to go ahead with this I've got the change ready.

I'll also send Mario an example easylist+exceptionrules_content_blocker.json file so he can test if these changes would help.

comment:4 Changed 15 months ago by kzar

  • Cc philll added
  • Status changed from new to reviewing

I still fear even allowing blocking rules with a hostname part to block document requests might be too much. For example take this filter that's in EasyList:

||com/banners/$image,object,subdocument

That will block document requests for any subdomain of .com, even though technically a hostname is included "com":

[
	{
		"trigger": {
			"url-filter": "^https?://([^/]+\\.)?com/banners/",
			"resource-type": [
				"image",
				"media",
				"document"
			]
		},
		"action": {
			"type": "block"
		}
	}
]

Anyway I'll let you guys do some testing, attaching two files:

  1. easylist+exceptionrules_content_blocker_issue_4327_specific_only.json.zip
  2. easylist+exceptionrules_content_blocker_issue_4327_none_at_all.json.zip

The first has been generated as this issue describes, rules generated without a hostname part will not block "document" requests.

The second allows no rules at all to block "document" requests.

It would be really useful to see how much they help to reduce Acceptable Ads false positives. Then we'll have enough information to decide how to proceed with this change.

comment:5 Changed 15 months ago by mario

@kzar, it seems like you attached the file easylist+exceptionrules_content_blocker_issue_4327_specific_only.json.zip twice. Would you mind adding easylist+exceptionrules_content_blocker_issue_4327_none_at_all.json.zip as well? Thanks!

comment:6 Changed 15 months ago by kzar

Whoops so I did, there you go.

comment:7 Changed 11 months ago by sebastian

  • Milestone set to Adblock-Plus-for-Chrome-Opera-next

comment:8 Changed 10 months ago by sebastian

  • Milestone Adblock-Plus-for-Chrome-Opera-next deleted

comment:9 Changed 6 months ago by kzar

  • Cc mjethani added
  • Owner kzar deleted

Unassigning myself from this and adding Manish to Cc since he's been doing a lot of good work with abp2blocklist recently.

For reference Manish we weren't sure if this change was a good idea, we just hoped it would help reduce false positives. The aim was to be able to trust the generated rules enough that we could refresh them automatically.

comment:10 Changed 6 months ago by mjethani

I think this is a good idea, for what it's worth.

comment:11 Changed 6 months ago by kzar

Sweet it sounds like Apple are finally adding what we need to Safari 11

Added if-top-url and unless-top-url, new triggers that execute when a regular expression is matched against the entirety of the main document URL.

I guess we should use those instead of removing the document type from generic request blocking rules?

comment:12 Changed 6 months ago by mjethani

Yes, we could simply repeat the URL pattern in the unless-top-url list to restrict the blocking to subdocuments only:

{
  "trigger": {
    "url-filter": "^https?://.*/foo",
    "resource-type": ["document"],
    "unless-top-url": ["^https?://.*/foo"]
  },
  "action": {
    "type": "block"
  }
}

This will not work unfortunately if there's also a domain specified, since if-domain and unless-top-url are mutually exclusive, but in that case we can simply remove the document resource type from the list.

comment:13 Changed 6 months ago by mjethani

  • Owner set to mjethani
  • Review URL(s) modified (diff)

comment:14 Changed 6 months ago by sebastian

Let's say the top-level document has the URL https://example.com/foo and there is a sub-frame with the URL https://example.com/bar. Now the filter ||example.com$subdocument should block only the frame. Currently, the top-level document would be blocked as well. But if we duplicate the pattern in url-filter and unless-top-url nothing would be blocked there. So either way the behavior is wrong, not to mention the issue mentioned above when there is a $domain filter option. On the other hand, the false positives we had here were quite bad, so if we can get rid of them I guess we have to accept some false negatives.

comment:15 Changed 6 months ago by abpbot

comment:16 Changed 5 months ago by mjethani

But if we duplicate the pattern in url-filter and unless-top-url nothing would be blocked there.

I think it's OK that subdocuments don't get blocked because of this, but this will also unblock other resource types. If we want to be proper, we should generate two rules here, a separate one for the $subdocument, and apply this only to that rule.

comment:17 Changed 5 months ago by mjethani

  • Review URL(s) modified (diff)

comment:18 Changed 5 months ago by sebastian

  • Cc BrentM added

comment:19 Changed 5 months ago by abpbot

comment:20 Changed 4 months ago by mjethani

  • Description modified (diff)
  • Summary changed from [abp2blocklist] Remove "document" type from generic request blocking rules to [abp2blocklist] Prevent $subdocument filters from blocking top-level documents

comment:21 Changed 4 months ago by kzar

  • Priority changed from Unknown to P3
  • Ready set
  • Resolution set to fixed
  • Status changed from reviewing to closed

comment:22 Changed 4 months ago by mjethani

  • Blocking 5464 added
Note: See TracTickets for help on using tickets.