Opened on 07/04/2018 at 10:36:58 AM

Closed on 10/09/2019 at 08:27:51 PM

#6773 closed change (rejected)

Implement support for domain wildcards

Reported by: fanboy
Assignee:
Priority: Unknown
Milestone:
Module: Core
Keywords: closed-in-favor-of-gitlab
Cc: hfiguiere, mjethani, kzar, mapx, greiner, sebastian, imreeil42@gmail.com
Blocked By:
Blocking:
Platform: Unknown / Cross platform
Ready: no
Confidential: no
Tester: Unknown
Verified working: no
Review URL(s):

Description

Environment

Allowing the use of domain wildcards is helpful when you're dealing with multi-region domains like Google (or frequently changing domains like pirate/porn sites). Currently Google has 307 separate domains, and making an exemption is painful (and probably slow).

Implement it in a safe way, avoiding (or limiting) possible exploits/false positives.

In fact I requested this type of feature in 2009: https://adblockplus.org/forum/viewtopic.php?f=4&t=3536&p=68197

How to reproduce

@@||site.com^$domain=~www.google.*
some.website.*##.element
some.website.*,other.website.*##.element

A few examples can be seen in uBo;

https://github.com/uBlockOrigin/uAssets/blob/master/filters/filters.txt

Attachments (0)

Change History (19)

comment:1 Changed on 07/05/2018 at 06:29:45 AM by mapx

  • Cc hfiguiere mjethani kzar mapx added
  • Summary changed from Implent support for domain wildcards to Implement support for domain wildcards

comment:2 follow-up: Changed on 07/10/2018 at 12:02:13 PM by greiner

  • Cc greiner added

From what I can think of, there are two ways to tackle this:

A.
Implement such a placeholder and make sure that it will only match TLDs and nothing else. That means google.* should match google.co.uk (probably also google.blogspot.com) but not google.example.com. Ideally, we'd be using a different placeholder than * though to avoid confusion with the existing * placeholder which represents any string.
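
A minimal sketch of how proposal (A) could behave, assuming the trailing `.*` may only stand for an entry in a public suffix set. The set `SAMPLE_PUBLIC_SUFFIXES` and the function `matches_suffix_wildcard` are hypothetical names used for illustration only; they are not part of Adblock Plus, and a real implementation would need the full Public Suffix List.

```python
# Hypothetical sample of public suffixes; a real implementation would load
# the full Public Suffix List instead of this hand-picked subset.
SAMPLE_PUBLIC_SUFFIXES = {"com", "de", "at", "uk", "co.uk", "blogspot.com"}


def matches_suffix_wildcard(pattern: str, hostname: str) -> bool:
    """Return True if hostname matches a domain pattern such as "google.*".

    The trailing ".*" may only stand for a known public suffix. Subdomains
    of a matching domain (e.g. "www.google.de") are accepted as well.
    """
    if not pattern.endswith(".*"):
        # No placeholder: plain domain-or-subdomain comparison.
        return hostname == pattern or hostname.endswith("." + pattern)

    name_labels = pattern[:-2].split(".")  # "google.*" -> ["google"]
    host_labels = hostname.split(".")
    n = len(name_labels)

    # Slide the fixed labels over the hostname; everything that follows them
    # must be a public suffix in its entirety.
    for start in range(len(host_labels) - n):
        if host_labels[start:start + n] == name_labels:
            suffix = ".".join(host_labels[start + n:])
            if suffix in SAMPLE_PUBLIC_SUFFIXES:
                return True
    return False


for host in ("google.co.uk", "google.blogspot.com", "google.example.com"):
    print(host, matches_suffix_wildcard("google.*", host))
```

Under these assumptions, google.example.com is rejected because "example.com" is not itself a public suffix, while google.blogspot.com is accepted only because "blogspot.com" was placed in the sample set.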

B.
Introduce some kind of variable support so that filter authors can explicitly list all TLDs once, ensuring that non-Google-owned domains such as google.realestate do not match. I'd consider this especially important for exception rules, which cannot be overridden by blocking rules.

e.g.

var googleTlds = ["at", "co.uk", "de", ...]
@@||site.com^$domain=~google.{googleTlds}
google.{googleTlds}##.element
||google.{googleTlds}^$domain=example.com
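
One way to read proposal (B) is as a preprocessing step that expands each `{name}` placeholder into one concrete filter per listed value before the filters reach the matcher. The sketch below illustrates that reading only; the function `expand_variables` and the dictionary format are assumptions, and the ticket does not specify how variables would actually be parsed or stored.

```python
from typing import Dict, List


def expand_variables(filter_text: str, variables: Dict[str, List[str]]) -> List[str]:
    """Expand every {name} placeholder into one filter per listed value."""
    results = [filter_text]
    for name, values in variables.items():
        token = "{" + name + "}"
        expanded = []
        for text in results:
            if token in text:
                # One output filter per value of this variable.
                expanded.extend(text.replace(token, value) for value in values)
            else:
                expanded.append(text)
        results = expanded
    return results


# Shortened, illustrative TLD list (the original example elides the rest).
variables = {"googleTlds": ["at", "co.uk", "de"]}
for line in expand_variables("google.{googleTlds}##.element", variables):
    print(line)
```

Expanding up front keeps the matcher unchanged, at the cost of producing as many filters as there are listed values.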
Last edited on 10/04/2018 at 11:38:45 AM by greiner

comment:3 in reply to: ↑ 2 ; follow-up: Changed on 07/11/2018 at 03:31:58 PM by mjethani

Replying to greiner:

Ideally, we'd be using a different placeholder than * though to avoid confusion with the existing * placeholder which represents any string.

The other * is in the URL pattern part of a blocking filter, while this should be in the domain part. We already have * in CSP filters, for example. This should not be confusing, in my opinion.

comment:4 Changed on 07/11/2018 at 03:41:28 PM by mjethani

We should also consider how this would map to Safari's content blocker rules. There are if-top-url and unless-top-url fields (URL patterns), which we already use in abp2blocklist as poor substitutes, so I would expect this not to affect the performance over there.

It is a bit of a challenge to implement the domain matching in an efficient manner.

comment:5 Changed on 10/04/2018 at 11:37:43 AM by mapx

  • Cc sebastian added
  • Type changed from defect to change

comment:6 Changed on 10/04/2018 at 02:09:13 PM by sebastian

Allowing filters like @@||foobar^$domain=google.* (with canonical wildcard semantics) might be a bad idea, as in this example it would also match and inadvertently whitelist content on google.evil.com.

Alternatively, we could match .* (or another token), if given at the end of the domain, only against the public suffix (like greiner suggested). But these semantics seem rather confusing/unexpected. Also, the core is currently agnostic of the public suffix list, and the related logic would have to be moved from adblockpluschrome to adblockpluscore first.
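
The concern can be illustrated with ordinary glob matching, used here as a stand-in for "canonical wildcard semantics". This is only a sketch: the helper `canonical_wildcard_match` is a hypothetical name, and Python's `fnmatchcase` is not how Adblock Plus matches domains.

```python
from fnmatch import fnmatchcase


def canonical_wildcard_match(pattern: str, hostname: str) -> bool:
    """Treat * as "any string", as a plain glob would."""
    return fnmatchcase(hostname, pattern)


# Both hostnames match, including the one not controlled by Google.
for host in ("google.co.uk", "google.evil.com"):
    print(host, canonical_wildcard_match("google.*", host))
```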

So neither approach seems like a good idea to me, not to mention their performance implications.

comment:7 Changed on 10/04/2018 at 02:35:35 PM by mapx

Today's request from the norwegian list maintainer:
https://adblockplus.org/forum/viewtopic.php?p=181173#p181173

comment:8 follow-up: Changed on 10/04/2018 at 02:37:40 PM by imreeil42@gmail.com

I am decidedly in favour of this ticket being accepted, especially as I've based a fair few of my smaller informal lists around that feature for brevity's sake (Example).

My attention is mostly on example №2 in the OP, as I tend to mostly deal with element rules when I write and maintain my lists, with example №1 being merely(?) a nice bonus.

Of extra note for the curious, is that this would've worked wonders for not only Google, but also for entries for other sites like Amazon, eBay, Eurogamer, and Viaplay (A Nordic streaming service), all of which have several international domains each.

For simplicity's sake, my money is on making ABP's implementation of this conform to that of uBlock Origin, who does it exactly as in OP's reproduction examples as far as I can personally determine.

And thanks for the headsup to me about this ticket, @mapx. :)

comment:9 in reply to: ↑ 8 Changed on 10/04/2018 at 03:27:01 PM by greiner

Replying to imreeil42@…:

For simplicity's sake, my money is on making ABP's implementation of this conform to that of uBlock Origin, who does it exactly as in OP's reproduction examples as far as I can personally determine.

Based on uBlock Origin's documentation, they seem to be using the public suffix list for that, so that would be consistent with the implementation I outlined in proposal (A), except for the syntax difference I suggested.

comment:10 in reply to: ↑ 3 Changed on 10/04/2018 at 03:49:25 PM by mjethani

Replying to mjethani:

Replying to greiner:

Ideally, we'd be using a different placeholder than * though to avoid confusion with the existing * placeholder which represents any string.

The other * is in the URL pattern part of a blocking filter, while this should be in the domain part. We already have * in CSP filters, for example. This should not be confusing, in my opinion.

Ah, I see now what you meant with that comment. Yes, it makes sense to use a different placeholder if we're matching only TLDs, for forward compatibility (in case we ever have to support the wildcard * in the future).

comment:11 Changed on 10/04/2018 at 03:54:03 PM by mjethani

The last time I looked into this, it was not practical to do this without hurting performance. Since then, we have minimized the amount of parsing of domain maps (#6815). We are in general working on some optimizations for both memory usage and performance (#7000). I think we should look into implementing support for this once again after we are done with those changes, because it just might become more practical to do this after the changes.

comment:12 Changed on 10/04/2018 at 04:02:30 PM by mapx

  • Cc imreeil42@gmail.com added

comment:13 follow-up: Changed on 10/04/2018 at 08:20:24 PM by sebastian

In case we decide to go with this approach, only supporting wildcards at the end of the domain, and only matching them against the TLD (or rather public suffix) part, we should move lib/tld.js and lib/publicSuffixList.js from adblockpluschrome to adblockpluscore. Furthermore, the latter is currently generated by buildtools (and updated before every release). That automation should then be done by an npm script within adblockpluscore (side note: the Python-based buildtools are on the way out and are currently being progressively migrated to npm scripts).

As a side effect, with the public suffix logic in core, we could also simplify the matcher interface, so that core can perform the third party check itself, rather than the calling code in the Web Extension.
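
A rough sketch of what a third-party check based on public suffixes could look like, assuming "third party" means the request and the document have different registrable (base) domains. The names `base_domain` and `is_third_party` and the sample suffix set are hypothetical; this does not reproduce lib/tld.js, and it ignores wildcard and exception rules in the real Public Suffix List as well as IP addresses.

```python
# Hypothetical sample; a real implementation would use the full list.
SAMPLE_PUBLIC_SUFFIXES = {"com", "uk", "co.uk"}


def base_domain(hostname: str) -> str:
    """Return the public suffix plus one preceding label."""
    labels = hostname.split(".")
    # Scanning from the left finds the longest matching suffix first.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in SAMPLE_PUBLIC_SUFFIXES:
            return ".".join(labels[max(i - 1, 0):])
    # Unknown suffix: fall back to the hostname itself.
    return hostname


def is_third_party(request_host: str, document_host: str) -> bool:
    """A request is third party if the base domains differ."""
    return base_domain(request_host) != base_domain(document_host)


print(is_third_party("ads.example.co.uk", "www.example.co.uk"))
print(is_third_party("tracker.com", "www.example.co.uk"))
```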

As for the syntax, while I agree that it's technically not exactly a wildcard, given that uBlock Origin already implemented it using the * character, I'm in favor of just doing the same.

comment:14 Changed on 10/05/2018 at 12:28:40 PM by mjethani

Regarding TLDs vs. public suffixes, #6939 might also benefit from it.

comment:15 Changed on 10/05/2018 at 12:31:31 PM by sebastian

I think it must be public suffixes, otherwise google.* wouldn't even match google.co.uk.

comment:16 Changed on 10/05/2018 at 12:37:20 PM by imreeil42@gmail.com

I second sebastian's statement, since there's a whole lot of national domains that allow or require the use of second-level domains (e.g. .co.uk, .com.md, .med.ee, and several hundred other examples, if not thousands).

Last edited on 10/05/2018 at 12:37:36 PM by imreeil42@gmail.com

comment:17 in reply to: ↑ 13 Changed on 10/05/2018 at 12:46:09 PM by mjethani

Replying to sebastian:

As a side effect, with the public suffix logic in core, we could also simplify the matcher interface, so that core can perform the third party check itself, rather than the calling code in the Web Extension.

This sounds like a good idea.

I agree it makes sense to make the wildcard match the public suffix part of the domain name.

comment:18 Changed on 10/09/2019 at 12:10:23 PM by greiner

  • Component changed from Unknown to Core

comment:19 Changed on 10/09/2019 at 08:27:51 PM by sebastian

  • Keywords closed-in-favor-of-gitlab added
  • Resolution set to rejected
  • Status changed from new to closed

Sorry, but we switched to GitLab. If this issue is still relevant, please file it again in the new issue tracker.
