Opened 7 months ago

Closed 6 months ago

Last modified 4 months ago

#6690 closed defect (fixed)

Trailing periods are not treated consistently between filter types

Reported by: kzar Assignee: mjethani
Priority: P3 Milestone:
Module: Core Keywords:
Cc: mjethani, sergz, sebastian, greiner Blocked By:
Blocking: Platform: Unknown / Cross platform
Ready: yes Confidential: no
Tester: Ross Verified working: yes
Review URL(s):

https://codereview.adblockplus.org/29790629/

Description (last modified by mjethani)

Environment

adblockpluschrome built from 939bf6cdd435. Chrome 66.

How to reproduce

  1. Append this code to the end of adblockpluschrome/adblockpluscore/lib/filterClasses.js:
console.log(Filter.fromText("com.###id").domains);
console.log(Filter.fromText("foo$domain=com.").domains);
  1. Have a look in the console at the logged domains.

Observed behaviour

Map(2) {"" => false, "COM." => true}
Map(2) {"" => false, "COM" => true}

Notice that the element hiding filter domain has a trailing dot still, but not the blocking filter.

Expected behaviour

The trailing period would be treated consitently for element hiding and blocking filters.

Notes

  • This has some implications on our code that needs to know if filter domains can ever have trailing dots. See the discussion in this codereview for example.
  • Our users have asked in the past that filters for the domain without the trailing dot should apply to requests to the domain with the trailing dot. See #4302.

Resolution

To resolve this issue, the following changes have been made for both URL request blocking and element hiding:

  1. If a filter contains a trailing dot in the domain name, Adblock Plus does not strip the trailing dot out (i.e. in example.com.##.foo the domain is example.com.)
  2. If the domain of a document contains a trailing dot, the trailing dot is always stripped out (i.e. in https://example.com./foo the domain is example.com)

As a result, filters containing a trailing dot in the domain name will no longer match the document's domain name (i.e. example.com.##.foo will no longer apply on either https://example.com/foo or https://example.com./foo), whereas filters containing no trailing dot in the domain name will match the document's domain name both with and without a trailing dot.

Note that this does not affect the URL pattern in a blocking filter. i.e. https://example.com/foo.js$domain=example.com still will not block the URL https://example.com./foo.js no matter what the document's domain.

Hints for testers

Test that the new behavior described in the previous section works correctly.

Change History (22)

comment:1 Changed 7 months ago by kzar

  • Blocking 6091 removed

On second thoughts this isn't really blocking #6091 unless we decide the trailing dot should always be stripped.

comment:2 Changed 7 months ago by kzar

  • Description modified (diff)

comment:3 Changed 7 months ago by kzar

  • Description modified (diff)

comment:4 Changed 7 months ago by kzar

  • Description modified (diff)

comment:5 follow-ups: Changed 7 months ago by mjethani

The trailing dot is significant as far as the browser is concerned. example.com and example.com. are two different hosts. This makes sense.

In practice though, example.com will almost always translate to example.com. at the DNS level. For the purpose of filter matching, here's how it should work ideally:

  1. If the filter includes the trailing dot in the domain, it should only match the document's host name if it's an FQDN
  2. If the filter does not include the trailing dot in the domain, it should match the document's host name regardless of whether it's an FQDN

So example.com.###foo should apply only to example.com. (which would ensure it never applies to a relative domain, e.g. a local website on your office network that happens to come from example.com.localdomain), whereas example.com###foo should apply to both example.com. (absolute) and example.com (relative).

This would be ideal.

I doubt that the former use case has any relevance to Adblock Plus though (i.e. writing a filter to explicitly prevent matching any relative domains), and the blocking filters seem to ignore the trailing dot anyway, so in my opinion we should go ahead and ignore trailing dot even in the case of hiding filters.

@kzar what do you think?

comment:6 in reply to: ↑ 5 Changed 7 months ago by mjethani

Replying to mjethani:

[...]

I doubt that the former use case has any relevance to Adblock Plus though (i.e. writing a filter to explicitly prevent matching any relative domains), and the blocking filters seem to ignore the trailing dot anyway, so in my opinion we should go ahead and ignore trailing dot even in the case of hiding filters.

To clarify, this still means we'll have to make sure example.com###foo matches both example.com and example.com., also for blocking filters.

comment:7 in reply to: ↑ 5 Changed 7 months ago by kzar

Replying to mjethani:

...so in my opinion we should go ahead and ignore trailing dot even in the case of hiding filters.

I agree, and IIRC that's also how uBlock behaves.

If we went with this approach I guess we'd consistently strip the trailing dots from the filter domains and then be careful to make sure the filters match like we expect. I guess my main concern is how this would be implemented in practice.

comment:8 follow-up: Changed 7 months ago by mjethani

The decision to ignore the trailing dot in domain names goes back to changeset b4fd9e7b2ed0. For some reason, this was done deliberately only for blocking filters and not for element hiding filters. There may have been a good reason for it, but it doesn't make sense to me as things are. I wish I could go back and look at "topic 9548" to get a better idea of why it was done this way.

Anyway, as for the implementation, it seems like it would just be a matter of flipping the ignoreTrailingDot flag.

comment:9 in reply to: ↑ 8 ; follow-up: Changed 7 months ago by greiner

Replying to mjethani:

I wish I could go back and look at "topic 9548" to get a better idea of why it was done this way.

Here you go: https://adblockplus.org/forum/viewtopic.php?f=1&t=9548

comment:10 in reply to: ↑ 9 Changed 7 months ago by mjethani

Replying to greiner:

Replying to mjethani:

I wish I could go back and look at "topic 9548" to get a better idea of why it was done this way.

Here you go: https://adblockplus.org/forum/viewtopic.php?f=1&t=9548

Oh, that's awesome!

Alas the discussion there doesn't explain the rationale for dropping the trailing dot only for blocking filters. Let's use our own judgement to make the call then. If no one disagrees, I would like to drop the trailing dot in hiding filters and let the filter apply on a domain even if the domain contains the trailing dot.

comment:11 Changed 7 months ago by mjethani

  • Priority changed from Unknown to P3
  • Ready set

comment:12 Changed 7 months ago by kzar

Sounds good to me, the only other option is to ask Wladimir but I'd rather not bother him.

comment:13 Changed 7 months ago by sebastian

Treating domains in element hiding filters and in the $domain option of blocking filters consistently seems to be a good idea. However, I'm not sure about the benefit of stripping the trailing dot from domains given in filters. (Why would a filter list author add the trailing dot in the first place? Is there any filter list that actually does that?). Perhaps we should just leave the domain given in any filter alone.

But what is important is that specific filters for example.com are also applied if the document is loaded from example.com.. This currently isn't the case for element hiding filters, and we should fix that.

comment:14 Changed 7 months ago by mjethani

You're right, I couldn't find any filters with trailing dots in EasyList+AA:

[...Filter.knownFilters.values()]
.filter(f => ([...f.domains && f.domains.keys() || []]).some(d => /\.$/.test(d)))

Since we've been simplifying the code anyway, perhaps it's a good idea to get rid of the dropping of trailing dots as well. This means filters that use trailing dots will no longer work as expected, but that appears not to be an issue in practice (similar to #6647).

comment:15 in reply to: ↑ 5 ; follow-up: Changed 7 months ago by mjethani

Replying to mjethani:

So example.com.###foo should apply only to example.com. (which would ensure it never applies to a relative domain, e.g. a local website on your office network that happens to come from example.com.localdomain), whereas example.com###foo should apply to both example.com. (absolute) and example.com (relative).

This would be ideal.

I realize that based on Sebastian's suggestion above, we would end up with this "ideal" behavior (in my opinion) for free. Nice!

comment:16 Changed 7 months ago by mjethani

  • Owner set to mjethani

comment:17 in reply to: ↑ 15 Changed 7 months ago by mjethani

Replying to mjethani:

Replying to mjethani:

So example.com.###foo should apply only to example.com. (which would ensure it never applies to a relative domain, e.g. a local website on your office network that happens to come from example.com.localdomain), whereas example.com###foo should apply to both example.com. (absolute) and example.com (relative).

This would be ideal.

I realize that based on Sebastian's suggestion above, we would end up with this "ideal" behavior (in my opinion) for free. Nice!

Sorry again, this is not true if we want to keep it efficient. A domain with a trailing dot in the filter will basically be ignored for the purpose of matching, as if it didn't exist in the filter. This is OK as we all agree we don't care about domains with trailing dots in filters.

comment:18 Changed 7 months ago by mjethani

  • Review URL(s) modified (diff)

comment:19 Changed 6 months ago by abpbot

A commit referencing this issue has landed:
Issue 6690 - Always ignore trailing dot in document domain

comment:20 Changed 6 months ago by mjethani

  • Resolution set to fixed
  • Status changed from new to closed

comment:21 Changed 6 months ago by mjethani

  • Description modified (diff)

comment:22 Changed 4 months ago by Ross

  • Tester changed from Unknown to Ross
  • Verified working set

Working as described.

ABP 3.2.0.2103
Chrome 68 / 55 / 49 / Windows 10
Firefox 61 / 55 / 51 / Windows 10

Note: See TracTickets for help on using tickets.