Opened on 05/22/2018 at 01:12:29 PM

Closed on 06/05/2018 at 04:51:05 AM

Last modified on 08/21/2018 at 11:40:21 AM

#6690 closed defect (fixed)

Trailing periods are not treated consistently between filter types

Reported by: kzar Assignee: mjethani
Priority: P3 Milestone:
Module: Core Keywords:
Cc: mjethani, sergz, sebastian, greiner Blocked By:
Blocking: Platform: Unknown / Cross platform
Ready: yes Confidential: no
Tester: Ross Verified working: yes
Review URL(s):

https://codereview.adblockplus.org/29790629/

Description (last modified by mjethani)

Environment

adblockpluschrome built from 939bf6cdd435. Chrome 66.

How to reproduce

  1. Append this code to the end of adblockpluschrome/adblockpluscore/lib/filterClasses.js:
console.log(Filter.fromText("com.###id").domains);
console.log(Filter.fromText("foo$domain=com.").domains);
  1. Have a look in the console at the logged domains.

Observed behaviour

Map(2) {"" => false, "COM." => true}
Map(2) {"" => false, "COM" => true}

Notice that the element hiding filter domain has a trailing dot still, but not the blocking filter.

Expected behaviour

The trailing period would be treated consitently for element hiding and blocking filters.

Notes

  • This has some implications on our code that needs to know if filter domains can ever have trailing dots. See the discussion in this codereview for example.
  • Our users have asked in the past that filters for the domain without the trailing dot should apply to requests to the domain with the trailing dot. See #4302.

Resolution

To resolve this issue, the following changes have been made for both URL request blocking and element hiding:

  1. If a filter contains a trailing dot in the domain name, Adblock Plus does not strip the trailing dot out (i.e. in example.com.##.foo the domain is example.com.)
  2. If the domain of a document contains a trailing dot, the trailing dot is always stripped out (i.e. in https://example.com./foo the domain is example.com)

As a result, filters containing a trailing dot in the domain name will no longer match the document's domain name (i.e. example.com.##.foo will no longer apply on either https://example.com/foo or https://example.com./foo), whereas filters containing no trailing dot in the domain name will match the document's domain name both with and without a trailing dot.

Note that this does not affect the URL pattern in a blocking filter. i.e. https://example.com/foo.js$domain=example.com still will not block the URL https://example.com./foo.js no matter what the document's domain.

Hints for testers

Test that the new behavior described in the previous section works correctly.

Attachments (0)

Change History (22)

comment:1 Changed on 05/22/2018 at 01:27:05 PM by kzar

  • Blocking 6091 removed

On second thoughts this isn't really blocking #6091 unless we decide the trailing dot should always be stripped.

comment:2 Changed on 05/22/2018 at 01:29:02 PM by kzar

  • Description modified (diff)

comment:3 Changed on 05/22/2018 at 01:40:46 PM by kzar

  • Description modified (diff)

comment:4 Changed on 05/22/2018 at 02:21:54 PM by kzar

  • Description modified (diff)

comment:5 follow-ups: Changed on 05/22/2018 at 10:46:02 PM by mjethani

The trailing dot is significant as far as the browser is concerned. example.com and example.com. are two different hosts. This makes sense.

In practice though, example.com will almost always translate to example.com. at the DNS level. For the purpose of filter matching, here's how it should work ideally:

  1. If the filter includes the trailing dot in the domain, it should only match the document's host name if it's an FQDN
  2. If the filter does not include the trailing dot in the domain, it should match the document's host name regardless of whether it's an FQDN

So example.com.###foo should apply only to example.com. (which would ensure it never applies to a relative domain, e.g. a local website on your office network that happens to come from example.com.localdomain), whereas example.com###foo should apply to both example.com. (absolute) and example.com (relative).

This would be ideal.

I doubt that the former use case has any relevance to Adblock Plus though (i.e. writing a filter to explicitly prevent matching any relative domains), and the blocking filters seem to ignore the trailing dot anyway, so in my opinion we should go ahead and ignore trailing dot even in the case of hiding filters.

@kzar what do you think?

comment:6 in reply to: ↑ 5 Changed on 05/22/2018 at 10:52:35 PM by mjethani

Replying to mjethani:

[...]

I doubt that the former use case has any relevance to Adblock Plus though (i.e. writing a filter to explicitly prevent matching any relative domains), and the blocking filters seem to ignore the trailing dot anyway, so in my opinion we should go ahead and ignore trailing dot even in the case of hiding filters.

To clarify, this still means we'll have to make sure example.com###foo matches both example.com and example.com., also for blocking filters.

comment:7 in reply to: ↑ 5 Changed on 05/23/2018 at 02:33:11 PM by kzar

Replying to mjethani:

...so in my opinion we should go ahead and ignore trailing dot even in the case of hiding filters.

I agree, and IIRC that's also how uBlock behaves.

If we went with this approach I guess we'd consistently strip the trailing dots from the filter domains and then be careful to make sure the filters match like we expect. I guess my main concern is how this would be implemented in practice.

comment:8 follow-up: Changed on 05/23/2018 at 07:53:59 PM by mjethani

The decision to ignore the trailing dot in domain names goes back to changeset b4fd9e7b2ed0. For some reason, this was done deliberately only for blocking filters and not for element hiding filters. There may have been a good reason for it, but it doesn't make sense to me as things are. I wish I could go back and look at "topic 9548" to get a better idea of why it was done this way.

Anyway, as for the implementation, it seems like it would just be a matter of flipping the ignoreTrailingDot flag.

comment:9 in reply to: ↑ 8 ; follow-up: Changed on 05/24/2018 at 10:30:32 AM by greiner

Replying to mjethani:

I wish I could go back and look at "topic 9548" to get a better idea of why it was done this way.

Here you go: https://adblockplus.org/forum/viewtopic.php?f=1&t=9548

comment:10 in reply to: ↑ 9 Changed on 05/25/2018 at 06:55:21 AM by mjethani

Replying to greiner:

Replying to mjethani:

I wish I could go back and look at "topic 9548" to get a better idea of why it was done this way.

Here you go: https://adblockplus.org/forum/viewtopic.php?f=1&t=9548

Oh, that's awesome!

Alas the discussion there doesn't explain the rationale for dropping the trailing dot only for blocking filters. Let's use our own judgement to make the call then. If no one disagrees, I would like to drop the trailing dot in hiding filters and let the filter apply on a domain even if the domain contains the trailing dot.

comment:11 Changed on 05/25/2018 at 06:55:56 AM by mjethani

  • Priority changed from Unknown to P3
  • Ready set

comment:12 Changed on 05/25/2018 at 12:19:22 PM by kzar

Sounds good to me, the only other option is to ask Wladimir but I'd rather not bother him.

comment:13 Changed on 05/25/2018 at 03:39:51 PM by sebastian

Treating domains in element hiding filters and in the $domain option of blocking filters consistently seems to be a good idea. However, I'm not sure about the benefit of stripping the trailing dot from domains given in filters. (Why would a filter list author add the trailing dot in the first place? Is there any filter list that actually does that?). Perhaps we should just leave the domain given in any filter alone.

But what is important is that specific filters for example.com are also applied if the document is loaded from example.com.. This currently isn't the case for element hiding filters, and we should fix that.

comment:14 Changed on 05/25/2018 at 10:54:56 PM by mjethani

You're right, I couldn't find any filters with trailing dots in EasyList+AA:

[...Filter.knownFilters.values()]
.filter(f => ([...f.domains && f.domains.keys() || []]).some(d => /\.$/.test(d)))

Since we've been simplifying the code anyway, perhaps it's a good idea to get rid of the dropping of trailing dots as well. This means filters that use trailing dots will no longer work as expected, but that appears not to be an issue in practice (similar to #6647).

comment:15 in reply to: ↑ 5 ; follow-up: Changed on 05/25/2018 at 10:59:26 PM by mjethani

Replying to mjethani:

So example.com.###foo should apply only to example.com. (which would ensure it never applies to a relative domain, e.g. a local website on your office network that happens to come from example.com.localdomain), whereas example.com###foo should apply to both example.com. (absolute) and example.com (relative).

This would be ideal.

I realize that based on Sebastian's suggestion above, we would end up with this "ideal" behavior (in my opinion) for free. Nice!

comment:16 Changed on 05/25/2018 at 11:44:40 PM by mjethani

  • Owner set to mjethani

comment:17 in reply to: ↑ 15 Changed on 05/25/2018 at 11:50:43 PM by mjethani

Replying to mjethani:

Replying to mjethani:

So example.com.###foo should apply only to example.com. (which would ensure it never applies to a relative domain, e.g. a local website on your office network that happens to come from example.com.localdomain), whereas example.com###foo should apply to both example.com. (absolute) and example.com (relative).

This would be ideal.

I realize that based on Sebastian's suggestion above, we would end up with this "ideal" behavior (in my opinion) for free. Nice!

Sorry again, this is not true if we want to keep it efficient. A domain with a trailing dot in the filter will basically be ignored for the purpose of matching, as if it didn't exist in the filter. This is OK as we all agree we don't care about domains with trailing dots in filters.

comment:18 Changed on 05/26/2018 at 12:11:12 AM by mjethani

  • Review URL(s) modified (diff)

comment:19 Changed on 06/05/2018 at 04:49:00 AM by abpbot

A commit referencing this issue has landed:
Issue 6690 - Always ignore trailing dot in document domain

comment:20 Changed on 06/05/2018 at 04:51:05 AM by mjethani

  • Resolution set to fixed
  • Status changed from new to closed

comment:21 Changed on 06/05/2018 at 05:14:08 AM by mjethani

  • Description modified (diff)

comment:22 Changed on 08/21/2018 at 11:40:21 AM by Ross

  • Tester changed from Unknown to Ross
  • Verified working set

Working as described.

ABP 3.2.0.2103
Chrome 68 / 55 / 49 / Windows 10
Firefox 61 / 55 / 51 / Windows 10

Add Comment

Modify Ticket

Change Properties
Action
as closed .
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from mjethani.
 
Note: See TracTickets for help on using tickets.