Opened 9 months ago

Closed 9 months ago

Last modified 3 months ago

#7209 closed change (fixed)

Treat patterns with trailing separators as literals

Reported by: mjethani Assignee: mjethani
Priority: P2 Milestone:
Module: Core Keywords:
Cc: Blocked By:
Blocking: #7000 Platform: Unknown / Cross platform
Ready: yes Confidential: no
Tester: Unknown Verified working: yes
Review URL(s):

https://codereview.adblockplus.org/29978573/

Description (last modified by mjethani)

Background

In #7052 we started doing string-based matching for literal patterns. About half of all blocking/whitelist filters were found to match the criteria. In the definition of "literal pattern", we also included patterns starting with a || because these are so common and could be handled similarly without much overhead.

It turns out that EasyList contains nearly ~20,000 filters with patterns that end in a ^, often in combination with a leading ||.

Here are some examples:

&pltype=adhost^
/abnl/?begun^
/showads^
/spopunder^$popup
||ad.doubleclick.net^$~object-subrequest,third-party

Such patterns too could be treated as literals, by looking for the string up to and not including the ^ in the URL and then doing an explicit check for the ^ (separator), which only needs to match the regular expression [\x00-\x24\x26-\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\x7F] or the end of the URL.

My experimental patch doubles the speed of RegExpFilter.prototype.matchesLocation while reducing the overall memory footprint by ~8 MB for EasyList alone.

What to change

Modify isLiteralPattern in lib/filterClasses.js to include patterns that end in a ^ but otherwise meet the criteria.

In RegExpFilter.prototype.matchesLocation, look for the pattern up to but not including the ^, and then match the following character against the regular expression [\x00-\x24\x26-\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\x7F] if it is not undefined.

Hints for testers

Patterns ending in ^ should work correctly. The filters ||example.com^, ||example.com/foo^, and ||example.com/foo/index.html^ should all block the URLs https://example.com/foo/index.html and https://example.com/foo/index.html?id=1.

The filter ||example.com/foo^$third-party should block the URLs https://example.com/foo/index.html and https://example.com/foo/index.html?id=1 if they are third-party requests (i.e. not originating from example.com).

The filter ||example.com/foo/^ (note: there's an additional slash) should not block the URL https://example.com/foo/index.html but should still block the URL https://example.com/foo/.

If a whitelist filter @@||example.com/foo^ is added in addition to any one of ||example.com^, ||example.com/foo^, and ||example.com/foo/index.html^, it should whitelist the URLs https://example.com/foo/index.html and https://example.com/foo/index.html?id=1.

Change History (10)

comment:1 Changed 9 months ago by mjethani

  • Component changed from Unknown to Core
  • Description modified (diff)
  • Owner set to mjethani
  • Priority changed from Unknown to P2
  • Ready set
  • Review URL(s) modified (diff)
  • Summary changed from Treat patterns with trailing separator as literals to Treat patterns with trailing separators as literals

comment:2 Changed 9 months ago by mjethani

  • Status changed from new to reviewing

comment:3 Changed 9 months ago by mjethani

  • Blocking 7000 added

comment:4 Changed 9 months ago by abpbot

A commit referencing this issue has landed:
Issue 7209 - Treat patterns with trailing separators as literals

comment:5 Changed 9 months ago by mjethani

  • Description modified (diff)

comment:6 Changed 9 months ago by mjethani

  • Resolution set to fixed
  • Status changed from reviewing to closed

comment:7 Changed 9 months ago by mjethani

  • Description modified (diff)

comment:8 Changed 9 months ago by abpbot

A commit referencing this issue has landed:
Issue 7209 - Treat patterns with trailing separators as literals

comment:9 Changed 7 months ago by rscott

  • Verified working set

comment:10 Changed 3 months ago by Ross

Done. Working as described.

Rechecked for 3.6 from ​adblockpluscore#17.

ABP 0.9.15.2339
Microsoft Edge 44.17763.1.0 / Windows 10 1809

ABP 3.5.2.2340
Chrome 49.0.2623.75 / Windows 10 1809
Chrome 75.0.3770.142 / Windows 10 1809
Opera 36.0.2130.65 / Windows 10 1809
Opera 62.0.3331.72 / Windows 10 1809
Firefox 51.0 / Windows 10 1809
Firefox 68.0 / Windows 10 1809
Firefox Mobile 68.0 / Android 7.2.2

Note: See TracTickets for help on using tickets.