Opened on 01/10/2019 at 11:46:24 PM
Closed on 01/16/2019 at 07:43:27 PM
Last modified on 07/29/2019 at 09:41:03 AM
#7209 closed change (fixed)
Treat patterns with trailing separators as literals
Reported by: | mjethani | Assignee: | mjethani |
---|---|---|---|
Priority: | P2 | Milestone: | |
Module: | Core | Keywords: | |
Cc: | Blocked By: | ||
Blocking: | #7000 | Platform: | Unknown / Cross platform |
Ready: | yes | Confidential: | no |
Tester: | Unknown | Verified working: | yes |
Review URL(s): |
Description (last modified by mjethani)
Background
In #7052 we started doing string-based matching for literal patterns. About half of all blocking/whitelist filters were found to match the criteria. In the definition of "literal pattern", we also included patterns starting with a || because these are so common and could be handled similarly without much overhead.
It turns out that EasyList contains nearly ~20,000 filters with patterns that end in a ^, often in combination with a leading ||.
Here are some examples:
&pltype=adhost^ /abnl/?begun^ /showads^ /spopunder^$popup ||ad.doubleclick.net^$~object-subrequest,third-party
Such patterns too could be treated as literals, by looking for the string up to and not including the ^ in the URL and then doing an explicit check for the ^ (separator), which only needs to match the regular expression [\x00-\x24\x26-\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\x7F] or the end of the URL.
My experimental patch doubles the speed of RegExpFilter.prototype.matchesLocation while reducing the overall memory footprint by ~8 MB for EasyList alone.
What to change
Modify isLiteralPattern in lib/filterClasses.js to include patterns that end in a ^ but otherwise meet the criteria.
In RegExpFilter.prototype.matchesLocation, look for the pattern up to but not including the ^, and then match the following character against the regular expression [\x00-\x24\x26-\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\x7F] if it is not undefined.
Hints for testers
Patterns ending in ^ should work correctly. The filters ||example.com^, ||example.com/foo^, and ||example.com/foo/index.html^ should all block the URLs https://example.com/foo/index.html and https://example.com/foo/index.html?id=1.
The filter ||example.com/foo^$third-party should block the URLs https://example.com/foo/index.html and https://example.com/foo/index.html?id=1 if they are third-party requests (i.e. not originating from example.com).
The filter ||example.com/foo/^ (note: there's an additional slash) should not block the URL https://example.com/foo/index.html but should still block the URL https://example.com/foo/.
If a whitelist filter @@||example.com/foo^ is added in addition to any one of ||example.com^, ||example.com/foo^, and ||example.com/foo/index.html^, it should whitelist the URLs https://example.com/foo/index.html and https://example.com/foo/index.html?id=1.
Attachments (0)
Change History (10)
comment:1 Changed on 01/10/2019 at 11:50:43 PM by mjethani
comment:2 Changed on 01/10/2019 at 11:51:33 PM by mjethani
- Status changed from new to reviewing
comment:3 Changed on 01/10/2019 at 11:52:00 PM by mjethani
- Blocking 7000 added
comment:4 Changed on 01/15/2019 at 02:21:14 PM by abpbot
comment:6 Changed on 01/16/2019 at 07:43:27 PM by mjethani
- Resolution set to fixed
- Status changed from reviewing to closed
comment:8 Changed on 02/07/2019 at 03:24:03 AM by abpbot
A commit referencing this issue has landed:
Issue 7209 - Treat patterns with trailing separators as literals
comment:9 Changed on 03/12/2019 at 06:05:47 AM by rscott
- Verified working set
comment:10 Changed on 07/29/2019 at 09:41:03 AM by Ross
Done. Working as described.
Rechecked for 3.6 from adblockpluscore#17.
ABP 0.9.15.2339
Microsoft Edge 44.17763.1.0 / Windows 10 1809
ABP 3.5.2.2340
Chrome 49.0.2623.75 / Windows 10 1809
Chrome 75.0.3770.142 / Windows 10 1809
Opera 36.0.2130.65 / Windows 10 1809
Opera 62.0.3331.72 / Windows 10 1809
Firefox 51.0 / Windows 10 1809
Firefox 68.0 / Windows 10 1809
Firefox Mobile 68.0 / Android 7.2.2
A commit referencing this issue has landed:
Issue 7209 - Treat patterns with trailing separators as literals