Opened on 01/30/2019 at 05:14:01 AM
Closed on 01/31/2019 at 04:24:01 AM
Last modified on 03/12/2019 at 05:43:54 AM
#7245 closed change (fixed)
Move protocol keyword to end of candidate list
Reported by: | mjethani | Assignee: | mjethani |
---|---|---|---|
Priority: | P2 | Milestone: | |
Module: | Core | Keywords: | |
Cc: | hfiguiere | Blocked By: | |
Blocking: | #7000 | Platform: | Unknown / Cross platform |
Ready: | yes | Confidential: | no |
Tester: | Unknown | Verified working: | yes |
Review URL(s): |
Description (last modified by mjethani)
Background
In the _matchesAnyInternal function in lib/matcher.js, we extract all the keywords in the URL and test all the filters assigned to those keywords. The first keyword is typically https or http. There are a few hundred filters in EasyList that get indexed by this keyword, yet the hit rate for this keyword is extremely low.
If we move the first keyword to the end of the list of candidates, we can significantly reduce the amount of processing for blocked URLs.
In my benchmark with Alexa Top 50, I was able to reduce the overall processing for filter matching by ~15% by using this simple trick.
What to change
In the _matchesAnyInternal function in lib/matcher.js, move the first keyword in the URL to the end of the candidate list.
Hints for testers
Make sure that a filter containing the word https or http gets hit if it is the only keyword matching the URL. For example, the filter |https://example.com/ should block any request to https://example.com/ if there is no other filter that blocks the request.
Attachments (0)
Change History (12)
comment:1 Changed on 01/30/2019 at 05:14:14 AM by mjethani
- Blocking 7000 added
- Owner set to mjethani
comment:3 Changed on 01/30/2019 at 05:26:48 AM by mjethani
- Status changed from new to reviewing
comment:5 Changed on 01/30/2019 at 05:22:40 PM by abpbot
A commit referencing this issue has landed:
Issue 7245 - Move protocol keyword to end of candidate list
comment:6 in reply to: ↑ 4 ; follow-up: ↓ 9 Changed on 01/31/2019 at 04:19:46 AM by mjethani
Replying to hfiguiere:
Out of curiosity wouldn't www also be an early keyword we should move at the end?
That's a good point.
The way I went for the first keyword in the URL is not by looking for https but by measuring the hit rate (number of hits as a percentage of number of tests) for the given keyword position. Index 0 had an exceptionally low hit rate, it was clearly an outlier and made no sense until I saw that it was mostly https. Index 1 had a relatively good hit rate, comparable with index 2.
comment:8 Changed on 01/31/2019 at 04:24:01 AM by mjethani
- Resolution set to fixed
- Status changed from reviewing to closed
comment:9 in reply to: ↑ 6 Changed on 02/05/2019 at 11:49:03 AM by mjethani
Replying to mjethani:
Replying to hfiguiere:
Out of curiosity wouldn't www also be an early keyword we should move at the end?
That's a good point.
By the way, I checked, and there are not that many filters assigned to the keyword www. On the other hand, I found quite a few for com. We may want to analyze this further. Maybe in the future we want to move some well-known keywords to the end of the list. Maybe we also want to have this set of well-known keywords generated dynamically as the filters are loaded.
comment:10 Changed on 02/05/2019 at 11:50:23 AM by mjethani
- Cc hfiguiere added
comment:11 Changed on 02/07/2019 at 03:24:09 AM by abpbot
A commit referencing this issue has landed:
Issue 7245 - Move protocol keyword to end of candidate list
comment:12 Changed on 03/12/2019 at 05:43:54 AM by rscott
- Verified working set
Out of curiosity wouldn't www also be an early keyword we should move at the end?