Opened 10 months ago

Closed 10 months ago

Last modified 9 months ago

#7245 closed change (fixed)

Move protocol keyword to end of candidate list

Reported by: mjethani Assignee: mjethani
Priority: P2 Milestone:
Module: Core Keywords:
Cc: hfiguiere Blocked By:
Blocking: #7000 Platform: Unknown / Cross platform
Ready: yes Confidential: no
Tester: Unknown Verified working: yes
Review URL(s):

https://codereview.adblockplus.org/29993555/

Description (last modified by mjethani)

Background

In the _matchesAnyInternal function in lib/matcher.js, we extract all the keywords in the URL and test all the filters assigned to those keywords. The first keyword is typically https or http. There are a few hundred filters in EasyList that get indexed by this keyword, yet the hit rate for this keyword is extremely low.

If we move the first keyword to the end of the list of candidates, we can significantly reduce the amount of processing for blocked URLs.

In my benchmark with Alexa Top 50, I was able to reduce the overall processing for filter matching by ~15% by using this simple trick.

What to change

In the _matchesAnyInternal function in lib/matcher.js, move the first keyword in the URL to the end of the candidate list.

Hints for testers

Make sure that a filter containing the word https or http gets hit if it is the only keyword matching the URL. For example, the filter |https://example.com/ should block any request to https://example.com/ if there is no other filter that blocks the request.

Change History (12)

comment:1 Changed 10 months ago by mjethani

  • Blocking 7000 added
  • Owner set to mjethani

comment:2 Changed 10 months ago by mjethani

  • Review URL(s) modified (diff)

comment:3 Changed 10 months ago by mjethani

  • Status changed from new to reviewing

comment:4 follow-up: Changed 10 months ago by hfiguiere

Out of curiosity wouldn't www also be an early keyword we should move at the end?

comment:5 Changed 10 months ago by abpbot

A commit referencing this issue has landed:
Issue 7245 - Move protocol keyword to end of candidate list

comment:6 in reply to: ↑ 4 ; follow-up: Changed 10 months ago by mjethani

Replying to hfiguiere:

Out of curiosity wouldn't www also be an early keyword we should move at the end?

That's a good point.

The way I went for the first keyword in the URL is not by looking for https but by measuring the hit rate (number of hits as a percentage of number of tests) for the given keyword position. Index 0 had an exceptionally low hit rate, it was clearly an outlier and made no sense until I saw that it was mostly https. Index 1 had a relatively good hit rate, comparable with index 2.

comment:7 Changed 10 months ago by mjethani

  • Description modified (diff)

comment:8 Changed 10 months ago by mjethani

  • Resolution set to fixed
  • Status changed from reviewing to closed

comment:9 in reply to: ↑ 6 Changed 10 months ago by mjethani

Replying to mjethani:

Replying to hfiguiere:

Out of curiosity wouldn't www also be an early keyword we should move at the end?

That's a good point.

By the way, I checked, and there are not that many filters assigned to the keyword www. On the other hand, I found quite a few for com. We may want to analyze this further. Maybe in the future we want to move some well-known keywords to the end of the list. Maybe we also want to have this set of well-known keywords generated dynamically as the filters are loaded.

Last edited 10 months ago by mjethani (previous) (diff)

comment:10 Changed 10 months ago by mjethani

  • Cc hfiguiere added

comment:11 Changed 10 months ago by abpbot

A commit referencing this issue has landed:
Issue 7245 - Move protocol keyword to end of candidate list

comment:12 Changed 9 months ago by rscott

  • Verified working set
Note: See TracTickets for help on using tickets.