Opened on 01/30/2019 at 05:14:01 AM

Closed on 01/31/2019 at 04:24:01 AM

Last modified on 03/12/2019 at 05:43:54 AM

#7245 closed change (fixed)

Move protocol keyword to end of candidate list

Reported by: mjethani Assignee: mjethani
Priority: P2 Milestone:
Module: Core Keywords:
Cc: hfiguiere Blocked By:
Blocking: #7000 Platform: Unknown / Cross platform
Ready: yes Confidential: no
Tester: Unknown Verified working: yes
Review URL(s):

https://codereview.adblockplus.org/29993555/

Description (last modified by mjethani)

Background

In the _matchesAnyInternal function in lib/matcher.js, we extract all the keywords in the URL and test all the filters assigned to those keywords. The first keyword is typically https or http. There are a few hundred filters in EasyList that get indexed by this keyword, yet the hit rate for this keyword is extremely low.

If we move the first keyword to the end of the list of candidates, we can significantly reduce the amount of processing for blocked URLs.

In my benchmark with Alexa Top 50, I was able to reduce the overall processing for filter matching by ~15% by using this simple trick.

What to change

In the _matchesAnyInternal function in lib/matcher.js, move the first keyword in the URL to the end of the candidate list.

Hints for testers

Make sure that a filter containing the word https or http gets hit if it is the only keyword matching the URL. For example, the filter |https://example.com/ should block any request to https://example.com/ if there is no other filter that blocks the request.

Attachments (0)

Change History (12)

comment:1 Changed on 01/30/2019 at 05:14:14 AM by mjethani

  • Blocking 7000 added
  • Owner set to mjethani

comment:2 Changed on 01/30/2019 at 05:15:35 AM by mjethani

  • Review URL(s) modified (diff)

comment:3 Changed on 01/30/2019 at 05:26:48 AM by mjethani

  • Status changed from new to reviewing

comment:4 follow-up: Changed on 01/30/2019 at 05:14:11 PM by hfiguiere

Out of curiosity wouldn't www also be an early keyword we should move at the end?

comment:5 Changed on 01/30/2019 at 05:22:40 PM by abpbot

A commit referencing this issue has landed:
Issue 7245 - Move protocol keyword to end of candidate list

comment:6 in reply to: ↑ 4 ; follow-up: Changed on 01/31/2019 at 04:19:46 AM by mjethani

Replying to hfiguiere:

Out of curiosity wouldn't www also be an early keyword we should move at the end?

That's a good point.

The way I went for the first keyword in the URL is not by looking for https but by measuring the hit rate (number of hits as a percentage of number of tests) for the given keyword position. Index 0 had an exceptionally low hit rate, it was clearly an outlier and made no sense until I saw that it was mostly https. Index 1 had a relatively good hit rate, comparable with index 2.

comment:7 Changed on 01/31/2019 at 04:23:50 AM by mjethani

  • Description modified (diff)

comment:8 Changed on 01/31/2019 at 04:24:01 AM by mjethani

  • Resolution set to fixed
  • Status changed from reviewing to closed

comment:9 in reply to: ↑ 6 Changed on 02/05/2019 at 11:49:03 AM by mjethani

Replying to mjethani:

Replying to hfiguiere:

Out of curiosity wouldn't www also be an early keyword we should move at the end?

That's a good point.

By the way, I checked, and there are not that many filters assigned to the keyword www. On the other hand, I found quite a few for com. We may want to analyze this further. Maybe in the future we want to move some well-known keywords to the end of the list. Maybe we also want to have this set of well-known keywords generated dynamically as the filters are loaded.

Last edited on 02/05/2019 at 11:49:47 AM by mjethani

comment:10 Changed on 02/05/2019 at 11:50:23 AM by mjethani

  • Cc hfiguiere added

comment:11 Changed on 02/07/2019 at 03:24:09 AM by abpbot

A commit referencing this issue has landed:
Issue 7245 - Move protocol keyword to end of candidate list

comment:12 Changed on 03/12/2019 at 05:43:54 AM by rscott

  • Verified working set

Add Comment

Modify Ticket

Change Properties
Action
as closed .
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from mjethani.
 
Note: See TracTickets for help on using tickets.