Opened 2 years ago

Last modified 19 months ago

#6994 closed change

Use shortcut matching for filters with no content type, no domain/sitekey, and no third-party flag — at Version 18

Reported by: mjethani Assignee: jsonesen
Priority: P2 Milestone:
Module: Core Keywords:
Cc: jsonesen Blocked By:
Blocking: #7000 Platform: Unknown / Cross platform
Ready: no Confidential: no
Tester: Ross Verified working: yes
Review URL(s):

https://codereview.adblockplus.org/29907586/
https://codereview.adblockplus.org/29926557/

Description (last modified by jsonesen)

Background

About half of all blocking/whitelist filters have (1) no content type, (2) no domain/sitekey, and (3) no third-party flag. As long as the request's location matches the pattern in the filter, it is a match. For these filters, there should be a "shortcut" version of the RegExpFilter.prototype.matches method that checks only the location. When such filters are added to the matcher, it should keep them separately so they can be processed separately (with the shortcut version) for each call to the _checkEntryMatch method.

What to change

Add a new map for location only filters (fastFilterByKeyword) to the matcher class in lib/matcher.js

Change _checkEntryMatch in lib/matcher.js to initially loop through location only filters checking for matches using filter.regexp.test(location) or if a new location only version of RegExpFilter.prototype.matches is used it can call filter.matchesLocation. Additionally, in _checkEntryMatch, if
typeMask & RegExpFilter.prototype.contentType == 0, these filters should be skipped entirely.

Add a isLocationOnly() property to the RegExpFilter class. Also, filters must not be stored twice so when adding to the fastFilterByKeyword map we should skip adding to the matcher.filterByKeyword map.

Using a new map in the matcher class will require the following changes:

  • Matcher.findKeyword will have to be updated to handle fast-filters (location only)
  • Matcher.add() and Matcher.remove(filter) will have to determine which map to use based on the new isLocationOnly method from RegExpFilter

Notes

While the goal is to not change anything about the tests, changing the findkeyword method as well as having two maps will likely require changes to keyword tests and anywhere filterByKeyword is used.

Change History (18)

comment:1 Changed 2 years ago by mjethani

  • Description modified (diff)

comment:2 Changed 2 years ago by jsonesen

This sounds like a great idea and something that I would like to work on since it will give me a good chance to get into the matcher code a bit more, which, I recently switched to classes.

I know it's an optimization and quite a hot spot so what do you think @mjethani?

comment:3 Changed 2 years ago by mjethani

@jsonesen yes, sounds like a good idea to me.

comment:4 Changed 2 years ago by mjethani

@ jsonesen we should be able to make this change without breaking compatibility (i.e. all unit tests should pass as they are), which makes it difficult but also a little less worrisome than #7003, since this is really about the internal logic of the module.

So I would go ahead and tackle this.

So as I said in the description, "About half of all blocking/whitelist filters have (1) no content type, (2) no domain/sitekey, and (3) no third-party flag", and we already know this at the time when the filter is added (filters are immutable). Let's call these "super fast" filters for the purpose of this discussion. For super fast filters, we can just directly call location.test(filter.regexp) instead of filter.matches(location, ...), which would skip three of the other checks. We just need to find an efficient way to maintain these filters separately. These filters would also have to be index by keyword.

comment:5 Changed 2 years ago by mjethani

  • Cc jsonesen added

comment:6 Changed 2 years ago by jsonesen

Awesome, I will go ahead and start on this.

comment:7 Changed 2 years ago by jsonesen

  • Owner set to jsonesen

comment:8 Changed 2 years ago by jsonesen

  • Description modified (diff)

comment:9 follow-up: Changed 2 years ago by jsonesen

What do you think about the what to change? I figured, at least, that a new map by keyword for location only filters wouldn't be much worse than what we currently have and the speedup from using location.test would be worth it for what extra memory is consumed. Also, I wonder if it makes sense to even store filters that get skipped entirely?

Last edited 2 years ago by jsonesen (previous) (diff)

comment:10 in reply to: ↑ description Changed 2 years ago by mjethani

Replying to mjethani:

[...] and a new locationFilterByKeyword to the constructor.

I don't get this part in the "What to change" section, do you mean the RegExp filter constructor?

comment:11 in reply to: ↑ 9 Changed 2 years ago by mjethani

Replying to jsonesen:

What do you think about the what to change? I figured, at least, that a new map by keyword for location only filters wouldn't be much worse than what we currently have and the speedup from using location.test would be worth it for what extra memory is consumed. Also, I wonder if it makes sense to even store filters that get skipped entirely?

A separate second map sounds like a good idea, and I don't think it should take more memory because we can add those filters only to that map (we would have to do this, otherwise we would be iterating over and testing so many filters twice).

So we can keep filterByKeyword but add an additional fastFilterByKeyword. In the add function, if filter.isLocationOnly() is true, we add it to fastFilterByKeyword, otherwise to filterByKeyword.

comment:12 Changed 2 years ago by jsonesen

  • Description modified (diff)

comment:13 follow-up: Changed 2 years ago by jsonesen

Are we avoiding just checking isLocationOnly in the _checkEntryMatch iteration for speed reasons? Adding a new map where we keep fast filters but remove them from filterByKeyword will require quite a lot of changes to make work

comment:14 Changed 2 years ago by jsonesen

  • Description modified (diff)
  • Review URL(s) modified (diff)
  • Status changed from new to reviewing

comment:15 Changed 2 years ago by mjethani

  • Blocking 7000 added

comment:16 in reply to: ↑ 13 Changed 2 years ago by mjethani

Replying to jsonesen:

Are we avoiding just checking isLocationOnly in the _checkEntryMatch iteration for speed reasons?

It wouldn't be an optimization if we checked isLocationOnly on each filter while matching. (How would this be any different than just calling matches?) It's an optimization only if we avoid checking this.

Adding a new map where we keep fast filters but remove them from filterByKeyword will require quite a lot of changes to make work

Yes :)

comment:17 in reply to: ↑ description ; follow-up: Changed 2 years ago by mjethani

Replying to mjethani:

if typeMask == 0 && RegExpFilter.prototype.contentType == 0, these filters should be skipped entirely

This is totally not what I meant. Could you please correct this?

comment:18 in reply to: ↑ 17 Changed 2 years ago by jsonesen

  • Description modified (diff)

Replying to mjethani:

Replying to mjethani:

if typeMask == 0 && RegExpFilter.prototype.contentType == 0, these filters should be skipped entirely

This is totally not what I meant. Could you please correct this?

For sure

Note: See TracTickets for help on using tickets.