Opened 14 months ago

Closed 13 months ago

Last modified 9 months ago

#6994 closed change (fixed)

Use shortcut matching for filters with no content type, no domain/sitekey, and no third-party flag

Reported by: mjethani Assignee: mjethani
Priority: P2 Milestone:
Module: Core Keywords:
Cc: jsonesen Blocked By:
Blocking: #7000 Platform: Unknown / Cross platform
Ready: no Confidential: no
Tester: Ross Verified working: yes
Review URL(s):

https://codereview.adblockplus.org/29907586/
https://codereview.adblockplus.org/29926557/

Description (last modified by mjethani)

Background

About half of all blocking/whitelist filters have (1) no content type, (2) no domain/sitekey, and (3) no third-party flag. As long as the request's location matches the pattern in the filter, it is a match. For these filters, there should be a "shortcut" version of the RegExpFilter.prototype.matches method that checks only the location. When such filters are added to the matcher, it should keep them separately so they can be processed separately (with the shortcut version) for each call to the _checkEntryMatch method.

What to change

Add the following method to RegExpFilter:

  function isLocationOnly()
  {
    return this.contentType == RegExpFilter.prototype.contentType &&
           this.thirdParty == null &&
           !this.domains && !this.sitekeys;
  }

If the method returns true for a RegExpFilter object, it means the filter has only a location (URL pattern) and none of the other attributes used for matching.

Then in the add and remove methods of Matcher, keep location-only filters in a separate map. In the matchesAny method (specifically in checkEntryMatch), check location-only filters first, and do this using the new matchesLocation method rather than the matches method. If typeMask & RegExpFilter.prototype.contentType is zero, skip the location-only filters (since by definition they don't match the type mask).

Since location-only filters would now be maintained in a separate map, this would also also require changes in Matcher's findKeyword method as well as in some tests.

Hints for testers

This is an internal optimization in filter matching and the unit tests already cover all the cases. Nevertheless, it might make sense to test it again via the extension.

Make sure that blocking and whitelist filters are getting applied once added and no longer getting applied once removed. For example, add the filter /foo^ and see that it blocks https://example.com/foo. Add the filter @@||example.com^ and make sure that the filter /foo^ no longer blocks https://example.com/foo. Try changing the whitelist filter to @@||example.com^$~third-party (first-party only), load a page from localhost that makes a request to https://example.com/foo (third-party request), and see that it is now blocked again by the filter /foo^. Then change the blocking filter /foo^ to /foo^$image and load the URL https://example.com/foo as a script (use <script src="https://example.com/foo"></script>) and see that the URL is no longer blocked. Load the same URL as an image (use <img src="https://example.com/foo">) and see that it is blocked now. Remove both the filters and see that they are no longer being applied. Also look out for any errors in the background page console.

Change History (25)

comment:1 Changed 14 months ago by mjethani

  • Description modified (diff)

comment:2 Changed 14 months ago by jsonesen

This sounds like a great idea and something that I would like to work on since it will give me a good chance to get into the matcher code a bit more, which, I recently switched to classes.

I know it's an optimization and quite a hot spot so what do you think @mjethani?

comment:3 Changed 14 months ago by mjethani

@jsonesen yes, sounds like a good idea to me.

comment:4 Changed 14 months ago by mjethani

@ jsonesen we should be able to make this change without breaking compatibility (i.e. all unit tests should pass as they are), which makes it difficult but also a little less worrisome than #7003, since this is really about the internal logic of the module.

So I would go ahead and tackle this.

So as I said in the description, "About half of all blocking/whitelist filters have (1) no content type, (2) no domain/sitekey, and (3) no third-party flag", and we already know this at the time when the filter is added (filters are immutable). Let's call these "super fast" filters for the purpose of this discussion. For super fast filters, we can just directly call location.test(filter.regexp) instead of filter.matches(location, ...), which would skip three of the other checks. We just need to find an efficient way to maintain these filters separately. These filters would also have to be index by keyword.

comment:5 Changed 14 months ago by mjethani

  • Cc jsonesen added

comment:6 Changed 14 months ago by jsonesen

Awesome, I will go ahead and start on this.

comment:7 Changed 14 months ago by jsonesen

  • Owner set to jsonesen

comment:8 Changed 13 months ago by jsonesen

  • Description modified (diff)

comment:9 follow-up: Changed 13 months ago by jsonesen

What do you think about the what to change? I figured, at least, that a new map by keyword for location only filters wouldn't be much worse than what we currently have and the speedup from using location.test would be worth it for what extra memory is consumed. Also, I wonder if it makes sense to even store filters that get skipped entirely?

Last edited 13 months ago by jsonesen (previous) (diff)

comment:10 in reply to: ↑ description Changed 13 months ago by mjethani

Replying to mjethani:

[...] and a new locationFilterByKeyword to the constructor.

I don't get this part in the "What to change" section, do you mean the RegExp filter constructor?

comment:11 in reply to: ↑ 9 Changed 13 months ago by mjethani

Replying to jsonesen:

What do you think about the what to change? I figured, at least, that a new map by keyword for location only filters wouldn't be much worse than what we currently have and the speedup from using location.test would be worth it for what extra memory is consumed. Also, I wonder if it makes sense to even store filters that get skipped entirely?

A separate second map sounds like a good idea, and I don't think it should take more memory because we can add those filters only to that map (we would have to do this, otherwise we would be iterating over and testing so many filters twice).

So we can keep filterByKeyword but add an additional fastFilterByKeyword. In the add function, if filter.isLocationOnly() is true, we add it to fastFilterByKeyword, otherwise to filterByKeyword.

comment:12 Changed 13 months ago by jsonesen

  • Description modified (diff)

comment:13 follow-up: Changed 13 months ago by jsonesen

Are we avoiding just checking isLocationOnly in the _checkEntryMatch iteration for speed reasons? Adding a new map where we keep fast filters but remove them from filterByKeyword will require quite a lot of changes to make work

comment:14 Changed 13 months ago by jsonesen

  • Description modified (diff)
  • Review URL(s) modified (diff)
  • Status changed from new to reviewing

comment:15 Changed 13 months ago by mjethani

  • Blocking 7000 added

comment:16 in reply to: ↑ 13 Changed 13 months ago by mjethani

Replying to jsonesen:

Are we avoiding just checking isLocationOnly in the _checkEntryMatch iteration for speed reasons?

It wouldn't be an optimization if we checked isLocationOnly on each filter while matching. (How would this be any different than just calling matches?) It's an optimization only if we avoid checking this.

Adding a new map where we keep fast filters but remove them from filterByKeyword will require quite a lot of changes to make work

Yes :)

comment:17 in reply to: ↑ description ; follow-up: Changed 13 months ago by mjethani

Replying to mjethani:

if typeMask == 0 && RegExpFilter.prototype.contentType == 0, these filters should be skipped entirely

This is totally not what I meant. Could you please correct this?

comment:18 in reply to: ↑ 17 Changed 13 months ago by jsonesen

  • Description modified (diff)

Replying to mjethani:

Replying to mjethani:

if typeMask == 0 && RegExpFilter.prototype.contentType == 0, these filters should be skipped entirely

This is totally not what I meant. Could you please correct this?

For sure

comment:19 Changed 13 months ago by abpbot

A commit referencing this issue has landed:
Issue 6994 - Use shortcut matching for location-only filters

comment:20 Changed 13 months ago by mjethani

  • Review URL(s) modified (diff)

comment:21 Changed 13 months ago by mjethani

  • Description modified (diff)

comment:22 Changed 13 months ago by mjethani

  • Owner changed from jsonesen to mjethani

comment:23 Changed 13 months ago by mjethani

  • Resolution set to fixed
  • Status changed from reviewing to closed

comment:24 Changed 10 months ago by abpbot

A commit referencing this issue has landed:
Issue 6994 - Use shortcut matching for location-only filters

comment:25 Changed 9 months ago by Ross

  • Tester changed from Unknown to Ross
  • Verified working set

Done. Looks to be working as described with no errors in the background console.

ABP 3.4.3.2253
Firefox 65.0.1 / 51 / Windows 10
Firefox Mobile 65.0.1 / Android 7.1.1
Chrome 72.0.3626.109 / 49.0.2623.75 / Windows 10
Opera 58.0.3135.65 / 36.0.2130.80 / Windows 10
Edge 44.17763.1.0 / Windows 10

Note: See TracTickets for help on using tickets.