Opened on 09/27/2018 at 09:24:49 AM

Closed on 10/30/2018 at 10:33:58 PM

Last modified on 02/19/2019 at 01:02:37 PM

#6994 closed change (fixed)

Use shortcut matching for filters with no content type, no domain/sitekey, and no third-party flag

Reported by: mjethani Assignee: mjethani
Priority: P2 Milestone:
Module: Core Keywords:
Cc: jsonesen Blocked By:
Blocking: #7000 Platform: Unknown / Cross platform
Ready: no Confidential: no
Tester: Ross Verified working: yes
Review URL(s):

https://codereview.adblockplus.org/29907586/
https://codereview.adblockplus.org/29926557/

Description (last modified by mjethani)

Background

About half of all blocking/whitelist filters have (1) no content type, (2) no domain/sitekey, and (3) no third-party flag. As long as the request's location matches the pattern in the filter, it is a match. For these filters, there should be a "shortcut" version of the RegExpFilter.prototype.matches method that checks only the location. When such filters are added to the matcher, it should keep them separately so they can be processed separately (with the shortcut version) for each call to the _checkEntryMatch method.

What to change

Add the following method to RegExpFilter:

  function isLocationOnly()
  {
    return this.contentType == RegExpFilter.prototype.contentType &&
           this.thirdParty == null &&
           !this.domains && !this.sitekeys;
  }

If the method returns true for a RegExpFilter object, it means the filter has only a location (URL pattern) and none of the other attributes used for matching.

Then in the add and remove methods of Matcher, keep location-only filters in a separate map. In the matchesAny method (specifically in checkEntryMatch), check location-only filters first, and do this using the new matchesLocation method rather than the matches method. If typeMask & RegExpFilter.prototype.contentType is zero, skip the location-only filters (since by definition they don't match the type mask).

Since location-only filters would now be maintained in a separate map, this would also also require changes in Matcher's findKeyword method as well as in some tests.

Hints for testers

This is an internal optimization in filter matching and the unit tests already cover all the cases. Nevertheless, it might make sense to test it again via the extension.

Make sure that blocking and whitelist filters are getting applied once added and no longer getting applied once removed. For example, add the filter /foo^ and see that it blocks https://example.com/foo. Add the filter @@||example.com^ and make sure that the filter /foo^ no longer blocks https://example.com/foo. Try changing the whitelist filter to @@||example.com^$~third-party (first-party only), load a page from localhost that makes a request to https://example.com/foo (third-party request), and see that it is now blocked again by the filter /foo^. Then change the blocking filter /foo^ to /foo^$image and load the URL https://example.com/foo as a script (use <script src="https://example.com/foo"></script>) and see that the URL is no longer blocked. Load the same URL as an image (use <img src="https://example.com/foo">) and see that it is blocked now. Remove both the filters and see that they are no longer being applied. Also look out for any errors in the background page console.

Attachments (0)

Change History (25)

comment:1 Changed on 09/27/2018 at 09:25:08 AM by mjethani

  • Description modified (diff)

comment:2 Changed on 10/01/2018 at 03:10:57 AM by jsonesen

This sounds like a great idea and something that I would like to work on since it will give me a good chance to get into the matcher code a bit more, which, I recently switched to classes.

I know it's an optimization and quite a hot spot so what do you think @mjethani?

comment:3 Changed on 10/02/2018 at 04:51:19 PM by mjethani

@jsonesen yes, sounds like a good idea to me.

comment:4 Changed on 10/02/2018 at 05:38:29 PM by mjethani

@ jsonesen we should be able to make this change without breaking compatibility (i.e. all unit tests should pass as they are), which makes it difficult but also a little less worrisome than #7003, since this is really about the internal logic of the module.

So I would go ahead and tackle this.

So as I said in the description, "About half of all blocking/whitelist filters have (1) no content type, (2) no domain/sitekey, and (3) no third-party flag", and we already know this at the time when the filter is added (filters are immutable). Let's call these "super fast" filters for the purpose of this discussion. For super fast filters, we can just directly call location.test(filter.regexp) instead of filter.matches(location, ...), which would skip three of the other checks. We just need to find an efficient way to maintain these filters separately. These filters would also have to be index by keyword.

comment:5 Changed on 10/02/2018 at 06:34:50 PM by mjethani

  • Cc jsonesen added

comment:6 Changed on 10/04/2018 at 12:27:50 AM by jsonesen

Awesome, I will go ahead and start on this.

comment:7 Changed on 10/08/2018 at 06:11:31 PM by jsonesen

  • Owner set to jsonesen

comment:8 Changed on 10/18/2018 at 02:21:08 AM by jsonesen

  • Description modified (diff)

comment:9 follow-up: Changed on 10/18/2018 at 02:22:39 AM by jsonesen

What do you think about the what to change? I figured, at least, that a new map by keyword for location only filters wouldn't be much worse than what we currently have and the speedup from using location.test would be worth it for what extra memory is consumed. Also, I wonder if it makes sense to even store filters that get skipped entirely?

Last edited on 10/18/2018 at 02:25:07 AM by jsonesen

comment:10 in reply to: ↑ description Changed on 10/18/2018 at 02:45:20 PM by mjethani

Replying to mjethani:

[...] and a new locationFilterByKeyword to the constructor.

I don't get this part in the "What to change" section, do you mean the RegExp filter constructor?

comment:11 in reply to: ↑ 9 Changed on 10/18/2018 at 02:51:24 PM by mjethani

Replying to jsonesen:

What do you think about the what to change? I figured, at least, that a new map by keyword for location only filters wouldn't be much worse than what we currently have and the speedup from using location.test would be worth it for what extra memory is consumed. Also, I wonder if it makes sense to even store filters that get skipped entirely?

A separate second map sounds like a good idea, and I don't think it should take more memory because we can add those filters only to that map (we would have to do this, otherwise we would be iterating over and testing so many filters twice).

So we can keep filterByKeyword but add an additional fastFilterByKeyword. In the add function, if filter.isLocationOnly() is true, we add it to fastFilterByKeyword, otherwise to filterByKeyword.

comment:12 Changed on 10/20/2018 at 11:37:02 PM by jsonesen

  • Description modified (diff)

comment:13 follow-up: Changed on 10/21/2018 at 01:03:36 AM by jsonesen

Are we avoiding just checking isLocationOnly in the _checkEntryMatch iteration for speed reasons? Adding a new map where we keep fast filters but remove them from filterByKeyword will require quite a lot of changes to make work

comment:14 Changed on 10/22/2018 at 02:17:34 AM by jsonesen

  • Description modified (diff)
  • Review URL(s) modified (diff)
  • Status changed from new to reviewing

comment:15 Changed on 10/23/2018 at 11:40:12 AM by mjethani

  • Blocking 7000 added

comment:16 in reply to: ↑ 13 Changed on 10/23/2018 at 01:49:50 PM by mjethani

Replying to jsonesen:

Are we avoiding just checking isLocationOnly in the _checkEntryMatch iteration for speed reasons?

It wouldn't be an optimization if we checked isLocationOnly on each filter while matching. (How would this be any different than just calling matches?) It's an optimization only if we avoid checking this.

Adding a new map where we keep fast filters but remove them from filterByKeyword will require quite a lot of changes to make work

Yes :)

comment:17 in reply to: ↑ description ; follow-up: Changed on 10/23/2018 at 01:55:32 PM by mjethani

Replying to mjethani:

if typeMask == 0 && RegExpFilter.prototype.contentType == 0, these filters should be skipped entirely

This is totally not what I meant. Could you please correct this?

comment:18 in reply to: ↑ 17 Changed on 10/23/2018 at 04:44:36 PM by jsonesen

  • Description modified (diff)

Replying to mjethani:

Replying to mjethani:

if typeMask == 0 && RegExpFilter.prototype.contentType == 0, these filters should be skipped entirely

This is totally not what I meant. Could you please correct this?

For sure

comment:19 Changed on 10/30/2018 at 04:54:40 PM by abpbot

A commit referencing this issue has landed:
Issue 6994 - Use shortcut matching for location-only filters

comment:20 Changed on 10/30/2018 at 04:56:45 PM by mjethani

  • Review URL(s) modified (diff)

comment:21 Changed on 10/30/2018 at 10:25:25 PM by mjethani

  • Description modified (diff)

comment:22 Changed on 10/30/2018 at 10:33:50 PM by mjethani

  • Owner changed from jsonesen to mjethani

comment:23 Changed on 10/30/2018 at 10:33:58 PM by mjethani

  • Resolution set to fixed
  • Status changed from reviewing to closed

comment:24 Changed on 02/07/2019 at 03:23:43 AM by abpbot

A commit referencing this issue has landed:
Issue 6994 - Use shortcut matching for location-only filters

comment:25 Changed on 02/19/2019 at 01:02:37 PM by Ross

  • Tester changed from Unknown to Ross
  • Verified working set

Done. Looks to be working as described with no errors in the background console.

ABP 3.4.3.2253
Firefox 65.0.1 / 51 / Windows 10
Firefox Mobile 65.0.1 / Android 7.1.1
Chrome 72.0.3626.109 / 49.0.2623.75 / Windows 10
Opera 58.0.3135.65 / 36.0.2130.80 / Windows 10
Edge 44.17763.1.0 / Windows 10

Add Comment

Modify Ticket

Change Properties
Action
as closed .
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from mjethani.
 
Note: See TracTickets for help on using tickets.