Opened on 05/07/2018 at 05:53:06 PM

Closed on 03/16/2019 at 03:53:10 PM

#6652 closed change (fixed)

Implement fast selector lookups for most domains

Reported by: mjethani Assignee: mjethani
Priority: P2 Milestone:
Module: Core Keywords: circumvention
Cc: sergz, kzar, arthur, jsonesen, erikvold Blocked By: #6669
Blocking: #6667, #6955 Platform: Unknown / Cross platform
Ready: yes Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

https://codereview.adblockplus.org/29774573/
https://codereview.adblockplus.org/29773570/

Description (last modified by mjethani)

Background

We can make some further optimizations to ElemHide.getSelectorsForDomain that would speed up lookups by ~20x for most domains.

  1. ElemHide.getSelectorsForDomain first creates a copy of the unconditionalSelectors array and then pushes each conditional selector to it. This is too slow, because the newly created copy is modified. Instead, it can populate a separate array with conditional selectors and then concatenate the two and return the result. This makes the code perform twice as fast on Chrome and about the same on Firefox.
  1. ElemHide.getSelectorsForDomain does filter matching for every domain. This is not necessary; instead, it can do filter matching only for known domains (i.e. those that have filters that would apply to them). For any random domain like x4aabd7c0ddcd.com or manishjethani.com the lookups can be much, much faster, because there are no domain-specific filters that apply and no exceptions to check against (except for generic exceptions like #@#foo, which would still apply to ##foo on x4aabd7c0ddcd.com or manishjethani.com).
  1. The same optimization also works for known domains that have no domain-specific exceptions for any generic filters. In EasyList+AA, out of 10,899 domains, 9,419 fall in this category (as of 12 May 2018).

In other words, ElemHide.getSelectorsForDomain can do faster lookups for all but 1,480 known domains.

  1. Since the list of generic selectors that apply on most domains is the same, ElemHide.getSelectorsForDomain can cache this list and, for most domains, return the cached list concatenated with the generated list of domain-specific selectors.

What to change

  1. In lib/elemHide.js in the ElemHide.getSelectorsForDomain function, populate a separate array with conditional selectors, then concatenate it with the unconditionalSelectors array and return the result.

Make ElemHide.getUnconditionalSelectors private to the module so it doesn't have to return a copy of unconditionalSelectors, as ElemHide.getSelectorsForDomain will make a copy anyway.

  1. In the same function, find out if a domain is "known".

If there are two filters like ##foo and example.com#@#foo, then example.com, mail.example.com, and any other subdomains of example.com should be considered as known.

In order to make this work for exceptions as well (as in the preceding example), exception domains should be added to the filtersByDomain map as well, but only the domains, without any filters.

Furthermore, if an exception has no domains (generic exception), it should be added to a new genericExceptions map.

Finally, in ElemHide.getSelectorsForDomain, if a domain is not known, straightaway look up generic filters in the filtersByDomain map, and for each filter check if the selector exists in the genericExceptions map before adding the selector. Cache the result of this loop (~1,100 selectors in EasyList+AA, as of 12 May 2018) as a new conditionalGenericSelectors array and use the cached result on subsequent calls.

  1. In the same function, if a domain is known but is found not to have any domain-specific exceptions for any generic filters, add the domain to a new genericFriendlyDomains set.

If a domain is found to be in the genericFriendlyDomains set (~9,500 out of ~11,000 domains in EasyList+AA, as of 12 May 2018), use the value of the conditionalGenericSelectors array instead of generating the list of generic selectors for the domain.

  1. In general, write the code in a way that some of the functionality and related optimizations can be reused by other modules like ElemHideEmulation (see #6665) and Snippets (see #6538).

Resolution

The ideas here got implemented as part of #6957, #6989, and #7244.

Attachments (0)

Change History (23)

comment:1 Changed on 05/09/2018 at 03:37:30 PM by mjethani

  • Description modified (diff)
  • Review URL(s) modified (diff)
  • Summary changed from Implement fast selector lookups for unknown domains to Make further optimizations to ElemHide.getSelectorsForDomain

comment:2 Changed on 05/09/2018 at 05:03:06 PM by mjethani

  • Description modified (diff)

comment:3 Changed on 05/09/2018 at 05:06:26 PM by mjethani

I feel like we should make this a meta issue and file separate issues for the specific changes. I also have more changes in mind that I haven't mentioned here, like efficient caching of the selectors for a known domain.

comment:4 Changed on 05/10/2018 at 08:08:08 AM by kzar

  • Blocking 6428 added

comment:5 Changed on 05/10/2018 at 01:50:36 PM by kzar

Well I think a meta issue is kinda overkill here and we've already had some other issues about improving the performance of this code. I don't think we need to cache selectors, since this code is already much faster than it was and will be faster still with your upcoming changes.

comment:6 Changed on 05/11/2018 at 04:11:38 AM by mjethani

  • Description modified (diff)

comment:7 Changed on 05/11/2018 at 06:34:43 AM by mjethani

Regarding caching, we could obviously cache the result of getConditionalGenericSelectors (similar to unconditionalSelectors now) since it's always the same every time it's called. It's also only about ~1,100 selectors with EasyList+AA.

If we cache that we can also use the same cached value for known domains that have no domain-specific exceptions for generic selectors, which should be most known domains. For such domains then ElemHide.getSelectorsForDomain will have the same performance as criteria = ElemHide.SPECIFIC_ONLY.

comment:8 Changed on 05/11/2018 at 01:08:27 PM by abpbot

A commit referencing this issue has landed:
Issue 6652 - Do not push to unconditional selectors array

comment:9 Changed on 05/11/2018 at 01:58:17 PM by kzar

  • Priority changed from Unknown to P2
  • Ready set

Fair enough, sorry I thought you were talking about caching the results of getSelectorForDomain which is something we decided against in the past.

comment:10 Changed on 05/11/2018 at 05:00:24 PM by mjethani

  • Blocking 6665 added

comment:11 Changed on 05/12/2018 at 05:59:18 PM by mjethani

  • Keywords circumvention added

Adding the circumvention tag for management and tracking purposes.

I just realized that the speedups here would be beneficial in fighting circumvention, especially for snippets filters where we need to inject the JavaScript before the HTML document has a chance to execute any of its own code (especially in the case of API wrappers), since we're using tabs.executeScript on the webNavigation.onCommitted event. I intend to call into ElemHide for looking up snippets filters eventually.

comment:12 Changed on 05/13/2018 at 01:30:11 PM by mjethani

  • Description modified (diff)
  • Summary changed from Make further optimizations to ElemHide.getSelectorsForDomain to Implement fast selector lookups for most domains

comment:13 Changed on 05/13/2018 at 01:46:36 PM by mjethani

  • Blocking 6667 added

comment:14 Changed on 05/14/2018 at 08:43:00 AM by arthur

  • Cc arthur added

comment:15 Changed on 05/15/2018 at 11:41:02 AM by kzar

  • Blocked By 6669 added

comment:16 Changed on 05/17/2018 at 05:53:29 AM by mjethani

  • Blocking 6665 removed

comment:17 Changed on 05/24/2018 at 02:14:58 AM by jsonesen

  • Cc jsonesen added

comment:18 Changed on 06/04/2018 at 08:29:10 PM by jsonesen

@mjethani do you mind setting this to reviewing? makes it easier when querying for new issues to work on.

comment:19 Changed on 06/05/2018 at 04:41:13 AM by mjethani

  • Status changed from new to reviewing

comment:20 Changed on 09/16/2018 at 05:50:00 PM by mjethani

  • Blocking 6955 added

comment:21 Changed on 11/15/2018 at 04:27:22 AM by erikvold

  • Cc erikvold added

comment:22 Changed on 11/19/2018 at 02:00:46 PM by mjethani

  • Blocking 6428 removed

comment:23 Changed on 03/16/2019 at 03:53:10 PM by mjethani

  • Description modified (diff)
  • Resolution set to fixed
  • Status changed from reviewing to closed

Add Comment

Modify Ticket

Change Properties
Action
as closed .
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from mjethani.
 
Note: See TracTickets for help on using tickets.