Opened 7 months ago

Last modified 3 weeks ago

#6652 reviewing change

Implement fast selector lookups for most domains

Reported by: mjethani Assignee: mjethani
Priority: P2 Milestone:
Module: Core Keywords: circumvention
Cc: sergz, kzar, arthur, jsonesen, erikvold Blocked By: #6669
Blocking: #6667, #6955 Platform: Unknown / Cross platform
Ready: yes Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

https://codereview.adblockplus.org/29774573/
https://codereview.adblockplus.org/29773570/

Description (last modified by mjethani)

Background

We can make some further optimizations to ElemHide.getSelectorsForDomain that would speed up lookups by ~20x for most domains.

  1. ElemHide.getSelectorsForDomain first creates a copy of the unconditionalSelectors array and then pushes each conditional selector to it. This is too slow, because the newly created copy is modified. Instead, it can populate a separate array with conditional selectors and then concatenate the two and return the result. This makes the code perform twice as fast on Chrome and about the same on Firefox.
  1. ElemHide.getSelectorsForDomain does filter matching for every domain. This is not necessary; instead, it can do filter matching only for known domains (i.e. those that have filters that would apply to them). For any random domain like x4aabd7c0ddcd.com or manishjethani.com the lookups can be much, much faster, because there are no domain-specific filters that apply and no exceptions to check against (except for generic exceptions like #@#foo, which would still apply to ##foo on x4aabd7c0ddcd.com or manishjethani.com).
  1. The same optimization also works for known domains that have no domain-specific exceptions for any generic filters. In EasyList+AA, out of 10,899 domains, 9,419 fall in this category (as of 12 May 2018).

In other words, ElemHide.getSelectorsForDomain can do faster lookups for all but 1,480 known domains.

  1. Since the list of generic selectors that apply on most domains is the same, ElemHide.getSelectorsForDomain can cache this list and, for most domains, return the cached list concatenated with the generated list of domain-specific selectors.

What to change

  1. In lib/elemHide.js in the ElemHide.getSelectorsForDomain function, populate a separate array with conditional selectors, then concatenate it with the unconditionalSelectors array and return the result.

Make ElemHide.getUnconditionalSelectors private to the module so it doesn't have to return a copy of unconditionalSelectors, as ElemHide.getSelectorsForDomain will make a copy anyway.

  1. In the same function, find out if a domain is "known".

If there are two filters like ##foo and example.com#@#foo, then example.com, mail.example.com, and any other subdomains of example.com should be considered as known.

In order to make this work for exceptions as well (as in the preceding example), exception domains should be added to the filtersByDomain map as well, but only the domains, without any filters.

Furthermore, if an exception has no domains (generic exception), it should be added to a new genericExceptions map.

Finally, in ElemHide.getSelectorsForDomain, if a domain is not known, straightaway look up generic filters in the filtersByDomain map, and for each filter check if the selector exists in the genericExceptions map before adding the selector. Cache the result of this loop (~1,100 selectors in EasyList+AA, as of 12 May 2018) as a new conditionalGenericSelectors array and use the cached result on subsequent calls.

  1. In the same function, if a domain is known but is found not to have any domain-specific exceptions for any generic filters, add the domain to a new genericFriendlyDomains set.

If a domain is found to be in the genericFriendlyDomains set (~9,500 out of ~11,000 domains in EasyList+AA, as of 12 May 2018), use the value of the conditionalGenericSelectors array instead of generating the list of generic selectors for the domain.

  1. In general, write the code in a way that some of the functionality and related optimizations can be reused by other modules like ElemHideEmulation (see #6665) and Snippets (see #6538).

Change History (22)

comment:1 Changed 7 months ago by mjethani

  • Description modified (diff)
  • Review URL(s) modified (diff)
  • Summary changed from Implement fast selector lookups for unknown domains to Make further optimizations to ElemHide.getSelectorsForDomain

comment:2 Changed 7 months ago by mjethani

  • Description modified (diff)

comment:3 Changed 7 months ago by mjethani

I feel like we should make this a meta issue and file separate issues for the specific changes. I also have more changes in mind that I haven't mentioned here, like efficient caching of the selectors for a known domain.

comment:4 Changed 7 months ago by kzar

  • Blocking 6428 added

comment:5 Changed 7 months ago by kzar

Well I think a meta issue is kinda overkill here and we've already had some other issues about improving the performance of this code. I don't think we need to cache selectors, since this code is already much faster than it was and will be faster still with your upcoming changes.

comment:6 Changed 7 months ago by mjethani

  • Description modified (diff)

comment:7 Changed 7 months ago by mjethani

Regarding caching, we could obviously cache the result of getConditionalGenericSelectors (similar to unconditionalSelectors now) since it's always the same every time it's called. It's also only about ~1,100 selectors with EasyList+AA.

If we cache that we can also use the same cached value for known domains that have no domain-specific exceptions for generic selectors, which should be most known domains. For such domains then ElemHide.getSelectorsForDomain will have the same performance as criteria = ElemHide.SPECIFIC_ONLY.

comment:8 Changed 7 months ago by abpbot

A commit referencing this issue has landed:
Issue 6652 - Do not push to unconditional selectors array

comment:9 Changed 7 months ago by kzar

  • Priority changed from Unknown to P2
  • Ready set

Fair enough, sorry I thought you were talking about caching the results of getSelectorForDomain which is something we decided against in the past.

comment:10 Changed 7 months ago by mjethani

  • Blocking 6665 added

comment:11 Changed 7 months ago by mjethani

  • Keywords circumvention added

Adding the circumvention tag for management and tracking purposes.

I just realized that the speedups here would be beneficial in fighting circumvention, especially for snippets filters where we need to inject the JavaScript before the HTML document has a chance to execute any of its own code (especially in the case of API wrappers), since we're using tabs.executeScript on the webNavigation.onCommitted event. I intend to call into ElemHide for looking up snippets filters eventually.

comment:12 Changed 7 months ago by mjethani

  • Description modified (diff)
  • Summary changed from Make further optimizations to ElemHide.getSelectorsForDomain to Implement fast selector lookups for most domains

comment:13 Changed 7 months ago by mjethani

  • Blocking 6667 added

comment:14 Changed 7 months ago by arthur

  • Cc arthur added

comment:15 Changed 7 months ago by kzar

  • Blocked By 6669 added

comment:16 Changed 7 months ago by mjethani

  • Blocking 6665 removed

comment:17 Changed 7 months ago by jsonesen

  • Cc jsonesen added

comment:18 Changed 6 months ago by jsonesen

@mjethani do you mind setting this to reviewing? makes it easier when querying for new issues to work on.

comment:19 Changed 6 months ago by mjethani

  • Status changed from new to reviewing

comment:20 Changed 3 months ago by mjethani

  • Blocking 6955 added

comment:21 Changed 4 weeks ago by erikvold

  • Cc erikvold added

comment:22 Changed 3 weeks ago by mjethani

  • Blocking 6428 removed
Note: See TracTickets for help on using tickets.