Opened 8 months ago

Closed 8 months ago

Last modified 2 months ago

#7265 closed change (fixed)

Orgnanize request blocking filters by domain

Reported by: mjethani Assignee: mjethani
Priority: P2 Milestone:
Module: Core Keywords:
Cc: sebastian, hfiguiere Blocked By:
Blocking: #7000 Platform: Unknown / Cross platform
Ready: yes Confidential: no
Tester: Ross Verified working: yes
Review URL(s):

https://codereview.adblockplus.org/30000586/

Description (last modified by mjethani)

Background

In EasyList, out of ~41,000 request blocking filters, about 26,000 are not location-only (#6994). For these filters we must call RegExpFilter.prototype.matches, which checks that the filter applies on the document's domain using ActiveFilter.prototype.isActiveOnDomain. Out of these filters, about 3,000 (12%) apply only on specific domains. If we includes Acceptable Ads, the number of such filters is approximately 33,000, and some 6,700 (20%) of these apply on only specific domains.

If we organize these filters by domain as they are added to the matcher, as we do for element hiding filters, we could avoid the call to ActiveFilter.prototype.isActiveOnDomain altogether.

Impact on memory footprint and startup time

The information below is based on the patch.

In my tests, this change significantly speeds up filter matching, by around 10-15% (EasyList only).

The amount of time spent in the Matcher.prototype.add function goes up by ~60%, because of the additional processing for organize the filters into domain-specific maps, which affects startup time.

The initial memory footprint goes up from ~53 MB to ~62 MB (EasyList only). This in particular is a big regression. It should be possible to address this with some clever tricks in the construction of the Matcher.prototype._filterDomainMapsByKeyword data structure.

What to change

See patch.

Basically, in lib/matcher.js, organize filters by domain, by keyword, similar to the way it is done for element hiding filters in lib/elemHide.js.

A couple of optimizations:

  1. Avoid caching the domain map for RegExpFilter objects now since the domains property needs to be accessed only once (see #7046).
  2. There will be too many entries like Map { "foo" => Map { "" => Map { BlockingFilter => true } } } (keyword foo has one generic filter only). These can be reduced to Map { "foo" => Map { "" => BlockingFilter } } by making the RegExpFilter object pretend to be a map (duck typing) with itself as the key and true as the value.

Hints for testers

This change affects all filter matching for request blocking and request whitelisting filters.

In particular, make sure that domain-specific filters are working correctly. For example, ||foo.example.com/test^$domain=bar.example.com should block a request to https://foo.example.com/test from a document on bar.example.com but not from documents on other domains. It should also block the request if it's from a document on no.bar.example.com (subdomain of bar.example.com). If the filter is modified to ||foo.example.com/test^$domain=bar.example.com|~no.bar.example.com, however, the request should not be blocked if it's coming from a document on no.bar.example.com.

Change History (9)

comment:1 Changed 8 months ago by abpbot

A commit referencing this issue has landed:
Issue 7265 - Orgnanize request blocking filters by domain

comment:2 Changed 8 months ago by mjethani

  • Description modified (diff)
  • Owner set to mjethani
  • Priority changed from Unknown to P2
  • Ready set

comment:3 Changed 8 months ago by mjethani

  • Description modified (diff)

comment:4 Changed 8 months ago by mjethani

  • Description modified (diff)

comment:5 Changed 8 months ago by mjethani

  • Description modified (diff)

comment:6 Changed 8 months ago by mjethani

  • Description modified (diff)

comment:7 Changed 8 months ago by mjethani

  • Cc sebastian hfiguiere added
  • Resolution set to fixed
  • Status changed from new to closed

comment:8 Changed 8 months ago by mjethani

  • Description modified (diff)

comment:9 Changed 2 months ago by Ross

  • Tester changed from Unknown to Ross
  • Verified working set

Done. Working as described.

ABP 0.9.15.2340
Microsoft Edge 44.17763.1.0 / Windows 10 1809

ABP 3.5.2.2340
Chrome 49.0.2623.75 / Windows 10 1809
Chrome 75.0.3770.142 / Windows 10 1809
Opera 36.0.2130.65 / Windows 10 1809
Opera 62.0.3331.72 / Windows 10 1809
Firefox 51.0 / Windows 10 1809
Firefox 68.0 / Windows 10 1809
Firefox Mobile 68.0 / Android 7.2.2

Note: See TracTickets for help on using tickets.