Opened 6 weeks ago

Closed 6 weeks ago

Last modified 6 weeks ago

#7265 closed change (fixed)

Orgnanize request blocking filters by domain

Reported by: mjethani Assignee: mjethani
Priority: P2 Milestone:
Module: Core Keywords:
Cc: sebastian, hfiguiere Blocked By:
Blocking: #7000 Platform: Unknown / Cross platform
Ready: yes Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

https://codereview.adblockplus.org/30000586/

Description (last modified by mjethani)

Background

In EasyList, out of ~41,000 request blocking filters, about 26,000 are not location-only (#6994). For these filters we must call RegExpFilter.prototype.matches, which checks that the filter applies on the document's domain using ActiveFilter.prototype.isActiveOnDomain. Out of these filters, about 3,000 (12%) apply only on specific domains. If we includes Acceptable Ads, the number of such filters is approximately 33,000, and some 6,700 (20%) of these apply on only specific domains.

If we organize these filters by domain as they are added to the matcher, as we do for element hiding filters, we could avoid the call to ActiveFilter.prototype.isActiveOnDomain altogether.

Impact on memory footprint and startup time

The information below is based on the patch.

In my tests, this change significantly speeds up filter matching, by around 10-15% (EasyList only).

The amount of time spent in the Matcher.prototype.add function goes up by ~60%, because of the additional processing for organize the filters into domain-specific maps, which affects startup time.

The initial memory footprint goes up from ~53 MB to ~62 MB (EasyList only). This in particular is a big regression. It should be possible to address this with some clever tricks in the construction of the Matcher.prototype._filterDomainMapsByKeyword data structure.

What to change

See patch.

Basically, in lib/matcher.js, organize filters by domain, by keyword, similar to the way it is done for element hiding filters in lib/elemHide.js.

A couple of optimizations:

  1. Avoid caching the domain map for RegExpFilter objects now since the domains property needs to be accessed only once (see #7046).
  2. There will be too many entries like Map { "foo" => Map { "" => Map { BlockingFilter => true } } } (keyword foo has one generic filter only). These can be reduced to Map { "foo" => Map { "" => BlockingFilter } } by making the RegExpFilter object pretend to be a map (duck typing) with itself as the key and true as the value.

Hints for testers

This change affects all filter matching for request blocking and request whitelisting filters.

In particular, make sure that domain-specific filters are working correctly. For example, ||foo.example.com/test^$domain=bar.example.com should block a request to https://foo.example.com/test from a document on bar.example.com but not from documents on other domains. It should also block the request if it's from a document on no.bar.example.com (subdomain of bar.example.com). If the filter is modified to ||foo.example.com/test^$domain=bar.example.com|~no.bar.example.com, however, the request should not be blocked if it's coming from a document on no.bar.example.com.

Change History (8)

comment:1 Changed 6 weeks ago by abpbot

A commit referencing this issue has landed:
Issue 7265 - Orgnanize request blocking filters by domain

comment:2 Changed 6 weeks ago by mjethani

  • Description modified (diff)
  • Owner set to mjethani
  • Priority changed from Unknown to P2
  • Ready set

comment:3 Changed 6 weeks ago by mjethani

  • Description modified (diff)

comment:4 Changed 6 weeks ago by mjethani

  • Description modified (diff)

comment:5 Changed 6 weeks ago by mjethani

  • Description modified (diff)

comment:6 Changed 6 weeks ago by mjethani

  • Description modified (diff)

comment:7 Changed 6 weeks ago by mjethani

  • Cc sebastian hfiguiere added
  • Resolution set to fixed
  • Status changed from new to closed

comment:8 Changed 6 weeks ago by mjethani

  • Description modified (diff)
Note: See TracTickets for help on using tickets.