Opened 16 months ago

Closed 3 months ago

#6834 closed change (rejected)

Index filter objects by hash

Reported by: mjethani Assignee:
Priority: Unknown Milestone:
Module: Core Keywords: closed-in-favor-of-gitlab
Cc: Blocked By:
Blocking: #6729, #6831 Platform: Unknown / Cross platform
Ready: no Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

https://codereview.adblockplus.org/29789581

Description (last modified by mjethani)

Background

When Adblock Plus is loaded with the default list of subscriptions, there are initially ~18.5 MB of internal string objects in the V8 heap. The first step towards reducing the memory consumption related to these string objects is to stop holding on to the filter text. All strings derived from the filter text are still "sliced strings" in V8, which means anything we do to "optimize" them right now (see #6710 and #6831) is only going to make it worse. Once we've lost the references to the filter text itself, then we can start optimizing the strings by "interning" them and reusing the interned strings (see #6729).

For example, take the following two filters:

foo.com,bar.com###foo
foo.com,bar.com###bar

This creates two strings and four "sliced strings" in V8 (two each for the domain and selector parts).

Our goal is to reuse "foo.com,bar.com###" so that we end up with three strings ("foo.com,bar.com###", "foo", and "bar") and two "concatenated strings" ("foo.com,bar.com###foo" and "foo.com,bar.com###bar"). In order for this to work, we need to lose any references to the original filter text. We should then generate the filter text on demand by concatenating the substrings whenever required (e.g. during serialization).

But first we should stop using the filter text as the key for the filter object cache.

One way to do this is to index the entries in the Filter.knownFilters cache of Filter instances by a numeric hash of the filter text (preferably a V8 "Smi") rather than the filter text itself.

SuperFastHash by Paul Hsieh, which is used by Chromium for various non-cryptographic purposes, appears to be a good candidate hash function for this purpose. In my test, out of ~82,000 filters in the default list of subscriptions, I got only one collision.

What to change

[TBD]

Change History (4)

comment:1 Changed 16 months ago by mjethani

  • Description modified (diff)

comment:2 Changed 16 months ago by mjethani

  • Summary changed from Index filter objets by hash to Index filter objects by hash

comment:3 Changed 16 months ago by mjethani

  • Blocking 6831 added
  • Component changed from Unknown to Core
  • Review URL(s) modified (diff)

comment:4 Changed 3 months ago by sebastian

  • Keywords closed-in-favor-of-gitlab added
  • Resolution set to rejected
  • Status changed from new to closed

Sorry, but we switched to GitLab. If this issue is still relevant, please file it again in the new issue tracker.

Note: See TracTickets for help on using tickets.