Opened on 08/03/2018 at 08:49:40 AM
Closed on 08/29/2019 at 05:43:52 PM
#6834 closed change (rejected)
Index filter objects by hash
Reported by: | mjethani | Assignee: | |
---|---|---|---|
Priority: | Unknown | Milestone: | |
Module: | Core | Keywords: | closed-in-favor-of-gitlab |
Cc: | Blocked By: | ||
Blocking: | #6729, #6831 | Platform: | Unknown / Cross platform |
Ready: | no | Confidential: | no |
Tester: | Unknown | Verified working: | no |
Review URL(s): |
Description (last modified by mjethani)
Background
When Adblock Plus is loaded with the default list of subscriptions, there are initially ~18.5 MB of internal string objects in the V8 heap. The first step towards reducing the memory consumption related to these string objects is to stop holding on to the filter text. All strings derived from the filter text are still "sliced strings" in V8, which means anything we do to "optimize" them right now (see #6710 and #6831) is only going to make it worse. Once we've lost the references to the filter text itself, then we can start optimizing the strings by "interning" them and reusing the interned strings (see #6729).
For example, take the following two filters:
foo.com,bar.com###foo foo.com,bar.com###bar
This creates two strings and four "sliced strings" in V8 (two each for the domain and selector parts).
Our goal is to reuse "foo.com,bar.com###" so that we end up with three strings ("foo.com,bar.com###", "foo", and "bar") and two "concatenated strings" ("foo.com,bar.com###foo" and "foo.com,bar.com###bar"). In order for this to work, we need to lose any references to the original filter text. We should then generate the filter text on demand by concatenating the substrings whenever required (e.g. during serialization).
But first we should stop using the filter text as the key for the filter object cache.
One way to do this is to index the entries in the Filter.knownFilters cache of Filter instances by a numeric hash of the filter text (preferably a V8 "Smi") rather than the filter text itself.
SuperFastHash by Paul Hsieh, which is used by Chromium for various non-cryptographic purposes, appears to be a good candidate hash function for this purpose. In my test, out of ~82,000 filters in the default list of subscriptions, I got only one collision.
What to change
[TBD]
Attachments (0)
Change History (4)
comment:2 Changed on 08/03/2018 at 09:05:10 AM by mjethani
- Summary changed from Index filter objets by hash to Index filter objects by hash
comment:3 Changed on 08/03/2018 at 09:07:18 AM by mjethani
- Blocking 6831 added
- Component changed from Unknown to Core
- Review URL(s) modified (diff)
comment:4 Changed on 08/29/2019 at 05:43:52 PM by sebastian
- Keywords closed-in-favor-of-gitlab added
- Resolution set to rejected
- Status changed from new to closed
Sorry, but we switched to GitLab. If this issue is still relevant, please file it again in the new issue tracker.