Opened 11 months ago

Closed 4 months ago

#6815 closed change (fixed)

Avoid multiple copies of domain maps

Reported by: mjethani
Assignee: mjethani
Priority: P2
Milestone:
Module: Core
Keywords:
Cc: kzar, hfiguiere, sergz, jsonesen, erikvold
Blocked By:
Blocking: #7000
Platform: Unknown / Cross platform
Ready: yes
Confidential: no
Tester: Unknown
Verified working: no
Review URL(s): https://codereview.adblockplus.org/29841558/

Description (last modified by mjethani)

Background

While trying to come up with ways to reduce Adblock Plus's memory footprint, I found that a lot of filters have identical lists of domains. We currently create a separate Map object per filter, but we could instead reuse the same Map object across multiple filters. After some testing with EasyList+AA (the default subscriptions), I found that the number of Map objects can be reduced from ~21,000 to ~2,800. This reduces the initial heap size by ~1.5 MB, but more importantly it reduces the final heap size, after all the filters' domain sources have been parsed, by close to 25 MB (from ~85 MB to ~60 MB, almost a 30% reduction)! It also speeds up the domains getter in the ActiveFilter class by about 2.5x. This is the rare case where a change is good for both memory and performance.

What to change

Maintain a global knownDomainMaps object of type Map in lib/filterClasses.js. Before parsing a filter's domain source, look up knownDomainMaps to see whether a map for that domain source already exists. Only if no map is present, parse the domain source and add the newly created map to knownDomainMaps.
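
A minimal sketch of the idea, assuming a standalone parseDomains helper and the "," separator used by element hiding filters; the actual code in lib/filterClasses.js lives in ActiveFilter and differs in its details:

function parseDomains(source, separator = ",")
{
  // Turn "example.com,~www.example.com" into a Map of domain -> include flag.
  let domains = new Map();
  let hasIncludes = false;
  for (let domain of source.toLowerCase().split(separator))
  {
    if (domain == "")
      continue;
    if (domain[0] == "~")
      domains.set(domain.substring(1), false);
    else
    {
      domains.set(domain, true);
      hasIncludes = true;
    }
  }
  // The "" entry is the fallback for domains not listed explicitly.
  domains.set("", !hasIncludes);
  return domains;
}

let knownDomainMaps = new Map();

function getDomainMap(source)
{
  // Reuse an existing Map if an identical domain source was seen before.
  let domains = knownDomainMaps.get(source);
  if (!domains)
  {
    domains = parseDomains(source);
    knownDomainMaps.set(source, domains);
  }
  return domains;
}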

Hints for testers

This change is a replacement for #6727 and basically undoes that change.

Make sure that domain matching still works correctly, for both blocking and hiding filters, with domain exclusions (example.com,~www.example.com##.foo) and exceptions (example.com#@#.foo), for no-domain, single-domain, and multi-domain cases.

For example, example.com,~www.example.com##.foo should apply on https://example.com/ but not on https://www.example.com/. Similarly, if there are two filters, ##.foo and www.example.com#@#.foo, then the first one should apply on https://example.com/ but not on https://www.example.com/.
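
For reference, here is a simplified sketch of the domain check that such a map supports; it is not the actual ActiveFilter.isActiveOnDomain code, which handles more cases:

function isActiveOnDomain(domains, docDomain)
{
  // No domain restriction: the filter applies everywhere.
  if (!domains)
    return true;

  // Walk up the hierarchy: www.example.com -> example.com -> fallback "".
  while (true)
  {
    if (domains.has(docDomain))
      return domains.get(docDomain);

    let dotIndex = docDomain.indexOf(".");
    if (dotIndex < 0)
      break;
    docDomain = docDomain.substring(dotIndex + 1);
  }
  return domains.get("");
}

// With the map parsed from "example.com,~www.example.com":
// isActiveOnDomain(domains, "example.com") is true,
// isActiveOnDomain(domains, "www.example.com") is false.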

Change History (15)

comment:1 Changed 11 months ago by mjethani

I ran some analysis on knownDomainMaps as implemented in the initial patch, but with each Map object given an additional filters property that keeps track of all the filters to which the map belongs.

[...knownDomainMaps.values()]
.filter(m => m.size >= 2 && m.filters.length > 1)
.sort((m1, m2) => m1.size * (m1.filters.length - 1) < m2.size * (m2.filters.length - 1) ? 1 : -1)
.reduce((a, m) => a += m.size * (m.filters.length - 1), 0)

This gives us the total number of extra map entries that we saved: a map shared by N filters replaces N - 1 duplicate copies, so each shared map saves size * (N - 1) entries. The number we get here is 281,673.

Now let's filter knownDomainMaps so that only maps with 10 or more entries are included:

[...knownDomainMaps.values()]
.filter(m => m.size >= 10 && m.filters.length > 1)
.sort((m1, m2) => m1.size * (m1.filters.length - 1) < m2.size * (m2.filters.length - 1) ? 1 : -1)
.reduce((a, m) => a += m.size * (m.filters.length - 1), 0)

Now the number we get is 278,904.

Let's try with 100 or more entries:

[...knownDomainMaps.values()]
.filter(m => m.size >= 100 && m.filters.length > 1)
.sort((m1, m2) => m1.size * (m1.filters.length - 1) < m2.size * (m2.filters.length - 1) ? 1 : -1)
.reduce((a, m) => a += m.size * (m.filters.length - 1), 0)

Now the number is 270,991.

Clearly we're seeing diminishing returns here as the number of domains per filter decreases.

There are 81,771 filters, and in the initial patch we're running the deduplication for all filters having more than one domain, which is 20,835 filters. What if we ran deduplication for only those filters that have 10 or more domains? Let's find out how many such filters exist:

[...Filter.knownFilters.values()].filter(f => f.domains && f.domains.size >= 10)

Out of 81,771, only 1,213 filters have 10 or more domains.

What about filters with 100 or more domains?

[...Filter.knownFilters.values()].filter(f => f.domains && f.domains.size >= 100)

Only 496 filters!

We could avoid ~20,000 lookups in knownDomainMaps and still get ~96% of the improvement in memory consumption (270,991 of the 281,673 extra entries saved) by targeting only those filters that have 100 or more domains.
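
For illustration, a hypothetical thresholded variant along these lines, reusing the parseDomains and knownDomainMaps names from the sketch in the description (the threshold and the cheap size estimate are assumptions, not the shipped code):

const DEDUPLICATION_THRESHOLD = 100;

function getDomainMapThresholded(source)
{
  // Cheap estimate of the number of domains without parsing the source.
  if (source.split(",").length < DEDUPLICATION_THRESHOLD)
    return parseDomains(source);

  // Only large domain sources go through the shared cache.
  let domains = knownDomainMaps.get(source);
  if (!domains)
  {
    domains = parseDomains(source);
    knownDomainMaps.set(source, domains);
  }
  return domains;
}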

I'm going to try this out in the next patch.

comment:2 Changed 11 months ago by mjethani

Alright, so I tried the above (with both domains.size >= 10 and domains.size >= 100), but it doesn't help with performance. In fact, it makes it slightly worse. This is not surprising: we don't gain much by avoiding the map lookups, and we lose some time to duplicate parsing of identical domain sources for filters below the threshold.

The original solution is simpler and performs better.

comment:3 Changed 11 months ago by mjethani

  • Description modified (diff)

comment:4 Changed 11 months ago by mjethani

  • Status changed from new to reviewing

comment:5 Changed 11 months ago by abpbot

A commit referencing this issue has landed:
Issue 6815 - Avoid multiple copies of domain maps

comment:6 Changed 11 months ago by mjethani

  • Description modified (diff)

comment:7 Changed 11 months ago by mjethani

  • Description modified (diff)

comment:8 Changed 11 months ago by mjethani

  • Description modified (diff)

comment:9 Changed 11 months ago by mjethani

  • Description modified (diff)

comment:10 Changed 11 months ago by mjethani

  • Description modified (diff)

comment:11 Changed 9 months ago by mjethani

  • Blocking 7000 added

comment:12 in reply to: ↑ description ; follow-up: ↓ 13 Changed 9 months ago by mjethani

Replying to mjethani:

This reduces the initial heap size by ~1.5 MB, but more importantly it reduces the final heap size, after all the filters' domain sources have been parsed, by close to 25 MB (from ~85 MB to ~60 MB, almost a 30% reduction)!

Just for some perspective on how far we have come with all the changes (#7000), I said in the original issue description that this reduces the heap size to ~60 MB. I don't remember if I was referring to the total heap or the used heap, but let's say it was the former. By comparison, the current total heap size in the next branch is ~44 MB.

comment:13 in reply to: ↑ 12 Changed 9 months ago by mjethani

Replying to mjethani:

Replying to mjethani:

This reduces the initial heap size by ~1.5 MB, but more importantly it reduces the final heap size, after all the filters' domain sources have been parsed, by close to 25 MB (from ~85 MB to ~60 MB, almost a 30% reduction)!

Just for some perspective on how far we have come with all the changes (#7000), I said in the original issue description that this reduces the heap size to ~60 MB. I don't remember if I was referring to the total heap or the used heap, but let's say it was the former. By comparison, the current total heap size in the next branch is ~44 MB.

Sorry, I must correct myself here. First of all, the heap size being referred to in the issue description is after calling the domains getter for each Filter object at least once, which allocates all the Map objects for the domains.

The current heap size in that case is ~48 MB.

comment:14 Changed 7 months ago by erikvold

  • Cc erikvold added

comment:15 Changed 4 months ago by mjethani

  • Description modified (diff)
  • Resolution set to fixed
  • Status changed from reviewing to closed