Opened on 07/29/2018 at 03:17:27 PM

Closed on 03/04/2019 at 12:11:07 PM

Last modified on 07/25/2019 at 07:48:23 PM

#6815 closed change (fixed)

Avoid multiple copies of domain maps

Reported by: mjethani Assignee: mjethani
Priority: P2 Milestone:
Module: Core Keywords:
Cc: kzar, hfiguiere, sergz, jsonesen, erikvold Blocked By:
Blocking: #7000 Platform: Unknown / Cross platform
Ready: yes Confidential: no
Tester: Ross Verified working: yes
Review URL(s):

https://codereview.adblockplus.org/29841558/

Description (last modified by mjethani)

Background

While trying to come up with ways to reduce Adblock Plus's memory footprint, I found that a lot of filters have identical lists of domains. We create a separate Map object per filter, but we could instead reuse the same Map object in multiple filters. After some testing with EasyList+AA (default subscriptions), I found that the number of Map objects can be reduced from ~21,000 to ~2,800. This reduces the initial heap size by ~1.5 MB, but more importantly it reduces the final heap size, after all the filters' domain sources have been parsed, by close to 25 MB (from ~85 MB to ~60 MB, almost a 30% reduction)! It also speeds up the domains getter in the ActiveFilter class about 2.5x. This is the rare case where a change is good for both memory and performance.

What to change

Maintain a global knownDomainMaps object of type Map in lib/filterClasses.js. Before parsing a filter's domain source, first look up knownDomainMaps to see if there is a map for the domain source already present. If no map is present, only then parse the domain source and then add the newly created map to knownDomainMaps.

Hints for testers

This change is a replacement for #6727 and basically undoes that change.

Make sure that domain matching still works correctly, for both blocking and hiding filters, with domain exclusions (example.com,~www.example.com##.foo) and exceptions (example.com#@#.foo), for no-domain, single-domain, and multi-domain cases.

For example, example.com,~www.example.com##.foo should apply on https://example.com/ but not on https://www.example.com/ or if there are two filters ##.foo and www.example.com#@#.foo then the first one should apply on https://example.com/ but not on https://www.example.com/

Attachments (0)

Change History (16)

comment:1 Changed on 07/29/2018 at 10:27:58 PM by mjethani

I ran some analysis on knownDomainMaps, as implemented in the initial patch but with each Map object having an additional filters property that keeps track of all the filters to which the map belongs.

[...knownDomainMaps.values()]
.filter(m => m.size >= 2 && m.filters.length > 1)
.sort((m1, m2) => m1.size * (m1.filters.length - 1) < m2.size * (m2.filters.length - 1) ? 1 : -1)
.reduce((a, m) => a += m.size * (m.filters.length - 1), 0)

This gives us the total number of extra map entries that we saved. The number we get here is 281,673.

Now let's filter knownDomainMaps so that only maps with 10 or more entries are included:

[...knownDomainMaps.values()]
.filter(m => m.size >= 10 && m.filters.length > 1)
.sort((m1, m2) => m1.size * (m1.filters.length - 1) < m2.size * (m2.filters.length - 1) ? 1 : -1)
.reduce((a, m) => a += m.size * (m.filters.length - 1), 0)

Now the number we get is 278,904.

Let's try with 100 or more entries:

[...knownDomainMaps.values()]
.filter(m => m.size >= 100 && m.filters.length > 1)
.sort((m1, m2) => m1.size * (m1.filters.length - 1) < m2.size * (m2.filters.length - 1) ? 1 : -1)
.reduce((a, m) => a += m.size * (m.filters.length - 1), 0)

Now the number is 270,991.

Clearly we're seeing diminishing returns here as the number of domains to a filter decreases.

There are 81,771 filters, and in the initial patch we're running the deduplication for all filters having more than one domain, which is 20,835 filters. What if we ran deduplication for only those filters that have 10 or more domains? Let's find out how many such filters exist:

[...Filter.knownFilters.values()].filter(f => f.domains && f.domains.size >= 10)

Out of 81,771, only 1,213 filters have 10 or more domains.

What about filters with 100 or more domains?

[...Filter.knownFilters.values()].filter(f => f.domains && f.domains.size >= 100)

Only 496 filters!

We could avoid ~20,000 lookups in knownDomainMaps and still get ~96% of the improvement in memory consumption by targeting only those filters that have 100 or more domains.

I'm going to try this out in the next patch.

comment:2 Changed on 07/29/2018 at 10:53:44 PM by mjethani

Alright, so I tried the above (with both domains.size >= 10 and domains.size >= 100), but it doesn't help with the performance. In fact, it makes it slightly worse. This is not surprising: we're not gaining much by avoiding the map lookups, and we lose some because of the duplicate parsing even for cases where the filter has less than 10 domains.

The original solution is simpler and more optimal.

comment:3 Changed on 07/30/2018 at 09:53:45 AM by mjethani

  • Description modified (diff)

comment:4 Changed on 07/30/2018 at 09:53:54 AM by mjethani

  • Status changed from new to reviewing

comment:5 Changed on 08/01/2018 at 07:03:58 PM by abpbot

A commit referencing this issue has landed:
Issue 6815 - Avoid multiple copies of domain maps

comment:6 Changed on 08/02/2018 at 07:51:50 AM by mjethani

  • Description modified (diff)

comment:7 Changed on 08/02/2018 at 08:27:15 AM by mjethani

  • Description modified (diff)

comment:8 Changed on 08/02/2018 at 10:06:03 AM by mjethani

  • Description modified (diff)

comment:9 Changed on 08/02/2018 at 10:06:26 AM by mjethani

  • Description modified (diff)

comment:10 Changed on 08/02/2018 at 01:34:15 PM by mjethani

  • Description modified (diff)

comment:11 Changed on 09/30/2018 at 09:42:04 AM by mjethani

  • Blocking 7000 added

comment:12 in reply to: ↑ description ; follow-up: Changed on 10/04/2018 at 04:01:49 PM by mjethani

Replying to mjethani:

This reduces the initial heap size by ~1.5 MB, but more importantly it reduces the final heap size, after all the filters' domain sources have been parsed, by close to 25 MB (from ~85 MB to ~60 MB, almost a 30% reduction)!

Just for some perspective on how far we have come with all the changes (#7000), I said in the original issue description that this reduces the heap size to ~60 MB. I don't remember if I was referring to the total heap or the used heap, but let's say it was the former. By comparison, the current total heap size in the next branch is ~44 MB.

comment:13 in reply to: ↑ 12 Changed on 10/04/2018 at 05:42:56 PM by mjethani

Replying to mjethani:

Replying to mjethani:

This reduces the initial heap size by ~1.5 MB, but more importantly it reduces the final heap size, after all the filters' domain sources have been parsed, by close to 25 MB (from ~85 MB to ~60 MB, almost a 30% reduction)!

Just for some perspective on how far we have come with all the changes (#7000), I said in the original issue description that this reduces the heap size to ~60 MB. I don't remember if I was referring to the total heap or the used heap, but let's say it was the former. By comparison, the current total heap size in the next branch is ~44 MB.

Sorry, I must correct myself here. First of all, the heap size being referred to in the issue description is after calling the domains getter for each Filter object at least once, which allocates all the Map objects for the domains.

The current heap size in that case is ~48 MB.

comment:14 Changed on 11/15/2018 at 04:26:22 AM by erikvold

  • Cc erikvold added

comment:15 Changed on 03/04/2019 at 12:11:07 PM by mjethani

  • Description modified (diff)
  • Resolution set to fixed
  • Status changed from reviewing to closed

comment:16 Changed on 07/25/2019 at 07:48:23 PM by Ross

  • Tester changed from Unknown to Ross
  • Verified working set

Done. Domain matching still works as expected.

ABP 0.9.15.2340
Microsoft Edge 44.17763.1.0 / Windows 10 1809

ABP 3.5.2.2340
Chrome 49.0.2623.75 / Windows 10 1809
Chrome 75.0.3770.142 / Windows 10 1809
Opera 36.0.2130.65 / Windows 10 1809
Opera 62.0.3331.72 / Windows 10 1809
Firefox 51.0 / Windows 10 1809
Firefox 68.0 / Windows 10 1809
Firefox Mobile 68.0 / Android 7.2.2

Add Comment

Modify Ticket

Change Properties
Action
as closed .
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from mjethani.
 
Note: See TracTickets for help on using tickets.