Opened 4 years ago

Closed 2 years ago

Last modified 2 years ago

#3673 closed change (fixed)

[abp2blocklist] Merge closely matching rules

Reported by: sebastian Assignee: mjethani
Priority: P3 Milestone:
Module: Platform Keywords:
Cc: kzar, CraftyDeano, hfiguiere, mjethani, arthur Blocked By:
Blocking: #5236, #5464 Platform: Unknown / Cross platform
Ready: yes Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

https://codereview.adblockplus.org/29426594/
https://codereview.adblockplus.org/29501568/

Description (last modified by mjethani)

Background

With #3585 we significantly reduced the amount of generated blocking rules. So that EasyList + Acceptable Ads result into just 38,561 rules, now. However, as these filter lists are continuously growing, and as we want to offer more combinations of filter lists in the future, this isn't enough yet. For reference, the maximum number of rules as of iOS/Safari 9 is 50K rules.

But there is another way to further compress the generated rules. There are for example filters like these two in EasyList:

-ad-util-
-ad-util.

These filters currently result into two content blocker rules. However, instead, we could compress rules that only differ in a single character, into a single rule like below:

{
  "trigger": {
    "url-filter": "^https?://.*-ad-util[.\-]",
    ...
  },
  ...
}

Two filters that differ in only in one contiguous range of characters could also be merged into one. For example, ad and adserver could be merged into ^https?://.*ad(server)?.

What to change

Combine rules that are only one edit operation apart into a single rule using character sets, groups, and quantifiers.

example filters resulting regular expression
foo1, foo2 ^https?://.*foo[12]
foo1, foo ^https?://.*foo1?
foo1, foo2, foo ^https?://.*foo[12]?
foo12, foo ^https?://.*foo(12)?
foo1.com##.ad, foo2.com##.ad ^https?://([^/:]*\\.)?foo[12]\.com[/:]

Change History (16)

comment:1 Changed 4 years ago by sebastian

  • Description modified (diff)

comment:2 Changed 4 years ago by kzar

  • Owner set to kzar

comment:3 Changed 4 years ago by kzar

I've not had much time to look at this so far and also it's pretty tricky. Anyway I have made a bit of progress today.

Compressing only rules with a single character url-filter substitution once I reduced EasyList down from 32376 to 30882 rules, saving just under 1500. Unfortunately though this slowed the script down substantially, but I've not yet look at optimising things.

I'm beginning to think this compression should be optional so that we can use it for the blocking lists generated on our servers but not when the lists are generated on the client side.

Last edited 4 years ago by kzar (previous) (diff)

comment:4 Changed 4 years ago by sebastian

I happily have a look at that code. But yeah, if in the end we cannot get rid of more than 1.5K rules while impairing performance notably, this would be pointless. Incosistent behavior between the pre-generated lists on the server and those generated by Adblock Plus for Safari would IMO be a bad idea.

comment:5 Changed 4 years ago by kzar

Yea, this was just my initial try so I'm hoping I can get it faster. There are also plenty of things it doesn't compress yet. (So far foo1, foo2, foo results in ^https?://.*foo[12] and ^https?://.*foo for example instead of ^https?://.*foo[12]?.)

I will continue to hack away at it when I get time and keep you guys updated. I'm still hopeful we can make some useful improvement with this approach.

Incosistent behavior between the pre-generated lists on the server and those generated by Adblock Plus for Safari would IMO be a bad idea.

Fair point, I think you're right this could cause problems.

comment:6 Changed 3 years ago by kzar

  • Cc hfiguiere mjethani added
  • Owner kzar deleted

Unassigning myself from this for now. It's a pretty interesting one if you guys are bored.

comment:7 Changed 3 years ago by mjethani

  • Owner set to mjethani

Thanks, I'll give it a shot

comment:8 Changed 3 years ago by mjethani

  • Review URL(s) modified (diff)
  • Status changed from new to reviewing

comment:9 Changed 3 years ago by mjethani

  • Description modified (diff)
  • Summary changed from [abp2blocklist] Merge rules that only differ in a single character to [abp2blocklist] Merge closely matching rules

Updated title and description to reflect expanded scope.

comment:10 Changed 3 years ago by mjethani

  • Blocking 5236 added

comment:11 Changed 3 years ago by arthur

  • Cc arthur added

comment:12 Changed 2 years ago by abpbot

A commit referencing this issue has landed:
Issue 3673 - Merge closely matching rules

comment:13 Changed 2 years ago by abpbot

A commit referencing this issue has landed:
Issue 3673 - Use separate function to chain promises

comment:14 Changed 2 years ago by mjethani

  • Blocking 5464 added

comment:15 Changed 2 years ago by mjethani

  • Resolution set to fixed
  • Status changed from reviewing to closed

comment:16 Changed 2 years ago by mjethani

  • Review URL(s) modified (diff)
Note: See TracTickets for help on using tickets.