Opened on 02/19/2016 at 08:44:59 PM

Closed on 08/03/2017 at 01:18:39 PM

Last modified on 08/03/2017 at 01:19:19 PM

#3673 closed change (fixed)

[abp2blocklist] Merge closely matching rules

Reported by: sebastian Assignee: mjethani
Priority: P3 Milestone:
Module: Platform Keywords:
Cc: kzar, CraftyDeano, hfiguiere, mjethani, arthur Blocked By:
Blocking: #5236, #5464 Platform: Unknown / Cross platform
Ready: yes Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

https://codereview.adblockplus.org/29426594/
https://codereview.adblockplus.org/29501568/

Description (last modified by mjethani)

Background

With #3585 we significantly reduced the amount of generated blocking rules. So that EasyList + Acceptable Ads result into just 38,561 rules, now. However, as these filter lists are continuously growing, and as we want to offer more combinations of filter lists in the future, this isn't enough yet. For reference, the maximum number of rules as of iOS/Safari 9 is 50K rules.

But there is another way to further compress the generated rules. There are for example filters like these two in EasyList:

-ad-util-
-ad-util.

These filters currently result into two content blocker rules. However, instead, we could compress rules that only differ in a single character, into a single rule like below:

{
  "trigger": {
    "url-filter": "^https?://.*-ad-util[.\-]",
    ...
  },
  ...
}

Two filters that differ in only in one contiguous range of characters could also be merged into one. For example, ad and adserver could be merged into ^https?://.*ad(server)?.

What to change

Combine rules that are only one edit operation apart into a single rule using character sets, groups, and quantifiers.

example filters resulting regular expression
foo1, foo2 ^https?://.*foo[12]
foo1, foo ^https?://.*foo1?
foo1, foo2, foo ^https?://.*foo[12]?
foo12, foo ^https?://.*foo(12)?
foo1.com##.ad, foo2.com##.ad ^https?://([^/:]*\\.)?foo[12]\.com[/:]

Attachments (0)

Change History (16)

comment:1 Changed on 02/19/2016 at 09:22:40 PM by sebastian

  • Description modified (diff)

comment:2 Changed on 02/21/2016 at 01:51:55 PM by kzar

  • Owner set to kzar

comment:3 Changed on 05/04/2016 at 03:27:53 PM by kzar

I've not had much time to look at this so far and also it's pretty tricky. Anyway I have made a bit of progress today.

Compressing only rules with a single character url-filter substitution once I reduced EasyList down from 32376 to 30882 rules, saving just under 1500. Unfortunately though this slowed the script down substantially, but I've not yet look at optimising things.

I'm beginning to think this compression should be optional so that we can use it for the blocking lists generated on our servers but not when the lists are generated on the client side.

Last edited on 05/04/2016 at 03:29:15 PM by kzar

comment:4 Changed on 05/04/2016 at 03:58:20 PM by sebastian

I happily have a look at that code. But yeah, if in the end we cannot get rid of more than 1.5K rules while impairing performance notably, this would be pointless. Incosistent behavior between the pre-generated lists on the server and those generated by Adblock Plus for Safari would IMO be a bad idea.

comment:5 Changed on 05/04/2016 at 04:35:21 PM by kzar

Yea, this was just my initial try so I'm hoping I can get it faster. There are also plenty of things it doesn't compress yet. (So far foo1, foo2, foo results in ^https?://.*foo[12] and ^https?://.*foo for example instead of ^https?://.*foo[12]?.)

I will continue to hack away at it when I get time and keep you guys updated. I'm still hopeful we can make some useful improvement with this approach.

Incosistent behavior between the pre-generated lists on the server and those generated by Adblock Plus for Safari would IMO be a bad idea.

Fair point, I think you're right this could cause problems.

comment:6 Changed on 04/28/2017 at 09:21:11 AM by kzar

  • Cc hfiguiere mjethani added
  • Owner kzar deleted

Unassigning myself from this for now. It's a pretty interesting one if you guys are bored.

comment:7 Changed on 04/28/2017 at 11:40:14 AM by mjethani

  • Owner set to mjethani

Thanks, I'll give it a shot

comment:8 Changed on 05/01/2017 at 07:42:49 PM by mjethani

  • Review URL(s) modified (diff)
  • Status changed from new to reviewing

comment:9 Changed on 05/08/2017 at 11:26:30 PM by mjethani

  • Description modified (diff)
  • Summary changed from [abp2blocklist] Merge rules that only differ in a single character to [abp2blocklist] Merge closely matching rules

Updated title and description to reflect expanded scope.

comment:10 Changed on 05/11/2017 at 07:57:13 PM by mjethani

  • Blocking 5236 added

comment:11 Changed on 05/16/2017 at 12:00:14 PM by arthur

  • Cc arthur added

comment:12 Changed on 07/31/2017 at 07:25:47 AM by abpbot

A commit referencing this issue has landed:
Issue 3673 - Merge closely matching rules

comment:13 Changed on 08/01/2017 at 07:02:08 AM by abpbot

A commit referencing this issue has landed:
Issue 3673 - Use separate function to chain promises

comment:14 Changed on 08/02/2017 at 06:30:08 AM by mjethani

  • Blocking 5464 added

comment:15 Changed on 08/03/2017 at 01:18:39 PM by mjethani

  • Resolution set to fixed
  • Status changed from reviewing to closed

comment:16 Changed on 08/03/2017 at 01:19:19 PM by mjethani

  • Review URL(s) modified (diff)

Add Comment

Modify Ticket

Change Properties
Action
as closed .
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from mjethani.
 
Note: See TracTickets for help on using tickets.