Opened 8 months ago

Closed 8 weeks ago

#7318 closed change (rejected)

Unsupport $match-case filter option

Reported by: sebastian Assignee:
Priority: Unknown Milestone:
Module: Core Keywords: closed-in-favor-of-gitlab
Cc: mjethani, arthur, Lain_13, dimisa, fanboy, kzar, amr, SMed79, greiner, sergz Blocked By:
Blocking: Platform: Unknown / Cross platform
Ready: no Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

Description

Background

It turned out that the $match-case filter option (marking the filter's URL pattern case sensitive) is vastly unused with no single filter using it at the moment in EasyList, EasyList Germany, Liste FR, RuAdList, EasyPrivacy, Fanboy's Annoyance List and Fanboy's Social Blocking List.

Unsupportig this unused filter option would allow us to simplify the filter parser and matching logic, and might potentially allow some further optimizations.

What to change

Remove support for the $match-case filter option and any related logic.

Change History (19)

comment:1 Changed 8 months ago by sebastian

@arthur, Lain_13, dimisa, fanboy: Any objections? Feel free to add other filter list authors.

@mjethani: I'm wondering whether, if we go ahead with this change, whether it would be more efficient to convert the URL before matching to lower case in order to remove the i flag from the regular expressions (and related logic from matchesLocation())?

comment:2 follow-up: Changed 8 months ago by mjethani

@sebastian yes, I suspect that it is more efficient to do a case-sensitive match (which means we could convert the request URL and all the patterns to lowercase). We are already doing this for filters that do not use a RegExp object internally, which is in fact most request blocking filters.

I actually think that $match-case (rather than $~match-case) should have been the default from the beginning and that's how it should work, but of course if we change the default now it might cause some filters to stop working.

comment:3 Changed 8 months ago by mjethani

  • Cc kzar amr added

comment:4 Changed 8 months ago by fanboy

I can't recall if I ever used $match-case, no issues here removing it

comment:5 Changed 8 months ago by arthur

  • Cc SMed79 added

Agreeing with fanboy

comment:6 Changed 8 months ago by dimisa

I have never experienced the need to use this option. If it does not exist, nothing will change for me.

comment:7 Changed 8 months ago by kzar

I'm not sure this is a good idea, while Adblock Plus filters default to case-insensitive matching, chrome.declarativeNetRequest rules default to case sensitive matching. I asked Karen about it, and he said it's better to make rules case sensitive wherever possible since they're more efficient.

While we can mark some rules as case sensitive automatically (ones which only contain the domain part), most we can't. If we remove $match-case we'll loose manual control as well.

It's true, that we don't seem to use the option much at the moment, but it seems to me this might be a bad time to remove it. It sounds like, for better or worse, we're going to be working with chrome.declarativeNetRequest a lot going forwards.

comment:8 follow-up: Changed 8 months ago by dimisa

In my opinion, the presence of this option does not affect the effectiveness of Adblock Plus, in contrast to:
https://issues.adblockplus.org/ticket/7236
https://issues.adblockplus.org/ticket/7119
https://issues.adblockplus.org/ticket/756

comment:9 in reply to: ↑ 8 Changed 8 months ago by kzar

Replying to dimisa:

In my opinion, the presence of this option does not affect the effectiveness of Adblock Plus, in contrast to...

I appreciate your point, but please keep on topic. The discussion here should be about the change proposed in this issue.

comment:10 in reply to: ↑ 2 ; follow-up: Changed 8 months ago by mjethani

Replying to mjethani:

@sebastian yes, I suspect that it is more efficient to do a case-sensitive match [...]

Indeed this is true, but this also does not depend on this ticket. I have filed #7321 and submitted a patch.

comment:11 in reply to: ↑ 10 Changed 8 months ago by mjethani

Replying to mjethani:

Replying to mjethani:

@sebastian yes, I suspect that it is more efficient to do a case-sensitive match [...]

Indeed this is true, but this also does not depend on this ticket. I have filed #7321 and submitted a patch.

To clarify what I'm saying, with #7321 we are already getting almost all of the performance benefit of the change suggested here, but without losing the $match-case option. Performance should no longer be a reason for making this change.

Given Dave's comment above, I am now inclined towards keeping this option until we have clear direction about declarativeNetRequest.

comment:12 Changed 8 months ago by Lain_13

I think this flag could be safely removed if filters and matched urls would be converted to the same case before matching. Actual case-sensitiveness in filters usually never required and detremental for generic filters. The only case in which might be needed is when site intentionally attempting to circumvent blocking and for this case we still have regexp filters. Just make sure that regexp fileters are still matched against original URL (not all lower/upper case).

comment:13 Changed 8 months ago by greiner

  • Cc greiner added

comment:14 follow-ups: Changed 8 months ago by sebastian

If we have to use delcarativeNetRequest in the future, and if they don't mange to make case-insensitive patterns perform well, with Dave's line of argument it seems we should rather make ABP filter case-sensitive by default as well, and then introduce a $match-case-insensitive option (so that filter list authors only use case-insensitive matching where necessary). Either way, the $match-case option as currently supported seems redundant.

comment:15 Changed 8 months ago by mjethani

  • Cc sergz added

comment:16 in reply to: ↑ 14 Changed 8 months ago by Lain_13

Replying to sebastian:

and then introduce a $match-case-insensitive option

Why? I mean, if all URLs and filters would be cast to lower case matching will act like case-insensitive while being case-sensitive. We do need case-insensitive matching by default. It just doesn't matter how exactly it will be achieved under the hood.

comment:17 in reply to: ↑ 14 ; follow-up: Changed 8 months ago by kzar

Replying to sebastian:

...we should rather make ABP filter case-sensitive by default as well, and then introduce a $match-case-insensitive option (so that filter list authors only use case-insensitive matching where necessary).

I agree, but wouldn't that break loads of the existing filters, which were written with case-insensitive matching in mind?

comment:18 in reply to: ↑ 17 Changed 8 months ago by sebastian

Replying to Lain_13:

Why? I mean, if all URLs and filters would be cast to lower case matching will act like case-insensitive while being case-sensitive. We do need case-insensitive matching by default. It just doesn't matter how exactly it will be achieved under the hood.

Case-insensitive matching by default is convenient, as there is apparently no single filter at the moment in any relevant filter list that requires case-sensitive matching. But these semantics aren't required per-se. About 50% of the request filters in EasyList have the same effect regardless whether they are matched with case-insensitive semantics (as they don't match any alphabetical characters after the domain part), and likely a fair amount of the remaining filters (that might match alphabetical charterers in the path and/or query part of the URL) would still match in practice with case-sensitive matching.

I agree that changing the default semantics to case-sensitive, and only marking filters for case-insensitive matching if required in order to match all targeted requests, would make things more difficult for filter list authors. With the current implementation of Adblock Plus there is no need for that, as we can efficiently match case-insensitively (by converting the URL to lower case upfront, rather than checking each character for two variants), but if we have to adopt Chromium's declarativeNetRequest mechanism, we are no longer in control of the algorithms matching filters against URLs, and their current implementation doesn't seem to perform well with case-insensitive patterns.

Replying to kzar:

I agree, but wouldn't that break loads of the existing filters, which were written with case-insensitive matching in mind?

We could provide a script that converts existing filter lists, adding $match-case-insensitive to all filters that might match alphabetic characters in the path or query part of the URL, and then advice filter list authors to use that filter option only if absolutely necessary in new filters going forward. Also FWIW, there are probably larger breaking changes that will come with declarativeNetRequest.

We might as well find that the performance impact of case-insensitive matching with declarativeNetRequest isn't big enough to justify this change, in particular if 50% of the filters can automatically be flagged for case-sensitive matching and/or if the algorithms in Chromium will improve.

In any case, I don't see much of a point to keep supporting the $match-case filter option with its current semantics. It's unrealistic to expect filter list authors to explicitly add $~match-case to every filter that doesn't strictly require case-insensitive matching for better performance.

Last edited 8 months ago by sebastian (previous) (diff)

comment:19 Changed 8 weeks ago by sebastian

  • Keywords closed-in-favor-of-gitlab added
  • Resolution set to rejected
  • Status changed from new to closed

Sorry, but we switched to GitLab. If this issue is still relevant, please file it again in the new issue tracker.

Note: See TracTickets for help on using tickets.