Opened 17 months ago

Last modified 13 months ago

#7003 closed change

Implement alternative filter matching algorithm returning a boolean value — at Version 6

Reported by: mjethani Assignee:
Priority: Unknown Milestone:
Module: Core Keywords:
Cc: sergz, greiner, sebastian, fhd Blocked By:
Blocking: Platform: Unknown / Cross platform
Ready: no Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

https://codereview.adblockplus.org/29896562/
https://codereview.adblockplus.org/29932584/

Description

Background

The current filter matching algorithm in the matchesAnyInternal method of CombinedMatcher looks like this (pseudocode):

let blockingFilter = null;

for (let keyword of extractKeywords(location))
{
  let whitelistFilter = this.matchesAnyWhitelistFilter(keyword, location);
  if (whitelistFilter)
    return whitelistFilter;

  if (!blockingFilter)
    blockingFilter = this.matchesAnyBlockingFilter(keyword, location);
}

return blockingFilter;

The key insight here is:

  1. The function goes through each whitelist filter for each keyword, even if there is no blocking filter
  2. Most URLs don't have a blocking filter
  3. Any ad blocker is primarily interested in knowing whether or not a URL should be blocked, not whether or not the URL has any whitelist filters

An alternative version of this function would look like this (pseudocode):

let whitelistFilter = null;
let blockingFilter = null;

for (let keyword of extractKeywords(location))
{
  blockingFilter = this.matchesAnyBlockingFilter(keyword, location);
  if (blockingFilter)
    break;
}

if (blockingFilter)
{
  for (let keyword of extractKeywords(location))
  {
    whitelistFilter = this.matchesAnyWhitelistFilter(keyword, location);
    if (whitelistFilter)
      break;
  }
}

return whitelistFilter || blockingFilter;

To put it in words, if the URL has no blocking filters, it wouldn't look up any whitelist filter for the URL. In such a case (which is the most common case), it would simply return null.

The extension is not really interested in whitelist filters unless DevTools is open. It simply wants to know whether a URL should be blocked or not. Technically, a whitelist filter should be considered a "hit" only if it actually prevented a URL from being blocked.

[TBD: What to do]

What to change

[TBD]

Change History (6)

comment:1 Changed 17 months ago by mjethani

  • Cc sergz greiner sebastian added

comment:2 Changed 17 months ago by mjethani

Any thoughts?

comment:3 Changed 17 months ago by sebastian

It seems pretty much everywhere where we are specifically checking if something is whitelisted (for example in lib/whitelisting.js and lib/filterComposer.js) we already call matcher.whitelist.matchesAny() directly. As long as this doesn't break (and I don't see why it would), we should be good there.

However, for the devtools panel (and other hit logger consumers, i.e. the issue reporter), we need the matching filter (even if it is a WhitelistFilter) when processing requests. So this code would become more complex, and will likely end up looking like that:

let blocked = matchesAnyBlockingFilter(...);

if (HitLogger.hasListener(tabId))
{
  let matcher = blocked ? matcher.blacklist : matcher.whitelist;
  let filter = matcher.matchesAny(...);
  logRequest(tabIds, request, filter);
}

if (blocked)
  return {cancel: true};

Also this will make the performance implications of activating the hit logger worse since it has to match the request twice. It might still be worth it if the performance improvement in the common case is significant enough. How about, you write a proof of concept, that shows up how much complexity this adds, and demonstrates the performance results.

comment:4 Changed 17 months ago by mjethani

There are two options:

  1. Modify the current implementation of matchesAny so that it returns a WhilelistFilter object only if the URL was prevented from being blocked (meaning the URL otherwise had a matching BlockingFilter). This would change the semantics, but should not affect the web extension. The implication here is that for URLs that aren't otherwise going to be blocked, any matching whitelist filters will be ignored and not shown in the DevTools panel. To me it seems logical: a whitelist filter is only truly a whitelist filter if it actually did anything; otherwise it's just waste (and probably a good indication that it should be removed). Also, only the first matching whitelist filter is returned, which means there's no guarantee that a whitelist filter you just wrote will be shown in the DevTools panel.
  1. Add an additional onlyIfBlocked option to matchesAny that follows the proposed algorithm if the value is true. The extension would pass false if DevTools is open and true otherwise.

There's also the option of keeping the current function as it is but adding a new isBlocked function that returns a boolean value, but this is just a variation of the second option above.

I personally think we should go with option one. It would be simple enough and there would be no downside to it for the extension. Yes, it does mean that there would two iterations instead of one for blocked URLs, but given that most URLs aren't blocked, on average we should be using less CPU than before.

Last edited 17 months ago by mjethani (previous) (diff)

comment:5 Changed 17 months ago by mjethani

  • Cc fhd added

comment:6 Changed 17 months ago by mjethani

  • Review URL(s) modified (diff)
Note: See TracTickets for help on using tickets.