Opened 7 months ago

Closed 4 months ago

Last modified 7 weeks ago

#7067 closed change (fixed)

Allow $rewrite to internal resources

Reported by: hfiguiere Assignee: hfiguiere
Priority: P2 Milestone:
Module: Core Keywords: circumvention
Cc: arthur, greiner, kzar, mjethani, sebastian, sdixit Blocked By:
Blocking: #7240 Platform: Unknown / Cross platform
Ready: yes Confidential: no
Tester: Unknown Verified working: yes
Review URL(s):

https://codereview.adblockplus.org/29928557/

Description (last modified by mjethani)

Background

Instead of blocking URLs, we can redirect requests to internal resources built into Adblock Plus, which would have the same effect as blocking the URL but without triggering the onerror handlers (avoiding detection and circumvention).

Some of the use cases are as follows:

  • Serve a useless (no-op) JS script and not trigger the onerror handler
  • Neutralize tracking pixels (images) by serving a built-in alternative and never hitting the remote server
  • Serve an empty sound file to replace the one that is unwanted

To make this work, we could reuse the $rewrite filter option, but instead of rewriting to a different URL on the same origin it would rewrite to an internal resource specified using the abp-resource protocol identifier.

Specification

This $rewrite option still takes one parameter, which is the name of a "resource", prefixed with abp-resource:.

Its effect is that the URL rewrite gives the content of the "resource" instead. Resources are built into the WebExtension and cannot be third-party.

In this case the $rewrite option requires that the URL pattern contains a host (starts with || or *) and also that a domain is present. The domain may be specified using the $domain option or, if the URL pattern starts with ||, the $~third-party option.

Examples

For the following frame in an HTML document on example.com:

<iframe src="/foo/bar.html"></iframe>

The filter ||/foo/bar.html^$rewrite=abp-resource:blank-html,domain=example.com would serve the content data:text/html,<!DOCTYPE html><html><head></head><body></body></html> into the frame instead of fetching the actual content from the remote host.

What to change

Modify lib/filterClasses.js so that the $rewrite filter option can take a value starting with abp-resource: based on the above specification. The restriction about not allowing rewrites for requests that might load code should not apply to abp-resource: rewrites.

Add a prebuilt /data/resources.json file containing some initial resources. lib/filterClasses.js should load this file and serve resources from it.

Hints for testers

Suppose we have the following HTML document:

<!DOCTYPE html>
<html>
  <body>
    <iframe src="/foo/bar.html"></iframe>
  </body>
</html>

Now the filter ||/foo/bar.html^$rewrite=abp-resource:blank-html,~third-party should serve the HTML <!DOCTYPE html><html><head></head><body></body></html> into the frame. This should work no matter from which host the document is served.

If the filter is changed to */foo/bar.html^$rewrite=abp-resource:blank-html,~third-party (replace || with *), the filter should be considered invalid and the original document at /foo/bar.html on the same host should be served.

But if the filter is then changed to */foo/bar.html^$rewrite=abp-resource:blank-html,domain=example.com, the filter should work again (assuming the document is served from example.com).

If the * at the beginning is removed, the filter should be considered invalid and no longer work.

If the resource name is misspelt (e.g. blank-htm instead of blank-html), the filter should be considered valid and the filter hit should get logged in the DevTools panel, but the filter should still have no effect.

For the list of resources and their values, see the resources.json file in the adblockpluscore source repository.

Change History (59)

comment:1 Changed 7 months ago by hfiguiere

  • Owner set to hfiguiere

comment:2 Changed 7 months ago by arthur

  • Cc arthur added

comment:3 Changed 7 months ago by mjethani

Thinking about how this should work. I think we can set redirectUrl in the onBeforeRequest handler to a data: URI. We could keep the resources in a JSON file called resources.json, where the keys would be the name of the resource and the value would be the part of the URI following data: (e.g. image/png;base64,...). This could be in lib/content/resources.json, copied over into the extension similar to snippets.js. The extension can do the rest.

comment:4 Changed 7 months ago by greiner

  • Cc greiner added

comment:5 Changed 7 months ago by hfiguiere

  • Blocking 7072 added

comment:6 Changed 7 months ago by hfiguiere

  • Review URL(s) modified (diff)
  • Status changed from new to reviewing

comment:7 Changed 7 months ago by hfiguiere

  • Cc kzar mjethani sebastian added

comment:8 Changed 7 months ago by sdixit

  • Cc sdixit added

comment:9 follow-up: Changed 7 months ago by mjethani

The $redirect option requires that the URL contains a host (|| or *) and also that a $domain option is present ($~third-party will do). Let's update the description with this information.

comment:10 Changed 7 months ago by hfiguiere

  • Description modified (diff)

comment:11 in reply to: ↑ 9 Changed 6 months ago by mjethani

Replying to mjethani:

The $redirect option requires that the URL contains a host (|| or *) and also that a $domain option is present ($~third-party will do). Let's update the description with this information.

To be clear, the $~third-party option is allowed as a substitute for $domain= only if the URL pattern contains a host name, i.e. ||example.com^$redirect=foo,~third-party is OK, but /bar^$redirect=foo,~third-party is not OK (however /bar^$redirect=foo,domain=example.com is OK).

Let's update the issue description with the exact behavior that we want to implement. It will also help during testing and further documentation.

@kzar any opinions on this?

(I'm not actually sure that this should be supported in EasyList, secondly I'm not sure that the resource names should be exactly as they are in uBlock Origin.)

comment:12 follow-up: Changed 6 months ago by mjethani

Regarding the resource names, I was thinking about having a more generic and flexible syntax.

How about $redirect=transparent (GIF, PNG, etc.) and $redirect=blank (HTML, JavaScript, CSS) and pick the right asset (actually just dynamically generate it) based on the request type? If you throw in a size for the images, like 1x1-transparent then it can generate an image of the right size. Most of the image would already be in memory, we would just replace the width and height values if given. Similarly, the "noop" HTML/JS/CSS would also already be in memory, there's nothing to customize as of now, but this kind of syntax allows the filter list author to specify any specifics in the future.

If this is going to be in EasyList, it should be properly designed with future compatibility in mind, uBlock Origin's syntax is a hack.

The problem with the current syntax is that it refers to assets directly. If a filter list author wants a 2x2 transparent GIF instead of a 1x1 transparent GIF, they have to wait for Adblock Plus to do a new release.

Last edited 6 months ago by mjethani (previous) (diff)

comment:13 in reply to: ↑ 12 Changed 6 months ago by kzar

Replying to mjethani:

...
@kzar any opinions on this?
...
If this is going to be in EasyList, it should be properly designed with future compatibility in mind, uBlock Origin's syntax is a hack.

I haven't been following this too closely to have much of an opinion on the details, but I agree with this sentiment. While I would like our syntax was compatible with uBlock, I'd rather have a well thought out syntax if I had to choose between that and compatibility.

comment:14 follow-up: Changed 6 months ago by hfiguiere

I like the idea of having simpler names, I'm not sure there is a trivial to generate these images. We can have either a hack-ish bitstream generation, or use existing native code libraries built with WASM. The former would be compact and limited, the later would be larger but more versatile.

Also there is the case of audio and video which are even more complicated, and necessary.

As for the implementation we can possibly do it in two stages:

  • static resources with a consistent naming,
  • and then, later, have dynamically generated assets.

comment:15 Changed 6 months ago by mjethani

Just brainstorming here, but what about the following syntax: The value of the $redirect option could include an arbitrary number of tokens, separated by a separator. Let's say the kinds of tokens you can include are: (1) resource type (HTML, JavaScript, GIF, etc.), (2) resource label ("transparent" for images, "blank" for HTML, JavaScript, etc.), (3) dimensions, etc. Let's also say, for example, that the separator is just the hyphen (-) character.

Then the value could be something like blank, gif-transparent, 32x32-transparent-image/png, and so on.

The tokens could appear in any order, so 32x32-transparent-png would be equivalent to transparent-32x32-png (essentially similar to CSS shorthands). If something is left out, it could be inferred from the resource type (i.e. if the resource type is script then blank would be equivalent to blank-javascript). For XMLHttpRequest, the resource type would have to be specified, in which case $redirect=blank would either have no effect or we would serve zero bytes with no MIME type.

comment:16 in reply to: ↑ 14 Changed 6 months ago by mjethani

Replying to hfiguiere:

I like the idea of having simpler names, I'm not sure there is a trivial to generate these images.

Since I have looked into the PNG format for the hide-if-contains-image snippet (#7088) and related filters, I know that we could have a template image, then we would simply replace the necessary fields in the image data. I'll have to check again, but I think it's possible to have a 1x1 transparent PNG and then simply replace 8 bytes (4 each for width and height) in the data to make it a 2x2 or any NxN image. It may be similar for single-color opaque images where one wants to modify the color (I think only the color palette would have to be updated, again probably only setting 3-4 bytes), so we might be able to support 1x1-black-png, 2x2-white-png, and so on, all with the same PNG template.

I have not looked into the GIF format yet.

The problem with including static resources is that we have to keep supporting them unless we're OK with breaking filters.

I actually don't think it's a bad idea to do this in two steps: first we do it with static resources, but only support it in the anti-circumvention filter list (similar to snippets (#6538)), then later we design it properly for EasyList.

comment:17 follow-up: Changed 6 months ago by hfiguiere

There is another issue: how do we deal with the $redirect to "neutered" scripts (Google Analytics is one of the target, but not the only one) which are basically JS snippets. Named resources are convenient for that. So we'd have to formalize that case in the syntax.

comment:18 Changed 4 months ago by hfiguiere

  • Description modified (diff)

comment:19 in reply to: ↑ 17 Changed 4 months ago by mjethani

Replying to hfiguiere:

There is another issue: how do we deal with the $redirect to "neutered" scripts (Google Analytics is one of the target, but not the only one) which are basically JS snippets. Named resources are convenient for that. So we'd have to formalize that case in the syntax.

By "neutered" scripts do you mean redirecting to a no-op JS?

comment:20 follow-up: Changed 4 months ago by hfiguiere

No. In that case it provides an API that does nothing, but that do no crash the client code. Hence, it is neutered. It does nothing, albeit that's not a blank JS.

Last edited 4 months ago by hfiguiere (previous) (diff)

comment:21 Changed 4 months ago by hfiguiere

  • Review URL(s) modified (diff)

comment:22 in reply to: ↑ 20 Changed 4 months ago by mjethani

Replying to hfiguiere:

No. In that case it provides an API that does nothing, but that do no crash the client code. Hence, it is neutered. It does nothing, albeit that's not a blank JS.

I see, thanks.

comment:23 follow-up: Changed 4 months ago by mjethani

How about we restrict this feature to our anti-circumvention filter list only for now?

This would allow us to:

  1. Flesh out the design/implementation over time
  2. Try it out in the wild and get feedback on how useful this feature is and how it could be improved (which would inform the design)

The current design/implementation seems OK to me for the initial iteration. We would have to think about forward compatibility right up front in case we want to make any changes to the syntax in the future, but that is the only challenge.

We could include any changes to the resources.json file in our definition of "snippets-only releases". Any such changes would most likely be combined with some changes to lib/content/snippets.js and included in the next such release.

Once we have agreement on this, we can proceed with the code review based on the current patch(es).

comment:24 in reply to: ↑ 23 Changed 4 months ago by hfiguiere

Replying to mjethani:

We could include any changes to the resources.json file in our definition of "snippets-only releases". Any such changes would most likely be combined with some changes to lib/content/snippets.js and included in the next such release.

I agree with this.

comment:25 Changed 4 months ago by sebastian

Fine with me to extend our definition of "snippet-only releases" to $redirect resources.

However, I don't see much of a case to limit this feature to our anti-circumvention filter list. Unless I miss something we don't have the same security concerns here, and this would be mostly to avoid any conflicts around breaking changes. If this is the only concern, I think it's better to declare this feature as experimental in the devbuild announcement, and leave it undocumented on our website. This is exactly what we did with element hiding emulation filters initially.

comment:26 Changed 4 months ago by mjethani

That would work, in my opinion, as long as it has the same effect: external filter list authors should be aware that if they use the feature their filters may stop working in the future without (much) notice.

We deprecated the old syntax for :-abp-properties() but it took us a whole year to remove support after the initial deprecation announcement. I was thinking we should be able to make changes and experiment further with each release without concern for any external filter lists.

comment:27 follow-up: Changed 4 months ago by sebastian

Whether available to external filter lists or not, there is a case for supporting deprecated filter syntax during a transition period. Otherwise, filters will only work (correctly) on either versions of Adblock Plus before or after the change, and it will take a while until everyone gets the update.

Other than that, I don't know what kept us from removing the legacy element hiding emulation filter syntax sooner. The filter lists that used it even though maintained by a third-party are generally capable of keeping up with changes in Adblock Plus, in particular when they use experimental features.

comment:28 Changed 4 months ago by sdixit

Fine with me to extend our definition of "snippet-only releases" to $redirect resources.

WFM too.

Also, what Sebastian suggested sounds good to me. Unless anyone else has a major objection to me, I think we could go ahead with this approach.

Last edited 4 months ago by sdixit (previous) (diff)

comment:29 in reply to: ↑ 27 ; follow-ups: Changed 4 months ago by mjethani

Replying to sebastian:

Whether available to external filter lists or not, there is a case for supporting deprecated filter syntax during a transition period.

Yes, but if we restrict it to our own filter list while the feature is still experimental, the transition period can be much shorter (we just have to withdraw any filters using the old syntax).

Anyway, if everyone thinks it's fine to support these filters in external filter lists, I'm fine with it.

I have only one suggestion: We should namespace the resource names. Instead of $redirect=blank-html, we could make it $redirect=abp-resource:blank-html or just $redirect=abp:blank-html, where abp-resource:blank-html is the URI to which to redirect the request. In the future if we want to change the syntax of the option, we won't have to think about name conflicts.

In fact, now that I think about it, do we really need a new $redirect option? $rewrite already rewrites the URL, if we simply rewrite it to abp:<resource-name> it should work the same way.

comment:30 in reply to: ↑ 29 Changed 4 months ago by mjethani

Replying to mjethani:

$rewrite already rewrites the URL, if we simply rewrite it to abp:<resource-name> it should work the same way.

This would also mean no changes in adblockpluschrome. It would work out of the box. The rule is simple: If the value starts with abp-resource:, rewrite the entire URL to the value of the resource. e.g. $rewrite=abp-resource:blank-html rewrites the entire URL to data:text/html,<!DOCTYPE html><html><head></head><body></body></html>.

comment:31 in reply to: ↑ 29 Changed 4 months ago by sebastian

Replying to mjethani:

In fact, now that I think about it, do we really need a new $redirect option? $rewrite already rewrites the URL, if we simply rewrite it to abp:<resource-name> it should work the same way.

For reference, I suggested that before, see 6592#comment:14. I still like that idea.

comment:32 Changed 4 months ago by mjethani

In that case, here's a mini specification:

  1. $rewrite can now take a URI of the form abp-resource:<name> where <name> is the name of the resource, as long as the pattern starts with || or * and has a domain specified
  2. A * at the end of the pattern is implied in this case, i.e. it rewrites the entire URL
  3. The URL to be rewritten must be http:// or https://
  4. The resource library is similar to the snippet library, i.e. we add/update resources as needed in minor/major releases

What do folks think?

Let's vote on this. If we agree, let's implement this for the next release.

comment:33 Changed 4 months ago by sebastian

I don't think we need to specify a protocol restriction, since Adblock Plus only intercepts HTTP(S) and WebSocket connections in the first.

Also we might want to relax the request type restrictions when redirecting to a resource. Redirecting to a blank placeholder is not any more of a security concern than blocking the request, while redirecting to an arbitrary path still is when loading code to be executed.

comment:34 Changed 4 months ago by mjethani

It makes sense to remove the request type restrictions if the value of the $rewrite option starts with abp-resource:.

Would it ever make sense to rewrite a ws(s):// URL to data:? Probably not. I think we can make it explicit that the URL must be http(s)://, this way there's no doubt about what works and what doesn't. Also, we only intercept HTTP(S) and WebSocket requests in the desktop extension, I'm not sure about other products (mobile).

comment:35 Changed 4 months ago by sebastian

Of course, a filter attempting to redirect a WebSocket connection would be useless. But I don't see any point in explicitly checking for this either.

I think it is safe to assume that only HTTP(S) and WebSocket requests are ever intercepted. Filter lists are written with (and tested against) that assumption. If all the filters in EasyList that don't contain a domain are suddenly matched against about: and chrome:// URLs, things will already break.

comment:36 Changed 4 months ago by mjethani

I don't know every single platform that adblockpluscore is used on and my reason for wanting an explicit restriction is really just to avoid unintended consequences.

comment:37 follow-ups: Changed 4 months ago by mjethani

Regarding make_resource.js: We want to commit all the assets to the repo and also the Base64 versions of the same data? That seems like unnecessary duplication. Why not just commit the assets and generate resources.json on the fly when the extension is loaded? Alternatively, why not just have resources.json in the repo (why do we need the individual asset files)?

comment:38 in reply to: ↑ 37 Changed 4 months ago by sebastian

Replying to mjethani:

I don't know every single platform that adblockpluscore is used on and my reason for wanting an explicit restriction is really just to avoid unintended consequences.

Then we should rather specify that the filter matcher must only be called with a valid http://, https://, ws:// or wss:// URL, otherwise things will already break. I don't see the point of restricting protocols specifically for $rewrite filters.

Replying to mjethani:

Regarding make_resource.js: We want to commit all the assets to the repo and also the Base64 versions of the same data? That seems like unnecessary duplication. Why not just commit the assets and generate resources.json on the fly when the extension is loaded?

At the moment, the code in adblockpluscore doesn't require any build steps. If we introduce any build steps, these would have to be integrated into the build scripts of each product that uses adblockpluscore as well as running them before running the tests in adblockpluscore itself. I'm not sure if this added complexity is worth it just to have those resources.

Alternatively, why not just have resources.json in the repo (why do we need the individual asset files)?

I guess, that might be an option.

comment:39 in reply to: ↑ 37 Changed 4 months ago by hfiguiere

Replying to mjethani:

Regarding make_resource.js: We want to commit all the assets to the repo and also the Base64 versions of the same data? That seems like unnecessary duplication. Why not just commit the assets and generate resources.json on the fly when the extension is loaded? Alternatively, why not just have resources.json in the repo (why do we need the individual asset files)?

It is not unecessary duplication. Assets are static, and they are like source code - you even asked how I generated them in the review. Given that we don't have a build process, I have chosen to commit the result to make it simple for everyone. I'd have this part of a build sequence if we had one.

Doing what make_resource.js at the extension startup would be excessive.
We want the individual assets in the repo so that:

  • we can update them if needed
  • we can regenerate the resource.json at any time
  • if we even chose a different delivery, we can.

Said assets are not delivered to the final extension, only resources.json, so for the user deliverable, it doesn't even matter.

I'll rework the patch implement this feature as part of $rewrite

comment:40 Changed 4 months ago by sebastian

I tend to agree with Manish, your script is just inlining binary resources, everything else is just copied over from resources/index.json, and binary files are not source code. It effectively doesn't make much of a difference whether to keep those binary files as separate files in the repository or whether to copy and paste their base64-encoded data into data/resources.json.

Last edited 4 months ago by sebastian (previous) (diff)

comment:41 follow-up: Changed 4 months ago by mjethani

The individual files here are not like source code, because the conversion from source code to binary is (1) non-deterministic and (2) lossy. In other words, the process cannot be reversed. The source code is therefore the canonical version.

Over here, the process can be reversed one hundred percent; anyone who so wishes can reproduce the binaries from the Base64 versions:

echo -n 'R0lGODlhAQABAIABAAAAAP///yH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==' | \
  base64 -d > 1x1-transparent.gif

Also, source code is updated frequently, I don't expect an asset to change once it has been committed to resources.json.

Anyway, let's file a new issue for make_resource.js, since we are still discussing it.

comment:42 Changed 4 months ago by hfiguiere

  • Blocking 7072 removed
  • Description modified (diff)
  • Summary changed from Implement $redirect filter option to Allow $rewrite to internal resources.

comment:43 Changed 4 months ago by hfiguiere

  • Blocking 7240 added

comment:44 in reply to: ↑ 41 Changed 4 months ago by hfiguiere

Replying to mjethani:

Anyway, let's file a new issue for make_resource.js, since we are still discussing it.

Filed issue #7240 for this.

comment:45 Changed 4 months ago by mjethani

Question: What if one wants to rewrite to literally abp-resource:foo?

As in /foo\/([a-z]*)/$rewrite=abp-resource:foo/$1, so that the URL http://example.com/foo/bar/ is rewritten to http://example.com/abp-resource:foo/bar/

There are three options:

  1. Not supported.
  2. Supported if pattern does not begin with || or *. This would be logically consistent with the current behavior, because rewriting the entire URL literally to abp-resource:foo would cause it to become of a different origin, which is invalid.
  3. Supported via a hack, similar to how patterns surrounded by / must also have a * (#7208).
Last edited 4 months ago by mjethani (previous) (diff)

comment:46 Changed 4 months ago by abpbot

A commit referencing this issue has landed:
Issue 7067 - Allow $rewrite to internal resources

comment:47 Changed 4 months ago by hfiguiere

  • Resolution set to fixed
  • Status changed from reviewing to closed

comment:48 Changed 4 months ago by mjethani

  • Description modified (diff)
  • Review URL(s) modified (diff)

comment:49 Changed 4 months ago by mjethani

  • Summary changed from Allow $rewrite to internal resources. to Allow $rewrite to internal resources

comment:50 Changed 4 months ago by mjethani

  • Description modified (diff)

comment:51 Changed 4 months ago by mjethani

  • Description modified (diff)

comment:52 Changed 4 months ago by mjethani

  • Priority changed from Unknown to P2
  • Ready set
  • Resolution fixed deleted
  • Status changed from closed to reopened

comment:53 Changed 4 months ago by mjethani

  • Description modified (diff)

comment:54 Changed 4 months ago by mjethani

  • Description modified (diff)
  • Resolution set to fixed
  • Status changed from reopened to closed

comment:55 Changed 3 months ago by abpbot

A commit referencing this issue has landed:
Issue 7067 - Allow $rewrite to internal resources

comment:56 Changed 3 months ago by arthur

  • Sensitive unset

comment:57 Changed 2 months ago by rscott

  • Verified working set

comment:58 Changed 7 weeks ago by SMed79

Rewrite filter to internal resource seems not working with existing blocking filter.

e.g. Videos at voici.fr are broken because the EP filter ||d1z2jf7jlzjs58.cloudfront.net^ 0bad9308
If you disable easyprivacy then add ||d1z2jf7jlzjs58.cloudfront.net/p.js$rewrite=abp-resource:blank-js,domain=voici.fr (easylist#3179) the tracking script is rewritten to internal resource and the videos play without any issue.

Does we need $important filter option here?

Last edited 7 weeks ago by SMed79 (previous) (diff)

comment:59 Changed 7 weeks ago by mjethani

ABP applies only one matching filter to a URL. In this case it seems the EasyPrivacy filter wins. If the EP filter was written as ||d1z2jf7jlzjs58.cloudfront.net^$domain=~voici.fr, it would be skipped on voici.fr.

To give some background, because ||d1z2jf7jlzjs58.cloudfront.net^ is a simpler filter, it gets priority in terms of filter matching. Maybe we should give priority to domain-specific filters instead, but I would have to check how that affects performance.

The $important option is an interesting suggestion, but ABP does not support it as of this writing. We also have an issue about prioritizing user-defined filters (#7248).

Note: See TracTickets for help on using tickets.