Opened 2 months ago

Last modified 6 weeks ago

#7337 reopened change

Create :-abp-contains-visible() advanced selector

Reported by: hfiguiere Assignee: hfiguiere
Priority: Unknown Milestone:
Module: Core Keywords: circumvention
Cc: arthur, mapx, amr, mjethani, sebastian, kzar, fhd Blocked By: #7450
Blocking: Platform: Unknown / Cross platform
Ready: no Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

Description (last modified by mjethani)

Background

We need to check for the visibility of text content as some websites have hidden text, that is still present in Node.textContent.

e.g.

<style>
  .c978y4 {
    display: none;
  }
</style>
<div id="foo">
  <span>S</span><span class="c978y4">A</span><span>ponso</span><span class="c978y4">x</span><span>red</span>
</div>

In the above example, foo.textContent.trim() returns "SAponsoxred". We need a way to strip out the portions of the text that are not visible to the user.

What to change

  • in lib/content/elemHideEmulation.js add an implementation of ContainsSelector that will check the text visibility.
  • this will use the :-abp-contains-visible() pseudo selector syntax.

Text visibility is done with CSS. This include 0 font-size, transparent (opacity = 0), same foreground and background colour, visibility and display. We should review what are the other possibilities.

Notes

I don't think this needs to be a snippet.

Hint for testers

TBD

Change History (34)

comment:1 Changed 2 months ago by hfiguiere

  • Owner set to hfiguiere

comment:2 Changed 2 months ago by hfiguiere

  • Review URL(s) modified (diff)
  • Status changed from new to reviewing

comment:3 Changed 2 months ago by hfiguiere

  • Description modified (diff)

comment:4 follow-up: Changed 2 months ago by agiammarchi

Just thinking out loudly, negative z-index, absolute/fixed position, transform3d, are also a way to make some text "disappear". Since they play with classes too, a font family that contains only zero spaced chars could work too.

visibility is, on the other hand, the only one that keeps the rendered size so it might be worth checking that, but it's not more interesting than others techniques.

comment:5 in reply to: ↑ 4 Changed 2 months ago by hfiguiere

Replying to agiammarchi:

Just thinking out loudly, negative z-index,

Negative z-index doesn't necessarily hide, so the heuristics are a bit more complex.

absolute/fixed position, transform3d, are also a way to make some text "disappear". Since they play with classes too, a font family that contains only zero spaced chars could work too.

Maybe we can consider that if one of the element dimension is 0 it is not visible. Or a negative position.

visibility is, on the other hand, the only one that keeps the rendered size so it might be worth checking that, but it's not more interesting than others techniques.

visibility is already handled. Note that innerText vs textContent already handle that difference (albeit we use the later in :-abp-contains)

comment:6 Changed 2 months ago by arthur

  • Cc arthur added

comment:7 Changed 2 months ago by mapx

  • Cc mapx added

comment:8 Changed 2 months ago by amrmak

  • Cc amr added

comment:9 Changed 2 months ago by mjethani

While I like the idea behind this, I am not too sure of the syntax. Maybe we don't need a new pseudoclass but can handle it with the same pseudoclass but with some enhancements to the syntax within the parentheses.

e.g.

div:-abp-contains(='Hello world')

Searches for visible text "Hello world".

div:-abp-contains(!'Hello world')

Searches for hidden text "Hello world".

div:-abp-contains('Hello world')

Searches for text "Hello world" regardless of visibility.

I'm just thinking out loud.

comment:10 follow-up: Changed 2 months ago by hfiguiere

I don't think I agree with that syntax. It is less clear for the filter author, it is more complicated for us to implement. What's the problem of having a new pseudo class?

comment:11 in reply to: ↑ 10 ; follow-up: Changed 8 weeks ago by mjethani

Replying to hfiguiere:

I don't think I agree with that syntax. It is less clear for the filter author, it is more complicated for us to implement. What's the problem of having a new pseudo class?

Currently we have :-abp-properties(), :-abp-has(), and :-abp-contains(). Each of these are very different in what they do. It's like having three separate functions. What you're proposing is another function that is only different from :-abp-contains() in one way, which is that it looks for visible text only. I consider that more as an additional parameter to the function rather than as a separate function. In the implementation itself, it reuses all of the code of :-abp-contains() except for the part where it looks for whether the text is visible.

Now there's no way to pass additional parameters to :-abp-contains() but then we should work on that, because there will be other things we may want to allow filter list authors to specify.

How about we start by implementing this as a snippet instead? Snippet filters already accept parameters, the hide-if-contains snippet could simply take an additional parameter to specify whether the text should be visible. If we find this being used a lot in production, then we work on a better syntax. After all that was one of the points of doing the snippets feature.

PS: Sorry about the late response, I somehow missed your comment.

Last edited 8 weeks ago by mjethani (previous) (diff)

comment:12 in reply to: ↑ 11 Changed 8 weeks ago by mjethani

Replying to mjethani:

How about we start by implementing this as a snippet instead?

The other advantage of doing this as a snippet change is that it would get out quickly in the next minor release.

comment:13 Changed 8 weeks ago by hfiguiere

if it's a snippet that mean it's restricted to the anti-cv list. I don't see why this should be the case.

comment:14 Changed 8 weeks ago by hfiguiere

also as a snippet it will be much harder to use.

comment:15 Changed 8 weeks ago by mjethani

  • Cc mjethani sebastian kzar added
  • Component changed from Unknown to Core

comment:16 Changed 8 weeks ago by mjethani

  • Description modified (diff)

comment:17 Changed 8 weeks ago by mjethani

Well if it's not a snippet, and if we want to introduce a new ABP-specific pseudo-class like :-abp-contains-visible(), then this needs to go through the same kind of review that the other pseudo-classes had to go through.

My concern really is that we're going to end up with :-abp-contains-visible(), :-abp-contains-hidden(), :-abp-contains-bold(), :-abp-contains-italic(), :-abp-contains-black(), :-abp-contains-white() ... there's no end to it. If we want to be able to select an element based on the style properties of the text, we need to think it through. There's nothing wrong with making this part of element hiding emulation but I don't want us to end up with bad APIs that we end up having to support forever, especially if it is only because one major website is using a particular circumvention technique.

The snippets feature was designed specifically so we could experiment with ideas that we can later formalize into the core ad blocker.

I would still suggest that we start by adding a parameter to the hide-if-contains snippet, which would solve the problem for now, and continue working on this idea on the side.

comment:18 Changed 8 weeks ago by mjethani

Another option is to allow a list of flags after the text:

example.com#?#:-abp-contains('sponsored', visible, case-insensitive, bold)

The flags could be specified in any order, only that they must be separated by commas. I'm not even sure we need the commas actually (CSS doesn't use commas, e.g. border: 1px solid black).

example.com#?#:-abp-contains('sponsored' visible case-insensitive bold)

This would have one unfortunate effect, which is that the text would have to be parsed using JSON.parse() and any quotes in the text would have to be escaped by the filter list author.

Also, I would suggest renaming visible to legible: the former has a specific meaning in HTML/CSS, the latter is our approximation of whether the user can read the text (based on the current state of supported browsers).

comment:19 Changed 7 weeks ago by hfiguiere

:-abp-contains-visible() is the only one we want here. No need to be over complicated. If you want to check colours or font style or weight, just use :-abp-properties(). If you want case insensitive, use regexp like we already support.

I think visible is the right word.

comment:20 Changed 7 weeks ago by mjethani

How would one use :-abp-properties() in combination with :-abp-contains() to select an element based on only those parts of the text that are hidden, of a certain font size, of a certain color, and so on?

Based on what I have seen in the circumvention game, some or all of these may be required at some point.

comment:21 Changed 7 weeks ago by mjethani

I would also like to mention something that I had suggested before: uBlock Origin has a pseudo-class called :matches-css(). We could implement something similar called :-abp-style() that selects an element based on the result of getComputedStyle(). We already do this in a snippet, so clearly there is a need for it.

If you can find a way to combine :-abp-contains() with :-abp-style() that addresses this use case, that would be interesting.

Last edited 7 weeks ago by mjethani (previous) (diff)

comment:22 follow-up: Changed 7 weeks ago by hfiguiere

I'm not against :-abp-style() but I don't think it would solve the problem here since we'd have to guess the structure of the placeholders for that invisible text.

comment:23 Changed 7 weeks ago by hfiguiere

I confirm what I wrote above. One of the target for this changed their DOM for the ad label and the filter still works.

comment:24 in reply to: ↑ 22 ; follow-up: Changed 7 weeks ago by mjethani

Replying to hfiguiere:

I'm not against :-abp-style() but I don't think it would solve the problem here since we'd have to guess the structure of the placeholders for that invisible text.

Nor would :-abp-properties() solve the problem (e.g. if you want to select an element based on hidden text or bold text), which is what I said earlier. This means sooner or later somebody will come along and ask for a new :-abp-contains-hidden() or a :-abp-contains-bold() pseudoclass. There are so many such possible pseudoclasses we could implement, but they would be mere variations of :-abp-contains(). I just want to make sure we have the right syntax for it so we don't have to modify the code for each possible case.

In order to learn what possibilities there are in the wild, we use snippets, which have several advantages:

  1. Relatively lower standards for quality (minimal design and fast code review)
  2. Faster to ship
  3. Shorter shelf life: A snippet can be discarded if no longer useful
  4. Not affected by shadow DOM and similar hacks, because they start running immediately and have no artificial delays (see #5650 and #6813)

If the goal is to solve the problem immediately for one particular case of circumvention, the way to do it is using a snippet. I like the idea of baking this into our core feature set but that needs proper design of the syntax and the semantics.

Last edited 7 weeks ago by mjethani (previous) (diff)

comment:25 in reply to: ↑ 24 ; follow-up: Changed 7 weeks ago by hfiguiere

Replying to mjethani:

Nor would :-abp-properties() solve the problem (e.g. if you want to select an element based on hidden text or bold text), which is what I said earlier. This means sooner or later somebody will come along and ask for a new :-abp-contains-hidden() or a :-abp-contains-bold() pseudoclass. There are so many such possible pseudoclasses we could implement, but they would be mere variations of :-abp-contains(). I just want to make sure we have the right syntax for it so we don't have to modify the code for each possible case.

I didn't say :-abp-properties() would help in that case either.
:-abp-contains-visible() is just a variation of :-abp-contains(). It is meant for a specific technique where the user see something that is different than what the DOM has.
There is absolutely no advantage to make it a flag argument for :-abp-contains(), and it would make it less expressive.

In order to learn what possibilities there are in the wild, we use snippets, which have several advantages:

  1. Relatively lower standards for quality (minimal design and fast code review)
  2. Faster to ship
  3. Shorter shelf life: A snippet can be discarded if no longer useful

In that case discarding it would mean the technique could be brought back to circmvent. As I already said above, my filter using :-abp-contains-visible() still work as expected while with the current limitations of our filter syntax, filters needs to be updated.

  1. Not affected by shadow DOM and similar hacks, because they start running immediately and have no artificial delays (see #5650 and #6813)

If the goal is to solve the problem immediately for one particular case of circumvention, the way to do it is using a snippet. I like the idea of baking this into our core feature set but that needs proper design of the syntax and the semantics.

When that one case is important and use resources because the tools are not good enough, it's worth it. It also meant to prevent the workaround from spreading.

comment:26 in reply to: ↑ 25 Changed 6 weeks ago by mjethani

Replying to hfiguiere:

Replying to mjethani:

Nor would :-abp-properties() solve the problem (e.g. if you want to select an element based on hidden text or bold text), which is what I said earlier. This means sooner or later somebody will come along and ask for a new :-abp-contains-hidden() or a :-abp-contains-bold() pseudoclass. There are so many such possible pseudoclasses we could implement, but they would be mere variations of :-abp-contains(). I just want to make sure we have the right syntax for it so we don't have to modify the code for each possible case.

I didn't say :-abp-properties() would help in that case either.

Replying to hfiguiere:

:-abp-contains-visible() is the only one we want here. No need to be over complicated. If you want to check colours or font style or weight, just use :-abp-properties().

I took it that you meant :-abp-contains() in combination with :-abp-properties() could be used to select an element based on the display properties of the text.

Replying to hfiguiere:

Replying to mjethani:

:-abp-contains-visible() is just a variation of :-abp-contains(). It is meant for a specific technique where the user see something that is different than what the DOM has.
There is absolutely no advantage to make it a flag argument for :-abp-contains(), and it would make it less expressive.

So you are sure that :-abp-contains-visible() is the only variation we are ever going to need? And every time the technique for making text invisible is updated, we have to update our code and do a new release?

Why not:

:-abp-contains('Sponsored', {display: inline, visibility: visible, opacity: 1, ...})

(This is just an example for the sake of argument, ignore the syntax.)

When the technique for making text invisible is updated, we just update the filters.

In order to learn what possibilities there are in the wild, we use snippets, which have several advantages:

  1. Relatively lower standards for quality (minimal design and fast code review)
  2. Faster to ship
  3. Shorter shelf life: A snippet can be discarded if no longer useful

In that case discarding it would mean the technique could be brought back to circmvent. As I already said above, my filter using :-abp-contains-visible() still work as expected while with the current limitations of our filter syntax, filters needs to be updated.

I did not mean we will discard it (we have not discarded a single useful snippet so far), what I meant is that if we want to then it can be done without any fuss. Every new snippet is experimental by default and it still goes into production and solves the problem at hand, without much discussion about it.

What you're saying is that there is a specific circumvention technique that we can use a snippet to address but for some reason we should update our core feature set to address it instead. Why? What is the argument in favor of adding a core feature to address a specific circumvention technique?

Core features can only be updated in major releases, like I said, and whatever method you come up with to detect visible text will be circumvented the very same day. This is why we have snippets in the first place, so we can update our techniques quickly.

comment:27 Changed 6 weeks ago by mjethani

To give an example of where we used a snippet where a core feature did not solve the problem, see the strip-fetch-query-parameter snippet (#7294). We tried to use $rewrite, but preflight OPTIONS requests cannot be redirected without causing the request to fail entirely (as per the standard).

Rather than insisting that we "fix" $rewrite (or, much worse, come up with a $rewrite-preflight and clutter the API even further for questionable value), we just wrote a snippet. It solves the problem. If we find this to be a common case, we can think about how to address it in $rewrite over the longer term. Whatever solution we come up with, it will take a long time to come into production. At this point it is not worth anyone's time to try to address this case in $rewrite.

In general snippets are a sandbox for testing out new ideas. If an idea has "proved" itself in the wild, we think about formalizing it into a core feature.

comment:28 Changed 6 weeks ago by hfiguiere

  • Cc fhd added

comment:29 Changed 6 weeks ago by fhd

How I understood it, this is a bit stuck, so I'll add my two cents.

We try to keep the filter syntax somewhat stable, so I think it's worth investing some extra thought into the design. I'd consider it good design if we have simple building blocks that can be combined in flexible and powerful ways. So I like the idea of adding arbitrary properties to check for to :-abp-contains. I don't like the idea of a single purpose variation of it very much, even if we manage to think of all the ways in which an element could be made invisible upfront.

For now, I'd suggest to go with a snippet, unless I'm missing some downside here. For one, I think it's the best way to experiment with things that _could_ become features, and secondly, it's just a whole lot faster in my book:

Type of changeLead time
Feature8-15 weeks (2-3 weeks of development/review, 6-12 weeks to get into the next major release)
Snippet2-8 weeks (1 week of development/review, 1-7 weeks to get into the next fix/snippets-only release)

comment:30 Changed 6 weeks ago by hfiguiere

Reworked the patch as a snippet. IMHO it is not as flexible from a filter authoring PoV.

If you are ok with that approach, I'll update the trac to match the implementation.

comment:31 Changed 6 weeks ago by mjethani

@fhd thanks for your thoughts.

@hfiguiere let's keep this ticket about the feature that you proposed, which, if we get around to it, should be based on what we learn from the snippet's usage in practice. Its design should be informed by the lessons we learn from the snippet. We should open a new ticket for the snippet, in my opinion, while continuing the discussion about the core feature here.

comment:32 Changed 6 weeks ago by hfiguiere

  • Blocked By 7450 added

comment:33 Changed 6 weeks ago by hfiguiere

Filed issue #7450

comment:34 Changed 6 weeks ago by hfiguiere

  • Review URL(s) modified (diff)
  • Status changed from reviewing to reopened
Note: See TracTickets for help on using tickets.