Opened 2 months ago

Closed 8 weeks ago

Last modified 8 weeks ago

#7033 closed defect (rejected)

Meta data in filter list header ignored

Reported by: greiner Assignee:
Priority: P1 Milestone:
Module: Core Keywords:
Cc: sebastian, mjethani, kzar, kvas, rhowell, sporz, arthur, amrmak Blocked By:
Blocking: Platform: Unknown / Cross platform
Ready: no Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

Description (last modified by greiner)

Environment

Adblock Plus development build

How to reproduce

  1. Force EasyList filter list update

Observed behaviour

No "expires" filter list header found. Filter list expiration interval is set to default one.

Expected behaviour

Given "expires" filter list header found and used by the extension to deteermine expiration interval.

Further information

This is a regression that was introduced in #6923. See also hub#16173 for the corresponding internal ticket.

For reference, here are the first few lines in EasyList Germany in which the "expires" header is preceded by a regular comment.

[Adblock Plus 2.0]
! Checksum: 6vc5746PdpjYdDBSmuxOaQ
! Version: 201810101631
! Title: EasyList Germany+EasyList
! EasyList Germany and EasyList combination subscription
! Last modified: 10 Oct 2018 16:31 UTC
! Expires: 1 days (update frequency)
! Homepage: https://easylist.to/

The corresponding check can be found in adblockpluscore/lib/synchronizer.js.

Suggested solution

  • Add checks to authoring or build tools filter list authors use so that they can get notified of special comments that are being ignored.
  • Allow regular comments and empty lines in between special comments at the top of the file.

Change History (21)

comment:1 Changed 2 months ago by greiner

  • Description modified (diff)

comment:2 Changed 2 months ago by greiner

  • Component changed from Platform to Core
  • Description modified (diff)

comment:3 in reply to: ↑ description ; follow-up: Changed 2 months ago by mjethani

Replying to greiner:

  • Add checks to authoring or build tools filter list authors use so that they can get notified of special comments that are being ignored.
  • Allow regular comments and empty lines in between special comments at the top of the file.

We announced this change 23 days ago, I don't think we are going back unless there is a compelling reason. It will likely have to be the former.

comment:4 Changed 2 months ago by mjethani

  • Priority changed from Unknown to P1

comment:5 Changed 2 months ago by mjethani

  • Cc kzar added

comment:6 in reply to: ↑ 3 Changed 2 months ago by greiner

Replying to mjethani:

We announced this change 23 days ago, I don't think we are going back unless there is a compelling reason. It will likely have to be the former.

I'm not suggesting to revert the change but merely to allow comments (and maybe also empty lines) to be placed between special comments to minimize the risk of running into this problem. The underlying reasoning being that it's easier to differentiate a comment from a filter than a special comment from a regular comment.

We can still exit the loop as soon as the first blocking or hiding filter is encountered, of course.

comment:7 follow-up: Changed 2 months ago by mjethani

  • Cc kvas rhowell sporz added

The exact behavior chosen in #6923, after much discussion, was inspired by the requirements in #6920, which in turn was a result of the work in #6685 (incremental filter list downloads). We would have to reopen that discussion if it is infeasible for filter lists to update their templates in time in response to this change.

comment:8 follow-up: Changed 2 months ago by mjethani

@kvas how would it affect python-abp if we changed the logic to parse special comments even after any non-special comments or blank lines?

comment:9 in reply to: ↑ 8 ; follow-up: Changed 2 months ago by kvas

Replying to mjethani:

@kvas how would it affect python-abp if we changed the logic to parse special comments even after any non-special comments or blank lines?

We should synchronize regarding the roll out of this change, but otherwise there should be no problem. According to your current plan, at what point would you stop recognizing metadata comments? After encountering the first filter or the first blank line?

comment:10 in reply to: ↑ 9 ; follow-up: Changed 2 months ago by mjethani

Replying to kvas:

According to your current plan, at what point would you stop recognizing metadata comments? After encountering the first filter or the first blank line?

This change in core (#6923) was mainly done to make it consistent with python-abp. Core could continue with the old logic and still not interpret "Warning: ..." or "Note: ..." as metadata, because there is a whitelist. I'm not sure why the whitelist was removed from python-abp.

What do you think about stopping at any non-comment?

In the following example, Expires would be interpreted as metadata:

[Adblock Plus 2.0]
! Checksum: 6vc5746PdpjYdDBSmuxOaQ
! Version: 201810101631
! Title: EasyList Germany+EasyList
! EasyList Germany and EasyList combination subscription
! Last modified: 10 Oct 2018 16:31 UTC
! Expires: 1 days (update frequency)
! Homepage: https://easylist.to/

In this case too:

[Adblock Plus 2.0]
! Checksum: 6vc5746PdpjYdDBSmuxOaQ
! Version: 201810101631
! Title: EasyList Germany+EasyList
!
! EasyList Germany and EasyList combination subscription
!
! Last modified: 10 Oct 2018 16:31 UTC
! Expires: 1 days (update frequency)
! Homepage: https://easylist.to/

But not in this case:

[Adblock Plus 2.0]
! Checksum: 6vc5746PdpjYdDBSmuxOaQ
! Version: 201810101631
! Title: EasyList Germany+EasyList

! EasyList Germany and EasyList combination subscription

! Last modified: 10 Oct 2018 16:31 UTC
! Expires: 1 days (update frequency)
! Homepage: https://easylist.to/

This would seem to be more backwards compatible. It is also easier to explain to someone, in my opinion (the first block of comments can contain metadata).

comment:11 in reply to: ↑ 10 Changed 2 months ago by kvas

Replying to mjethani:

This change in core (#6923) was mainly done to make it consistent with python-abp. Core could continue with the old logic and still not interpret "Warning: ..." or "Note: ..." as metadata, because there is a whitelist. I'm not sure why the whitelist was removed from python-abp.

Sebastian has proposed to remove it to avoid the synchronization issue between the two whitelists (the one in Core and the one in Python-abp). I think this makes things easier and it's definitely better if we can somehow do it without the whitelist in Python-abp (I guess at least an implicit whitelist in Core is unavoidable because Core will anyway understand some metadata comments and not others).

What do you think about stopping at any non-comment?

Seems reasonable, especially if we document this behavior somewhere, like in the documentation about authoring filter lists.

comment:12 Changed 2 months ago by arthur

  • Cc arthur added

comment:13 Changed 2 months ago by amrmak

  • Cc amrmak added

comment:14 in reply to: ↑ 7 Changed 2 months ago by mjethani

Replying to mjethani:

The exact behavior chosen in #6923, after much discussion, was inspired by the requirements in #6920, which in turn was a result of the work in #6685 (incremental filter list downloads). We would have to reopen that discussion if it is infeasible for filter lists to update their templates in time in response to this change.

I forgot to include #6950. There were two parts to the change in #6923, the second part was the result of changes for #6950 in python-abp. I can't really tell where the discussion happened to take it from "any non-empty comment" to "any line that parses as metadata".

Anyway, the regular expression in the first patch was /^\s*!\s*(?:(\w+)\s*:\s*(.*)|\S)/. The second patch changed it to /^\s*!\s*(.*?)\s*:\s*(.*)/. We could just change it to /^\s*!\s*(?:(.*?)\s*:\s*(.*))?/ now to allow even blank comments.

Let's wait for Sebastian's input on this.

Last edited 2 months ago by mjethani (previous) (diff)

comment:15 Changed 8 weeks ago by sebastian

There are two aspects to these changes:

  1. Performance of parsing filter lists in Adblock Plus.
  2. Discoverability of metadata (without having to maintain a hard-coded list of supported metadata in python-abp, and keeping it in sync with what is implemented in Adblock Plus).

For the performance it might not make too much of a difference whether we stop parsing metadata at the first line not starting with ! or at the first line that isn't in the form of a metadata comment. However, a more relaxed rule makes the discoverability of metadata less accurate. For example, look at EasyList:

[Adblock Plus 2.0]
! Version: 201810181717
! Title: EasyList
! Last modified: 18 Oct 2018 17:17 UTC
! Expires: 4 days (update frequency)
! Homepage: https://easylist.to/
! Licence: https://easylist.to/pages/licence.html
! 
! Please report any unblocked adverts or problems
! in the forums (https://forums.lanik.us/)
! or via e-mail (easylist.subscription@gmail.com).
! GitHub issues: https://github.com/easylist/easylist/issues
! GitHub pull requests: https://github.com/easylist/easylist/pulls
! 
! -----------------------General advert blocking filters-----------------------!
! *** easylist:easylist/easylist_general_block.txt ***

If we relax the rule as you suggest, then python-abp would incorrectly recognize the following lines as metadata:

! in the forums (https://forums.lanik.us/)
! GitHub issues: https://github.com/easylist/easylist/issues
! GitHub pull requests: https://github.com/easylist/easylist/pulls
! *** easylist:easylist/easylist_general_block.txt ***

OTOH, there doesn't seem to be any good reason why in the example in the issue description, there should be a comment in the middle of the metadata. So I would rather change the way the bundled list is generated to comply with the new requirements.

Last edited 8 weeks ago by sebastian (previous) (diff)

comment:16 Changed 8 weeks ago by mjethani

  • Resolution set to rejected
  • Status changed from new to closed

I'm closing this as rejected for now. Feel free to reopen it if required. I will also leave a comment on the Hub ticket.

comment:17 Changed 8 weeks ago by greiner

I'm not sure I understand why it has been rejected. The underlying issue still remains, regardless of whether or not EasyList makes the necessary changes.

comment:18 Changed 8 weeks ago by sebastian

I agree with Manish to not address this issue in Adblock Plus. But that means it would have to be addressed on the backend site. So we should either reopen that issue and assign it to the corresponding module, or file a new issue.

comment:19 follow-up: Changed 8 weeks ago by sebastian

After talking to Vasily, it turned out that rather than in the backend scripts, this would have to be addressed in the input files for the combination filter lists. For example in easylistgermany+easylist.txt, line 3 needs to be removed or moved below any metadata (similar for other combination lists). Arthur, would you mind?

comment:20 in reply to: ↑ 19 Changed 8 weeks ago by arthur

Replying to sebastian:

Arthur, would you mind?

Sure, I'm going to update all affected filter list templates.

Note: See TracTickets for help on using tickets.