Opened on 09/02/2018 at 02:32:33 PM

Closed on 09/10/2018 at 06:53:30 PM

Last modified on 09/11/2019 at 10:14:46 AM

#6920 closed defect (fixed)

[python-abp] Some comments are inaccurately recognized as metadata

Reported by: sebastian Assignee: sebastian
Priority: Unknown Milestone:
Module: Sitescripts Keywords:
Cc: kvas, rhowell Blocked By:
Blocking: Platform: Unknown / Cross platform
Ready: no Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

https://codereview.adblockplus.org/29873561

Description (last modified by sebastian)

Background

With #6685, we removed the whitelist and started to recognize any comment as metadata that matches the regular expression !\s*([\w-]+)\s*:(?!//)\s*(.*). That way we don't need to maintain a set of metadata keys supported by Adblock Plus in python-abp which would have been great.

However, if I now filter all lines in EasyList by that regular expression, I get following false positive:

! WARNING: Sites listed below can be harmful. Please, do not visit them if you don't know what you're doing!

While I couldn't find another example at the moment, I could also imagine filter lists using comments
like ! Note: ... or ! FIXME: ... (which are kinda common in source code).

What to change

Parse metadata in python-abp consistent with the new semantics in Adblock Plus as of #6923.

Attachments (0)

Change History (9)

comment:1 Changed on 09/02/2018 at 02:34:47 PM by sebastian

We might want to avoid any fuzzy logic that will require as much effort to maintain as a whitelist would (e.g. a blacklist). Perhaps a whitelist wasn't the worst idea after all. But perhaps one generic approach that could work is to only consider a comment as metadata if it's at the top of the file. We can implement the same semantics in Adblock Plus in order to guarantee consistent results. However, we'd need to make sure to still strip checksums regardless of their position in the file for backwards-compatibility. What do you think?

Last edited on 09/02/2018 at 02:36:18 PM by sebastian

comment:2 Changed on 09/03/2018 at 12:37:33 PM by kvas

If existing filter lists have all metadata at the top, we could consider this part of the spec and adjust ABP and python-abp accordingly. Sounds like a simple way to take care of this and I can't think of any reasons to have metadata anywhere but at the top of the file, apart from the old checksum case, for which we will have special logic.

comment:3 Changed on 09/03/2018 at 03:41:33 PM by sebastian

  • Owner set to sebastian

comment:4 Changed on 09/03/2018 at 05:58:18 PM by sebastian

  • Review URL(s) modified (diff)
  • Status changed from new to reviewing

I had a go. Let me know what you think. If you agree to the general approach already, I go ahead and prepare the changes for Adblock Plus itself. Those will be trivial, and as a side effect make parsing metadata more efficient, since we can bail early and don't have to check every single line.

Last edited on 09/03/2018 at 06:25:53 PM by sebastian

comment:5 Changed on 09/03/2018 at 06:47:45 PM by sebastian

There we go: #6923

comment:6 Changed on 09/05/2018 at 01:12:14 PM by sebastian

  • Description modified (diff)

comment:7 Changed on 09/05/2018 at 01:12:38 PM by sebastian

  • Type changed from change to defect

comment:8 Changed on 09/10/2018 at 06:52:30 PM by abpbot

A commit referencing this issue has landed:
Issue 6920 - Only parse metadata from the top of the file

comment:9 Changed on 09/10/2018 at 06:53:30 PM by sebastian

  • Resolution set to fixed
  • Status changed from reviewing to closed

Add Comment

Modify Ticket

Change Properties
Action
as closed .
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from sebastian.
 
Note: See TracTickets for help on using tickets.