Opened 2 years ago

Closed 2 years ago

Last modified 13 months ago

#6920 closed defect (fixed)

[python-abp] Some comments are inaccurately recognized as metadata

Reported by: sebastian Assignee: sebastian
Priority: Unknown Milestone:
Module: Sitescripts Keywords:
Cc: kvas, rhowell Blocked By:
Blocking: Platform: Unknown / Cross platform
Ready: no Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

Description (last modified by sebastian)


With #6685, we removed the whitelist and started to recognize any comment as metadata that matches the regular expression !\s*([\w-]+)\s*:(?!//)\s*(.*). That way we don't need to maintain a set of metadata keys supported by Adblock Plus in python-abp which would have been great.

However, if I now filter all lines in EasyList by that regular expression, I get following false positive:

! WARNING: Sites listed below can be harmful. Please, do not visit them if you don't know what you're doing!

While I couldn't find another example at the moment, I could also imagine filter lists using comments
like ! Note: ... or ! FIXME: ... (which are kinda common in source code).

What to change

Parse metadata in python-abp consistent with the new semantics in Adblock Plus as of #6923.

Change History (9)

comment:1 Changed 2 years ago by sebastian

We might want to avoid any fuzzy logic that will require as much effort to maintain as a whitelist would (e.g. a blacklist). Perhaps a whitelist wasn't the worst idea after all. But perhaps one generic approach that could work is to only consider a comment as metadata if it's at the top of the file. We can implement the same semantics in Adblock Plus in order to guarantee consistent results. However, we'd need to make sure to still strip checksums regardless of their position in the file for backwards-compatibility. What do you think?

Last edited 2 years ago by sebastian (previous) (diff)

comment:2 Changed 2 years ago by kvas

If existing filter lists have all metadata at the top, we could consider this part of the spec and adjust ABP and python-abp accordingly. Sounds like a simple way to take care of this and I can't think of any reasons to have metadata anywhere but at the top of the file, apart from the old checksum case, for which we will have special logic.

comment:3 Changed 2 years ago by sebastian

  • Owner set to sebastian

comment:4 Changed 2 years ago by sebastian

  • Review URL(s) modified (diff)
  • Status changed from new to reviewing

I had a go. Let me know what you think. If you agree to the general approach already, I go ahead and prepare the changes for Adblock Plus itself. Those will be trivial, and as a side effect make parsing metadata more efficient, since we can bail early and don't have to check every single line.

Last edited 2 years ago by sebastian (previous) (diff)

comment:5 Changed 2 years ago by sebastian

There we go: #6923

comment:6 Changed 2 years ago by sebastian

  • Description modified (diff)

comment:7 Changed 2 years ago by sebastian

  • Type changed from change to defect

comment:8 Changed 2 years ago by abpbot

A commit referencing this issue has landed:
Issue 6920 - Only parse metadata from the top of the file

comment:9 Changed 2 years ago by sebastian

  • Resolution set to fixed
  • Status changed from reviewing to closed
Note: See TracTickets for help on using tickets.