Opened on 09/02/2018 at 02:32:33 PM
Closed on 09/10/2018 at 06:53:30 PM
Last modified on 09/11/2019 at 10:14:46 AM
#6920 closed defect (fixed)
[python-abp] Some comments are inaccurately recognized as metadata
Reported by: | sebastian | Assignee: | sebastian |
---|---|---|---|
Priority: | Unknown | Milestone: | |
Module: | Sitescripts | Keywords: | |
Cc: | kvas, rhowell | Blocked By: | |
Blocking: | Platform: | Unknown / Cross platform | |
Ready: | no | Confidential: | no |
Tester: | Unknown | Verified working: | no |
Review URL(s): |
Description (last modified by sebastian)
Background
With #6685, we removed the whitelist and started to recognize any comment as metadata that matches the regular expression !\s*([\w-]+)\s*:(?!//)\s*(.*). That way we don't need to maintain a set of metadata keys supported by Adblock Plus in python-abp which would have been great.
However, if I now filter all lines in EasyList by that regular expression, I get following false positive:
! WARNING: Sites listed below can be harmful. Please, do not visit them if you don't know what you're doing!
While I couldn't find another example at the moment, I could also imagine filter lists using comments
like ! Note: ... or ! FIXME: ... (which are kinda common in source code).
What to change
Parse metadata in python-abp consistent with the new semantics in Adblock Plus as of #6923.
Attachments (0)
Change History (9)
comment:1 Changed on 09/02/2018 at 02:34:47 PM by sebastian
comment:2 Changed on 09/03/2018 at 12:37:33 PM by kvas
If existing filter lists have all metadata at the top, we could consider this part of the spec and adjust ABP and python-abp accordingly. Sounds like a simple way to take care of this and I can't think of any reasons to have metadata anywhere but at the top of the file, apart from the old checksum case, for which we will have special logic.
comment:3 Changed on 09/03/2018 at 03:41:33 PM by sebastian
- Owner set to sebastian
comment:4 Changed on 09/03/2018 at 05:58:18 PM by sebastian
- Review URL(s) modified (diff)
- Status changed from new to reviewing
I had a go. Let me know what you think. If you agree to the general approach already, I go ahead and prepare the changes for Adblock Plus itself. Those will be trivial, and as a side effect make parsing metadata more efficient, since we can bail early and don't have to check every single line.
comment:7 Changed on 09/05/2018 at 01:12:38 PM by sebastian
- Type changed from change to defect
comment:8 Changed on 09/10/2018 at 06:52:30 PM by abpbot
A commit referencing this issue has landed:
Issue 6920 - Only parse metadata from the top of the file
comment:9 Changed on 09/10/2018 at 06:53:30 PM by sebastian
- Resolution set to fixed
- Status changed from reviewing to closed
We might want to avoid any fuzzy logic that will require as much effort to maintain as a whitelist would (e.g. a blacklist). Perhaps a whitelist wasn't the worst idea after all. But perhaps one generic approach that could work is to only consider a comment as metadata if it's at the top of the file. We can implement the same semantics in Adblock Plus in order to guarantee consistent results. However, we'd need to make sure to still strip checksums regardless of their position in the file for backwards-compatibility. What do you think?