Opened on 09/03/2018 at 06:42:19 PM
Closed on 09/16/2018 at 12:42:48 PM
Last modified on 10/08/2019 at 06:05:35 PM
#6923 closed change (fixed)
Only parse metadata from special comments at the top of the file
Reported by: | sebastian | Assignee: | sebastian |
---|---|---|---|
Priority: | P2 | Milestone: | |
Module: | Core | Keywords: | |
Cc: | mjethani, kvas, Lain_13, dimisa, scuturic | Blocked By: | |
Blocking: | #6949 | Platform: | Unknown / Cross platform |
Ready: | yes | Confidential: | no |
Tester: | Ross | Verified working: | yes |
Review URL(s): |
https://codereview.adblockplus.org/29873570 |
Description (last modified by sebastian)
Background
In python-abp (the Python library used to parse and compile filter lists), we now have the problem that without hard-coding a redundant list of keys supported in special comments by Adblock Plus (which we'd like to avoid), there is no way to reliably tell if a comment is a special comment, since filter lists can contain comments like ! WARNING: or ! Note:. See #6920.
However, in practice filter lists put all special/metadata comments right after the filter list header at the top of the file. The only exception were ! Checksum: comments but those are no longer supported (see #6849).
Moreover, requiring special/metadata comments to be given at the top, not only makes it easier for tools like python-abp to reliably identify them, it also makes the parsing in Adblock Plus more efficient, since we can bail early and no longer need to check every single line in the filter list whether it might contain metadata.
What to change
Stop parsing special/metadata comments in lib/synchronizer.js as soon as we hit the first line after the header that isn't a comment with a colon.
Hints for testers
When a filter list is downloaded, special comments should be processed if (and only if) given after the header and before any line that isn't a comment with a colon.
Following filter list for example should show up under the name "foo" and with https://example.com as homepage in the settings:
[Adblock Plus 2.0] ! not a key: bla ! Title : foo ! Homepage: https://example.com ||example.com^
Following examples don't include any metadata to be processed (i.e. the the settings should show the URL instead of a name):
[Adblock Plus 2.0] ! ! Title: foo
[Adblock Plus 2.0] ||example.com^ ! Title: foo
You might also want to try different combinations and/or other special comments.
Moreover, please make sure that all special comments from existing filter lists (in particular those provided in the settings) are still considered.
Attachments (0)
Change History (29)
comment:1 Changed on 09/03/2018 at 06:43:24 PM by sebastian
- Review URL(s) modified (diff)
- Status changed from new to reviewing
comment:2 Changed on 09/04/2018 at 07:15:56 AM by mjethani
- Priority changed from Unknown to P2
comment:3 Changed on 09/04/2018 at 07:16:16 AM by mjethani
comment:4 Changed on 09/04/2018 at 10:25:56 AM by kvas
- Cc kvas added
comment:5 Changed on 09/04/2018 at 02:47:47 PM by sebastian
- Cc Lain_13 dimisa scuturic added
Lain, dimesa and Sasa, any objections, requiring metadata to be given at the top of the filter list (like specified in this issue)? Feel free to add other filter list authors that might concern this?
comment:6 Changed on 09/04/2018 at 03:03:00 PM by kvas
I we should define more precisely in the ticket what signals the end of the metadata for us (we have de facto definition from the code, but maybe we want to be a bit more relaxed there). The obvious options I see:
- Anything that doesn't look like a metadata comment (see the regexp in line 165),
- An empty line or a filter.
The currently proposed implementation is the first option but second option would allow having some comments interspersed with the metadata lines at the top. Does this sound useful to anyone?
comment:7 Changed on 09/04/2018 at 04:05:34 PM by sebastian
- Description modified (diff)
I made it explicit in the issue description. But no, I don't think that we should make it more relaxed. Looking at existing filter lists, I don't see any example of (non-special) comments in between metadata, but having comments follow the metadata (with no empty line in between) seem quite common.
comment:9 Changed on 09/10/2018 at 06:55:08 PM by abpbot
A commit referencing this issue has landed:
Issue 6923 - Only parse metadata from special comments at the top of the file
comment:10 Changed on 09/10/2018 at 06:55:55 PM by sebastian
- Resolution set to fixed
- Status changed from reviewing to closed
comment:11 Changed on 09/11/2018 at 05:05:05 PM by mjethani
@sebastian care to add some hints for testers here?
comment:12 Changed on 09/11/2018 at 07:47:08 PM by sebastian
- Description modified (diff)
comment:13 Changed on 09/11/2018 at 07:50:49 PM by sebastian
- Description modified (diff)
comment:14 Changed on 09/15/2018 at 04:12:53 PM by sebastian
comment:15 Changed on 09/15/2018 at 04:19:45 PM by sebastian
- Blocking 6949 added
comment:16 Changed on 09/16/2018 at 12:41:57 PM by abpbot
A commit referencing this issue has landed:
Issue 6923 - Simplify parsing of metadata
comment:17 Changed on 09/16/2018 at 12:42:48 PM by sebastian
- Resolution set to fixed
- Status changed from reopened to closed
comment:18 Changed on 09/16/2018 at 01:19:12 PM by mjethani
- Ready set
comment:19 Changed on 10/24/2018 at 01:16:05 PM by Ross
- Tester changed from Unknown to Ross
- Verified working set
Done. Working as described.
ABP 3.3.2.2175
Firefox 62 / 51 / Windows 10
Chrome 69 / 49 / Windows 10
Opera 56 / 36 / Windows 10
comment:20 Changed on 10/31/2018 at 01:59:11 AM by erikvold
This commit for this issue: https://gitlab.com/eyeo/adblockplus/adblockpluscore/commit/ee3158dca717452047320cb69a80a59513997c71
made a change to a regular expression here: https://gitlab.com/eyeo/adblockplus/adblockpluscore/commit/ee3158dca717452047320cb69a80a59513997c71#8525ff79982a26589c57c020069186faa286b695_163_163
by changing a \w+ bit for a .*? bit, for which no tests were added, also the ? in .*? is redundant.
The change adds support for special comments like ! : blah by simply ignoring it, this comment previously would've been treated as a filter rule and any special comments after it would've been treated as a filter rule too. I'm not saying it's a bad change, but it is a change for which no test was added, so if the change was intentional then we should add a test for this.
Also, I'm wondering why the change was made? because if this is a use case that we want to support then I'm not sure why we would want to support the use case. If it's not then we should also have a test that we do not support this use case.
I created a follow-up issue #7086
comment:21 follow-ups: ↓ 23 ↓ 25 Changed on 10/31/2018 at 02:19:18 AM by mjethani
The original line was:
let match = /^\s*!\s*(\w+)\s*:\s*(.*)/.exec(lines[i]);
Over two commits, it got changed to:
let match = /^\s*!\s*(.*?)\s*:\s*(.*)/.exec(lines[i]);
Now it looks like we could just have reverted it entirely. I agree about the tests.
I don't see how a line starting with ! could be treated as a filter rule.
Also the ? in .*? is not redundant, it makes it non-greedy. e.g. try matching ! hello : world, it'll give you hello (space) instead of hello, so you'd have to remove the space then.
comment:22 follow-up: ↓ 24 Changed on 10/31/2018 at 02:22:01 AM by erikvold
Also looks like support for !foo was removed here and I'm not sure why that was done.
comment:23 in reply to: ↑ 21 Changed on 10/31/2018 at 02:30:06 AM by erikvold
Replying to mjethani:
Also the ? in .*? is not redundant, it makes it non-greedy
Ah yes you are correct, my mistake!
comment:24 in reply to: ↑ 22 ; follow-ups: ↓ 26 ↓ 28 Changed on 10/31/2018 at 02:31:50 AM by sebastian
Replying to mjethani:
Now it looks like we could just have reverted it entirely.
The change from \w+ to .*? isn't an artifact of an incompletely reverted change, it is intentional, so that metadata after the bogus ! Last modified: line (in virtually every filter list) are still being parsed.
Replying to erikvold:
Also looks like support for !foo was removed here and I'm not sure why that was done.
In order to minimize the risk of false positives, which is more an issue for static analysis of filter lists (in python-abp which is agnostic of the keys supported by individual ad blockers) than it is for ABP itself.
comment:25 in reply to: ↑ 21 Changed on 10/31/2018 at 02:33:57 AM by erikvold
Replying to mjethani:
I don't see how a line starting with ! could be treated as a filter rule.
Ah it's a filter but it's ignored, at https://adblockplus.org/en/filters#comments it says:
"Any rule that starts with an exclamation mark is considered a comment. It will still show up in the filter list but in grey instead of black. Adblock Plus will ignore this rule for actual blocking so it is safe to write there whatever you want. You can place a comment rule above a real filter to describe what it is doing. Or you can put a comment on top of your filter list stating your authorship (most filter list authors do that)."
So going by that I think any line starting with ! should be allowed and ignored from the filters.
comment:26 in reply to: ↑ 24 Changed on 10/31/2018 at 02:55:18 AM by erikvold
Replying to sebastian:
Replying to erikvold:
Also looks like support for !foo was removed here and I'm not sure why that was done.
In order to minimize the risk of false positives, which is more an issue for static analysis of filter lists (in python-abp which is agnostic of the keys supported by individual ad blockers) than it is for ABP itself.
Hmm I'm still confused why we don't want to support !foo by ignoring it.
So now it looks like support for !foo was briefly added in https://gitlab.com/eyeo/adblockplus/adblockpluscore/commit/c9dc573b0c4a41b84f3dcad883876aa99d761078 for this issue and then it was removed in a following commit https://gitlab.com/eyeo/adblockplus/adblockpluscore/commit/ee3158dca717452047320cb69a80a59513997c71
Based on what I read at https://adblockplus.org/en/filters#comments which says "Any rule that starts with an exclamation mark is considered a comment. " we should be supporting !foo mixed in with the special comments.
comment:27 Changed on 10/31/2018 at 03:02:14 AM by sebastian
For example look at EasyList:
[Adblock Plus 2.0] ! Version: 201810310045 ! Title: EasyList ! Last modified: 31 Oct 2018 00:45 UTC ! Expires: 4 days (update frequency) ! Homepage: https://easylist.to/ ! Licence: https://easylist.to/pages/licence.html ! ! Please report any unblocked adverts or problems ! in the forums (https://forums.lanik.us/) ! or via e-mail (easylist.subscription@gmail.com). ! GitHub issues: https://github.com/easylist/easylist/issues ! GitHub pull requests: https://github.com/easylist/easylist/pulls ! ! -----------------------General advert blocking filters-----------------------! ! *** easylist:easylist/easylist_general_block.txt ***
If we'd just ignore comments and keep parsing metadata, then following lines would be recognized as metadata too:
! in the forums (https://forums.lanik.us/) ! GitHub issues: https://github.com/easylist/easylist/issues ! GitHub pull requests: https://github.com/easylist/easylist/pulls
Again, not a big issue for Adblock Plus (as it knows which key it supports), but for python-abp which has to understand the filter lists (e.g. in order to generate the diffs for incremental filter list updates).
comment:28 in reply to: ↑ 24 Changed on 10/31/2018 at 03:13:20 AM by mjethani
Replying to sebastian:
Replying to mjethani:
Now it looks like we could just have reverted it entirely.
The change from \w+ to .*? isn't an artifact of an incompletely reverted change, it is intentional, so that metadata after the bogus ! Last modified: line (in virtually every filter list) are still being parsed.
Of course! My bad, you're right, this was necessary.
comment:29 Changed on 08/07/2019 at 06:40:46 AM by redball12345671211
spam
@sebastian do you think it makes sense to ask the filter list authors before making this change?