Opened on 03/08/2016 at 11:21:06 AM
Closed on 04/22/2016 at 08:50:28 AM
#3755 closed change (fixed)
Python filter list parsing API
Reported by: | kvas | Assignee: | kvas |
---|---|---|---|
Priority: | P3 | Milestone: | |
Module: | Sitescripts | Keywords: | |
Cc: | greiner, sebastian, kzar | Blocked By: | |
Blocking: | #3849 | Platform: | Unknown / Cross platform |
Ready: | yes | Confidential: | no |
Tester: | Unknown | Verified working: | no |
Review URL(s): |
Description (last modified by kzar)
Background
In the context of separating common functionality from sitescripts into a library, the first step is parsing of the filter lists. While we aim to possibly support many different use cases in the future, our current focus is providing support for rendering of filter lists (see #3849).
What to change
Create a module in python-abp library for parsing of filter lists.
The parser should convert a filter list (coming from a file or an HTTP response) to an iterator of python objects that represent different types of lines found in a filter list. Specifically we need to distinguish the following types of lines (their properties are given in square brackets):
- empty line (line that only has whitespace or nothing in it),
- header [program version],
- regular comments [comment text],
- metadata comments [key, value],
- instructions (e.g. %include ...%) [name of the instruction, arguments].
- filters (not parsed further at the moment, but it should be easy to add this functionality in the future if it's needed) [filter expression].
See this spec for a more specific definition of different line types in Augmented BNF notation (ABNF). The spec covers filter lists as understood by Adblock Plus plugin, so instructions that are intended for preprocessing are not covered by it. The ABNF for these preprocessing instructions would look like this:
instruction = include include = "%include" *1WSP include-path "%" include-path = url / local-path local-path = [ source-name ":" ] path-in-source
where url is defined in rfc3986, source-name is an alphanumeric string and path-in-source is a filesystem path separated by forward slashes.
Each line object produced by the parser should contain all information necessary to produce the original string version of the line (variations in non-significant whitespace are ok). The string version should be produced by line_object.to_string().
Take care to write unit tests.
Attachments (0)
Change History (19)
comment:1 Changed on 03/08/2016 at 11:31:09 AM by kvas
- Blocking 3751 added
comment:2 Changed on 03/08/2016 at 12:41:50 PM by greiner
- Cc greiner added
comment:3 Changed on 03/14/2016 at 04:22:24 PM by kvas
- Component changed from Unknown to Sitescripts
comment:4 Changed on 03/17/2016 at 12:37:02 PM by kvas
- Description modified (diff)
- Sensitive unset
- Summary changed from Filterlist parser low level API to Filterlist parser "streaming" API
comment:5 Changed on 03/18/2016 at 03:49:00 PM by kvas
- Review URL(s) modified (diff)
- Status changed from new to reviewing
comment:6 Changed on 03/19/2016 at 10:41:02 PM by kzar
- Cc sebastian kzar added
- Priority changed from P3 to Unknown
- Ready unset
comment:8 Changed on 03/21/2016 at 06:09:36 PM by kzar
- Blocking 3849 added
comment:9 Changed on 03/21/2016 at 06:10:18 PM by kzar
- Blocking 3751 removed
comment:10 Changed on 03/22/2016 at 04:53:38 PM by kvas
- Blocking 3751 added
- Description modified (diff)
- Summary changed from Filterlist parser "streaming" API to Filter list parser
comment:11 Changed on 03/24/2016 at 01:19:32 PM by kvas
- Description modified (diff)
comment:12 follow-up: ↓ 14 Changed on 03/29/2016 at 03:58:58 PM by kzar
Their BFN would be: ...
What's BFN? (I also have no idea what the code block below is showing.)
comment:13 Changed on 03/29/2016 at 04:11:06 PM by kvas
- Description modified (diff)
comment:14 in reply to: ↑ 12 Changed on 03/29/2016 at 04:12:08 PM by kvas
Replying to kzar:
Their BFN would be: ...
What's BFN? (I also have no idea what the code block below is showing.)
Sorry, that was a typo. Should have been BNF. And the actual notation used here is ABNF, which I linked from the text now.
comment:15 Changed on 03/29/2016 at 04:57:59 PM by kvas
- Description modified (diff)
comment:16 Changed on 03/29/2016 at 08:58:15 PM by kzar
- Blocking 3751 removed
comment:17 Changed on 03/29/2016 at 09:05:25 PM by kzar
- Description modified (diff)
- Priority changed from Unknown to P3
- Ready set
- Summary changed from Filter list parser to Python filter list parsing API
I added a quick note to write unit tests and tweaked the title, but otherwise this looks OK now. Marking as ready, thanks.
comment:18 Changed on 04/22/2016 at 08:48:02 AM by abpbot
A commit referencing this issue has landed:
https://hg.adblockplus.org/python-abp/rev/5bc6a95c0207
comment:19 Changed on 04/22/2016 at 08:50:28 AM by kvas
- Resolution set to fixed
- Status changed from reviewing to closed
When creating issues please CC in the module owner and don't mark the issue as ready. It's the module owner's decision if the issue is ready and to assign a priority.
In this case I don't think the issue is ready, the description is too vague. It doesn't give me enough background to understand what the script is supposed to do, or for what purpose, or for whom.
Without these kind of details it's going to be really hard or even impossible for me to review your code. I'm not familiar with the combineSubscriptions script at all. It's also important so that other people can understand what's going on, for example the testers.