Opened on 03/08/2016 at 11:21:06 AM

Closed on 04/22/2016 at 08:50:28 AM

#3755 closed change (fixed)

Python filter list parsing API

Reported by: kvas Assignee: kvas
Priority: P3 Milestone:
Module: Sitescripts Keywords:
Cc: greiner, sebastian, kzar Blocked By:
Blocking: #3849 Platform: Unknown / Cross platform
Ready: yes Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

https://codereview.adblockplus.org/29338646/

Description (last modified by kzar)

Background

In the context of separating common functionality from sitescripts into a library, the first step is parsing of the filter lists. While we aim to possibly support many different use cases in the future, our current focus is providing support for rendering of filter lists (see #3849).

What to change

Create a module in python-abp library for parsing of filter lists.

The parser should convert a filter list (coming from a file or an HTTP response) to an iterator of python objects that represent different types of lines found in a filter list. Specifically we need to distinguish the following types of lines (their properties are given in square brackets):

  • empty line (line that only has whitespace or nothing in it),
  • header [program version],
  • regular comments [comment text],
  • metadata comments [key, value],
  • instructions (e.g. %include ...%) [name of the instruction, arguments].
  • filters (not parsed further at the moment, but it should be easy to add this functionality in the future if it's needed) [filter expression].

See this spec for a more specific definition of different line types in Augmented BNF notation (ABNF). The spec covers filter lists as understood by Adblock Plus plugin, so instructions that are intended for preprocessing are not covered by it. The ABNF for these preprocessing instructions would look like this:

instruction = include
include = "%include" *1WSP include-path "%"
include-path = url / local-path
local-path = [ source-name ":" ] path-in-source

where url is defined in rfc3986, source-name is an alphanumeric string and path-in-source is a filesystem path separated by forward slashes.

Each line object produced by the parser should contain all information necessary to produce the original string version of the line (variations in non-significant whitespace are ok). The string version should be produced by line_object.to_string().

Take care to write unit tests.

Attachments (0)

Change History (19)

comment:1 Changed on 03/08/2016 at 11:31:09 AM by kvas

  • Blocking 3751 added

comment:2 Changed on 03/08/2016 at 12:41:50 PM by greiner

  • Cc greiner added

comment:3 Changed on 03/14/2016 at 04:22:24 PM by kvas

  • Component changed from Unknown to Sitescripts

comment:4 Changed on 03/17/2016 at 12:37:02 PM by kvas

  • Description modified (diff)
  • Sensitive unset
  • Summary changed from Filterlist parser low level API to Filterlist parser "streaming" API

comment:5 Changed on 03/18/2016 at 03:49:00 PM by kvas

  • Review URL(s) modified (diff)
  • Status changed from new to reviewing

comment:6 Changed on 03/19/2016 at 10:41:02 PM by kzar

  • Cc sebastian kzar added
  • Priority changed from P3 to Unknown
  • Ready unset

When creating issues please CC in the module owner and don't mark the issue as ready. It's the module owner's decision if the issue is ready and to assign a priority.

In this case I don't think the issue is ready, the description is too vague. It doesn't give me enough background to understand what the script is supposed to do, or for what purpose, or for whom.

  • Add a background section, explain the rough idea of this script and what its purpose is. If it's replacing an existing script give some background about that, who uses the script and to what end. Why are we even changing things in the first place?
  • Improve the "What to change" section, "The streaming API should convert each line of a filter list to a parsed representation distinguishing different types of filters and supporting basic features of each different type." doesn't mean much to me. (Specifically what is the "parsed representation" and how does it differ from what we're starting with?) Enumerate how the script should transform a filter list and give concrete examples.

Without these kind of details it's going to be really hard or even impossible for me to review your code. I'm not familiar with the combineSubscriptions script at all. It's also important so that other people can understand what's going on, for example the testers.

comment:7 Changed on 03/21/2016 at 09:16:01 AM by kvas

  • Description modified (diff)

comment:8 Changed on 03/21/2016 at 06:09:36 PM by kzar

  • Blocking 3849 added

comment:9 Changed on 03/21/2016 at 06:10:18 PM by kzar

  • Blocking 3751 removed

comment:10 Changed on 03/22/2016 at 04:53:38 PM by kvas

  • Blocking 3751 added
  • Description modified (diff)
  • Summary changed from Filterlist parser "streaming" API to Filter list parser

comment:11 Changed on 03/24/2016 at 01:19:32 PM by kvas

  • Description modified (diff)

comment:12 follow-up: Changed on 03/29/2016 at 03:58:58 PM by kzar

Their BFN would be: ...

What's BFN? (I also have no idea what the code block below is showing.)

comment:13 Changed on 03/29/2016 at 04:11:06 PM by kvas

  • Description modified (diff)

comment:14 in reply to: ↑ 12 Changed on 03/29/2016 at 04:12:08 PM by kvas

Replying to kzar:

Their BFN would be: ...

What's BFN? (I also have no idea what the code block below is showing.)

Sorry, that was a typo. Should have been BNF. And the actual notation used here is ABNF, which I linked from the text now.

comment:15 Changed on 03/29/2016 at 04:57:59 PM by kvas

  • Description modified (diff)

comment:16 Changed on 03/29/2016 at 08:58:15 PM by kzar

  • Blocking 3751 removed

As this issue is blocking #3849 which in turn blocks #3751 we don't need to mark this as blocking #3751.

(See the "Depgraph" button at the top right of the page. If you click on that for #3751 you can see the whole graph of dependencies.)

comment:17 Changed on 03/29/2016 at 09:05:25 PM by kzar

  • Description modified (diff)
  • Priority changed from Unknown to P3
  • Ready set
  • Summary changed from Filter list parser to Python filter list parsing API

I added a quick note to write unit tests and tweaked the title, but otherwise this looks OK now. Marking as ready, thanks.

comment:18 Changed on 04/22/2016 at 08:48:02 AM by abpbot

A commit referencing this issue has landed:
https://hg.adblockplus.org/python-abp/rev/5bc6a95c0207

comment:19 Changed on 04/22/2016 at 08:50:28 AM by kvas

  • Resolution set to fixed
  • Status changed from reviewing to closed

Add Comment

Modify Ticket

Change Properties
Action
as closed .
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from kvas.
 
Note: See TracTickets for help on using tickets.