Opened 14 months ago

Closed 13 months ago

Last modified 6 months ago

#6976 closed defect (fixed)

[python-abp] Filters that contain square brackets in the first line of a filter list fragment break rendering

Reported by: kvas Assignee: kvas
Priority: P2 Milestone:
Module: Sitescripts Keywords:
Cc: sebastian, atudor, arthur Blocked By:
Blocking: Platform: Unknown / Cross platform
Ready: yes Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

https://codereview.adblockplus.org/29901579

Description

Environment

Easylist before this commit.

How to reproduce

Try to render easylist using flrender from python-abp.

Observed behaviour

$ flrender -i easylist=. easylist.template output/easylist.txt
Traceback (most recent call last):
  File "/home/travis/virtualenv/python2.7.14/bin/flrender", line 11, in <module>
    load_entry_point('python-abp==0.0.1', 'console_scripts', 'flrender')()
  File "/home/travis/virtualenv/python2.7.14/lib/python2.7/site-packages/abp/filters/render_script.py", line 67, in main
    for line in lines:
  File "/home/travis/virtualenv/python2.7.14/lib/python2.7/site-packages/abp/filters/renderer.py", line 132, in _remove_checksum
    for line in lines:
  File "/home/travis/virtualenv/python2.7.14/lib/python2.7/site-packages/abp/filters/renderer.py", line 102, in _process_timestamps
    for line in lines:
  File "/home/travis/virtualenv/python2.7.14/lib/python2.7/site-packages/abp/filters/renderer.py", line 91, in _process_includes
    for line in all_included:
  File "/home/travis/virtualenv/python2.7.14/lib/python2.7/site-packages/abp/filters/renderer.py", line 76, in _process_includes
    for line in lines:
  File "/home/travis/virtualenv/python2.7.14/lib/python2.7/site-packages/abp/filters/parser.py", line 340, in parse_filterlist
    parsed_line = parse_line(line, position)
  File "/home/travis/virtualenv/python2.7.14/lib/python2.7/site-packages/abp/filters/parser.py", line 299, in parse_line
    raise ParseError('Malformed header', line)
abp.filters.parser.ParseError: Malformed header in "@@/cdn-cgi/pe/bag2?r[]=*ads.exoclick.com$xmlhttprequest,domain=hd-porn.me"
The command "flrender -i easylist=. easylist.template output/easylist.txt" exited with 1.

Expected behaviour

There should be no exception and the filter list should be generated.

Explanation and notes on the fix

With the fix for #6877 I have tried to be extra helpful to the filter authors and detect broken headers early (when parsing the fragment that has them). This seems to have backfired: if a filter list fragment starts with a filter that has square brackets, python-abp thinks that the line tries to be a header and throws an invalid header exception.

I think we should chill out a bit about being so helpful and just use the old regex with search. This way if the first line contains a valid header, it will be parsed as such, otherwise it will be considered a filter. A slightly wrong header would be parsed as a filter (and all the metadata following it would become comments) but I would argue that this doesn't create any serious problems.

There are two possibilities in the slightly broken header case:

  1. Slightly broken header is at the beginning of the root fragment, so it would be the first line in the output. In this case we would really like to detect that the header is broken to avoid producing a filter list that ABP will reject. The parser won't help us after the proposed changes but the renderer will catch it and still refuse to produce a broken filter list.
  2. Slightly broken header is at the beginning of a non-root fragment. In this case the output will include the slightly broken header because it will be considered a filter. The output will also include any following metadata because it will be considered normal comments and not stripped. We will end up with an extra filter and some metadata comments in the middle of the filter list. The metadata comments will be ignored by the extension because they come after a filter. As for the filter, it seems that it's very unlikely that an accidental filter that was intended to be a header will match anything. We will end up with some comments and one useless filter in the list which doesn't sound too worrying.

In either case there's no immediate danger. In addition to this, in the second case, any fragment that has a header at the beginning is likely to also be used as a standalone filter list. Otherwise why would it have a header? If the standalone list is rendered via python-abp, the rendering will break and if it's not rendered, ABP will ignore the list when loading it. Either way the broken header is likely to be discovered soon enough and fixed.

Change History (6)

comment:1 Changed 14 months ago by sebastian

I don't have any better idea.

comment:2 Changed 13 months ago by kvas

  • Component changed from Unknown to Sitescripts

comment:3 Changed 13 months ago by sebastian

  • Owner set to kvas

comment:4 Changed 13 months ago by sebastian

  • Review URL(s) modified (diff)
  • Status changed from new to reviewing

comment:5 Changed 13 months ago by abpbot

A commit referencing this issue has landed:
Issue 6976 - Ignore invalid headers instead of crashing

comment:6 Changed 13 months ago by kvas

  • Resolution set to fixed
  • Status changed from reviewing to closed
Note: See TracTickets for help on using tickets.