Opened 19 months ago

Last modified 3 months ago

#4128 new change

[emscripten] Convert parsing patterns.ini to C++

Reported by: trev Assignee: sergz
Priority: P2 Milestone:
Module: Core Keywords:
Cc: Blocked By: #5137, #5153
Blocking: #4122 Platform: Unknown / Cross platform
Ready: yes Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

https://codereview.adblockplus.org/29548581/

Description (last modified by trev)

Background

See #4122 for the rationale. We should parse patterns.ini in C++ for better performance.

What to change

Create compiled/storage/parser.cpp providing a Parser class (should be exposed to JavaScript as _FilterStorage_Parser). The parser should allocate a 64kB (?) input buffer, with the pointer to the buffer being returned by its buffer property and its size by the bufferSize property. FilterStorage.importData (JavaScript, located in lib/filterStorage.js) should copy the incoming lines to the buffer, followed by a line break. Once there is no longer enough space in the buffer for the next line it should call Parser.process(numChars). Upon receiving a null string (finalization), FilterStorage.importData should make the parser process data still in the buffer and call Parser.finalize() then - this will replace existing subscriptions.

Notes

  • FilterStorage.importData() needs to expose _FilterStorage_Parser.delete() - it should delete the parser automatically upon finalization but the parser needs to be deleted explicitly if parsing is abandoned. Maybe the callback needs an abort() method.
  • The parser needs to keep a list with references to filters it processed. With the file format currently listing filters before subscriptions, the filters might get released otherwise and the changes of filter properties will be lost.
  • It is no longer feasible to prevent changes of filter/subscription properties before finalization. This should be ok as long as the list of subscriptions stays unchanged until finalization.
  • Methods FilterStorage.loadFromDisk() and FilterStorage.restoreBackup() can stay in JavaScript, without any changes.

Change History (9)

comment:1 Changed 8 months ago by trev

  • Blocked By 5137 added

comment:2 Changed 8 months ago by trev

  • Description modified (diff)
  • Summary changed from [emscripten] Convert parsing/serializing patterns.ini to C++ to [emscripten] Convert parsing patterns.ini to C++

comment:3 Changed 8 months ago by trev

  • Description modified (diff)

comment:4 Changed 8 months ago by trev

  • Description modified (diff)

comment:5 Changed 8 months ago by trev

  • Blocked By 5153 added

comment:6 Changed 3 months ago by sergz

Replying to trev:

Note also that this isn't blocking #5137, it's the other way round.

It is indeed blocking #5137. FilterStorage is using the parser, so it's impossible to implement FilterStorage, thus accomplish #5137, without the parser. On the other hand, one can implement the parser without FilterStorage, use it anywhere, e.g. in tests, or not use, and close the current issue.
So, I think that what is Blocked By here should actually be Blocking.

comment:7 Changed 3 months ago by sergz

If we still use INI-like file format and already have lines, then why should one fiddle with the input buffer of the parser? I would say that the INI-like parser should have void Process(const String& line) and void Finalize() methods.

FilterStorage.importData() needs to expose _FilterStorage_Parser.delete() - it should delete the parser automatically upon finalization but the parser needs to be deleted explicitly if parsing is abandoned. Maybe the callback needs an abort() method.

Could you please explain how the parsing can be abandoned? It's natural to expect that while an instance of FilterStorage exists it's responsible to not allow several parsing processes. If the parsing process is asynchronous then both on a new parsing request FilterStorage and on destroying of FilterStorage, the latter should stop the currently running parsing process.

The parser needs to keep a list with references to filters it processed. With the file format currently listing filters before subscriptions, the filters might get released otherwise and the changes of filter properties will be lost.
It is no longer feasible to prevent changes of filter/subscription properties before finalization. This should be ok as long as the list of subscriptions stays unchanged until finalization.

Could you please give more details for these both sections?

comment:8 Changed 3 months ago by sergz

  • Owner set to sergz

comment:9 Changed 3 months ago by sergz

  • Review URL(s) modified (diff)
Note: See TracTickets for help on using tickets.