Opened 4 months ago

Last modified 6 weeks ago

#6833 new change

Add support for incremental filter list updates

Reported by: sebastian Assignee: erikvold
Priority: P2 Milestone:
Module: Core Keywords:
Cc: erikvold, kzar, rhowell, greiner, mjethani, sergz, sporz Blocked By:
Blocking: Platform: Unknown / Cross platform
Ready: yes Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

https://gitlab.com/eyeo/adblockplus/adblockpluscore/merge_requests/2

Description (last modified by sebastian)

Background

Currently the whole filter list is downloaded when the filter list is updated (e.g. once a day in case of EasyList). We'd like to decrease bandwidth usage while updating filter lists at a higher frequency.

What to change

Implement the client-side support for incremental filter list updates as specified in ​this specification.

Change History (24)

comment:1 Changed 4 months ago by greiner

  • Cc greiner added

comment:3 Changed 4 months ago by erikvold

  • Owner set to erikvold

comment:4 Changed 4 months ago by kzar

  • Cc mjethani sergz added
  • Description modified (diff)
  • Priority changed from P3 to P2

I think it's better to put some of the details about what we're implementing for this issue in the description. I've had a go, but Sebastian/Rosie feel free to tweak the description if you think I've got something wrong or missed something.

Once the FIXME comments in the description are addressed, I consider this issue ready. So if I'm not around to triage, feel free to mark this as Ready once you've addressed those.

It might make sense to break this down into multiple issues as well, I've not looked into how this will be implemented in practice so I'm not sure. Erik if some of these changes need to happen in adblockpluschrome for example, or if the changes make sense to be split up please file further issues which are blocked by / block this one.

comment:5 Changed 4 months ago by kzar

  • Description modified (diff)

comment:6 Changed 4 months ago by kzar

  • Description modified (diff)

comment:7 Changed 4 months ago by sebastian

Thanks, Dave, but what we have in the issue description now, is a widely incomplete (and partly inaccurate) description of what should be implemented. In order to fix this, I would essentially have to copy the whole protocol section from the spec in here. That's why I decided to rather just refer to the spec, avoiding duplication and potential inconsistencies.

comment:8 Changed 4 months ago by mjethani

I think it's OK to link to the document from the issue for the purpose of starting work on the issue. Eventually we can either copy the entire final specification into the description or link to the specification hosted at a place that is openly accessible (Google Docs requires a Google account?). I agree that for now it's not worth duplicating information.

In the long run, for features that have large specification documents, we should have a standard place to host the specifications. I don't think Trac is ideal for writing a detailed specification.

I know that the UI team has been using GitLab for this.

comment:9 Changed 4 months ago by sebastian

  • Description modified (diff)
  • Ready set

Actually this spec document is public, i.e. you don't need to sign into Google when you click the link in order to view it (but in order to comment or edit). In the long-run we might want to convert it to markdown and host in some VCS though, for better version control, like the UI team is doing.

Anyway, since Dave is on vacation, and Manish (who is the module owner for Core) seems to agree, I reverted the issue description and set the issue to ready.

Also, FWIW, as far as I'm concerned, this issue is rather a P3 than a P2, as it's something we agreed to work on, but hasn't been given priority over other development efforts.

comment:10 Changed 4 months ago by sporz

  • Cc sporz added

comment:11 follow-ups: Changed 3 months ago by ameshkov

Hello guys,

Incremental updates will be useful to everyone so let me please chime in :)

I've read the spec, and I don't quite understand how are the client applications protected from accidentally downloading a wrong diff. For instance, there might be two different changes during the Diff-Expires interval, and as I understand, the diff file will contain changes for the second change only.

I suggest using something more foolproof. Here is an idea on how it can be improved.

  1. Introduce one more special comment - ! Diff-Head: HERE_GOES_THE_HEAD_COMMIT_OR_VERSION
  2. Use the regular Diff file syntax and indicate the before/after commits (just like the git-diff does)
  3. Let the diff file contain more than just the last change. For instance, it may include diffs of the previous 10-20 changes (concatenated using a special separator).
  4. In this case, the client will be able to locate the exact changes that occurred since the last update check.

It will be easy to build such diff file if filter maintainers use git or svn for version control.
Alternatively, they may use the filter version for the Diff-Head value.

comment:12 in reply to: ↑ 11 ; follow-up: Changed 3 months ago by rhowell

Replying to ameshkov:

I've read the spec, and I don't quite understand how are the client applications protected from accidentally downloading a wrong diff. For instance, there might be two different changes during the Diff-Expires interval, and as I understand, the diff file will contain changes for the second change only.

I don't believe we will run into this case. While the server-side implementation hasn't been specified yet, there have been some discussions about it (e.g. in the workshop). One option is to keep the original/base filter lists on the server, probably in the same folder as the diff. Then, each time a new filter list is generated, the diffs will be updated to show the difference between the base version and the latest version. Maybe a file structure like:

  • current_filterlist.txt
  • version1/
    • v1_filterlist.txt
    • v1_to_current.diff
  • version2/
    • v2_filterlist.txt
    • v2_to_current.diff

The previous lists will be static, but the diffs will change each time the current filter list changes. So, no matter what previous version the client currently has, the result of applying the diff will be the latest version.

comment:13 in reply to: ↑ 11 Changed 3 months ago by sebastian

Replying to ameshkov:

  1. Use the regular Diff file syntax and indicate the before/after commits (just like the git-diff does)

That wouldn't be possible with the current implementation of Adblock Plus since we don't save the original/unchanged filter text on the client-side but parse, normalize and store the filters in a slightly different format. Moreover, (without third-party libraries) the diff format we chose is simpler to implement (in particular on the client-side). Also it is more compact.

  1. Let the diff file contain more than just the last change. For instance, it may include diffs of the previous 10-20 changes (concatenated using a special separator).

The diff will include all changes between the last update and the latest version available.

comment:14 in reply to: ↑ 12 Changed 3 months ago by ameshkov

I guess my confusion was caused by misreading some parts of the "diff discovery" section. For some reason, I was thinking that the diff file is a single static file which location is specified by ! Diff-URL. Clearly, that's not the case, thanks for the explanation.

comment:15 Changed 8 weeks ago by erikvold

  • Review URL(s) modified (diff)

comment:17 Changed 6 weeks ago by erikvold

  • Blocked By 7085 added

comment:18 Changed 6 weeks ago by erikvold

  • Blocked By 7084 added

comment:19 Changed 6 weeks ago by erikvold

  • Blocked By 7044 added

comment:20 Changed 6 weeks ago by erikvold

  • Blocked By 5449 added

comment:21 Changed 6 weeks ago by erikvold

  • Blocked By 7086 added

comment:22 Changed 6 weeks ago by kzar

  • Blocked By 5449 removed

comment:23 follow-up: Changed 6 weeks ago by sebastian

  • Blocked By 7044, 7084, 7085, 7086 removed

Thanks for filing those issues, however, none of them are blocking this change.

#7044: I left a comment on that issue for clarification, but in case you have to override/extend eslint-config-eyeo for some parts of the code, you can do that through an .eslintrc.json file in the respective folder or through an inline comment in the respective file, along your changes that require to relax the rules. No need to handle that upfront as a separate change.

#7084, #7085: Thanks for filing those issues, but no need to hold back the implementation of incremental filter list updates for some linter improvements.

#7086: Yes, you accidentally introduced a regression in your initial patch, which more complete tests could have avoided. But that change was unrelated and therefore shouldn't have been done in the first place. Anyway, while I think adding more tests for the semantics of metadata in filter lists is a good idea, I don't think that this necessarily has to be done before we can move forward with this issue.

comment:24 in reply to: ↑ 23 Changed 6 weeks ago by erikvold

Replying to sebastian:

Thanks for filing those issues, however, none of them are blocking this change.

#7044: I left a comment on that issue for clarification, but in case you have to override/extend eslint-config-eyeo for some parts of the code, you can do that through an .eslintrc.json file in the respective folder or through an inline comment in the respective file, along your changes that require to relax the rules. No need to handle that upfront as a separate change.

I understand we can alter the rules for a project, we did that for the flattr-extension project, I don't think we should handle that change in this issue though. We can figure where the change makes most sense in #7044, and then make a separate commit to resolve it.

#7084, #7085: Thanks for filing those issues, but no need to hold back the implementation of incremental filter list updates for some linter improvements.

It's just much faster in the long run to deal with these issues as they come up instead of avoiding it.

#7086: Yes, you accidentally introduced a regression in your initial patch, which more complete tests could have avoided. But that change was unrelated and therefore shouldn't have been done in the first place. Anyway, while I think adding more tests for the semantics of metadata in filter lists is a good idea, I don't think that this necessarily has to be done before we can move forward with this issue.

I just think it's wise to make sure we have these tests in place before making the changes for this issue. This regression and others could have been caused and not caught, and I don't like landing a patch on that type of situation, this is why we have tests.

Last edited 6 weeks ago by erikvold (previous) (diff)
Note: See TracTickets for help on using tickets.