Opened 2 months ago

Last modified 8 weeks ago

#7438 new change

Use streaming fetch for filter list downloads

Reported by: mjethani Assignee:
Priority: Unknown Milestone:
Module: Core Keywords:
Cc: sebastian, erikvold, kzar, sergz Blocked By:
Blocking: Platform: Unknown / Cross platform
Ready: no Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

Description (last modified by mjethani)

Background

In #7381 we replaced our use of XMLHttpRequest with the Fetch API. Unlike XMLHttpRequest, the Fetch API supports streaming the response body. Instead of loading the entire filter list into memory, filters could now be read, parsed, and activated (but see comment:1) while the list is still being downloaded. This may not have any significant benefit on desktop platforms, but it could mean something for mobile.

On the other hand, we are moving to incremental filter list updates anyway (#6833), so this may not add much value in any case.
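As a rough sketch of the streaming idea (the function name and shape are illustrative, not the extension's actual code), the response body could be consumed line by line as chunks arrive, e.g. via `for await (let line of parseFilterLines(response.body))`:

```javascript
// Sketch only: read filter text line by line as chunks of a fetch()
// response body arrive, instead of buffering the whole download.
// parseFilterLines is a hypothetical name, not the extension's API.
async function* parseFilterLines(stream)
{
  let decoder = new TextDecoder("utf-8");
  let reader = stream.getReader();
  let buffer = "";

  while (true)
  {
    let {done, value} = await reader.read();
    if (done)
      break;

    buffer += decoder.decode(value, {stream: true});
    let lines = buffer.split("\n");
    buffer = lines.pop();  // keep the trailing partial line for later
    yield* lines;
  }

  buffer += decoder.decode();  // flush any remaining bytes
  if (buffer)
    yield buffer;
}
```

Note that a line can be split across two network chunks, which is why the trailing partial line has to be carried over between reads.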

Compatibility

Streaming fetch is available only in Chrome 52+ and Firefox 65+ (see comment:8).
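If this were implemented, browsers without streaming support would need a non-streaming fallback. A minimal feature-detection sketch (the function name is hypothetical) could look like this:

```javascript
// Sketch: use the streaming reader where response.body is exposed as
// a ReadableStream, and fall back to buffering the whole body (the
// current behavior) elsewhere, e.g. Chrome 49-51 and Firefox 51-64.
async function readFilterText(response)
{
  if (!response.body || typeof response.body.getReader != "function")
    return response.text();  // non-streaming fallback

  let reader = response.body.getReader();
  let decoder = new TextDecoder("utf-8");
  let text = "";

  while (true)
  {
    let {done, value} = await reader.read();
    if (done)
      return text + decoder.decode();

    text += decoder.decode(value, {stream: true});
  }
}
```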

Resolution

[To be determined.]

Change History (10)

comment:1 Changed 2 months ago by sebastian

Interesting idea, but there are caveats:

  1. The download can still fail (e.g. the connection can drop) at any time before the response body has been completely consumed. In that case, we would have to roll back, discarding the filters that have already been parsed.
  2. A filter list update is meant to be atomic: no filter from a given version of a filter list should be applied without all the other filters from that version. If a filter list were only partially applied, this could result in inconsistent and untested behavior.

So it seems that while we could stream the response and parse filters lazily as they are received over the network, we should not store and activate the parsed filters until the response is complete.

But what do we gain then? By starting to parse the filters in parallel while the response isn't complete yet, we might save some milliseconds between the response completing and the filters becoming active, but overall CPU usage and peak memory usage will increase. Instead of a string with the response body, we'd keep a parsed copy (using more memory) of the filter list's new version in memory, and either way we cannot discard the filters from the previous version until the response is complete.
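The "parse lazily, commit atomically" constraint described above could be sketched as follows (all names are hypothetical; `parseFilter` stands in for the real parser):

```javascript
// Trivial stand-in for the real filter parser.
function parseFilter(text)
{
  return {text};
}

// Sketch of "parse eagerly, commit atomically": filters parsed from
// the stream are only buffered; the subscription's active filters are
// swapped in one step once the whole response has been consumed.
async function updateSubscription(subscription, lines)
{
  let parsed = [];
  try
  {
    for await (let line of lines)
      parsed.push(parseFilter(line));
  }
  catch (error)
  {
    // The download failed midway: discard everything parsed so far,
    // keeping the previous version of the filter list active.
    return false;
  }

  subscription.filters = parsed;  // atomic swap
  return true;
}
```

This keeps updates atomic at the cost of holding both the old and the new filters in memory until the response completes, which is exactly the trade-off discussed above.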

comment:2 follow-up: Changed 2 months ago by mjethani

I have to agree with your points; the second one especially is very important and something I had not thought about.

Assuming that a download would rarely ever get terminated midway, the only benefit of the streaming approach might be to avoid the spike in memory usage when the filter list data in its raw form is loaded into memory.

Once we have incremental filter list updates (#6833) we could revisit this idea, but I agree it does not sound very promising.

comment:3 Changed 2 months ago by mjethani

  • Description modified (diff)

comment:4 in reply to: ↑ 2 ; follow-up: Changed 2 months ago by sebastian

Replying to mjethani:

Assuming that a download would rarely ever get terminated midway

Of course this isn't the common case, but it's not uncommon enough to not be handled either, in particular with flaky mobile data connections and scenarios like China in mind.

the only benefit of the streaming approach might be to avoid the spike in memory usage when the filter list data in its raw form is loaded into memory.

I'm not exactly sure what you are suggesting here. The peak memory usage would be higher (not lower) when filters are parsed instead of being held in a single string (while the download is in progress), unless you ignore the caveats above and store/activate the filters before the response completes.

Once we have incremental filter list updates (#6833) we could revisit this idea, but I agree it does not sound very promising.

If this approach doesn't make sense now, how would it make sense when we are downloading much less data on average, due to incremental updates?

comment:5 in reply to: ↑ 4 Changed 2 months ago by mjethani

Replying to sebastian:

Replying to mjethani:

Assuming that a download would rarely ever get terminated midway

Of course this isn't the common case, but it's not uncommon enough to not be handled either, in particular with flaky mobile data connections and scenarios like China in mind.

I did not mean that we should not handle that case. I meant that we would have to discard the parsed filters if the connection drops, but since that would happen relatively rarely, it would be more beneficial overall to start parsing filters early, since they have to be parsed anyway in the more common case where the connection does not drop.

the only benefit of the streaming approach might be to avoid the spike in memory usage when the filter list data in its raw form is loaded into memory.

I'm not exactly sure what you are suggesting here. The peak memory usage would be higher (not lower) when filters are parsed instead of being held in a single string (while the download is in progress), unless you ignore the caveats above and store/activate the filters before the response completes.

Are you sure about this? If not, the only way to find out is to test it out.

Also, you're ignoring the fact that we intend to discard filter objects once parsed (#7097). We are already a step closer to this now that I have opened this ticket.

Once we have incremental filter list updates (#6833) we could revisit this idea, but I agree it does not sound very promising.

If this approach doesn't make sense now, how would it make sense when we are downloading much less data on average, due to incremental updates?

Well, I don't see how it doesn't make sense now, since what you're arguing above seems to be based on speculation rather than actual knowledge of how the system behaves before and after such a change.

What I see is that there is a memory spike when the filter lists are synced. I don't know whether switching to streaming fetch would help, but it's worth exploring.

comment:6 Changed 2 months ago by sebastian

A filter object stores, in addition to some parsed information, the raw filter text. So how can this data structure use any less memory than the raw filter text alone?

Then again, I guess we could hold on only to the filters that are new or to be removed. But with incremental filter list updates we go even a step further, sending only those over the network.

Last edited 2 months ago by sebastian (previous) (diff)

comment:7 follow-up: Changed 2 months ago by mjethani

Let's say you load 1,000 raw filters into memory and each one takes up 100 bytes. Now you need 100,000 bytes of memory. After loading these filters, you parse them into objects, each of which takes let's say 400 bytes of memory. Now you need a total of 500,000 bytes at the same time.

If on the other hand you read each filter and immediately convert it to a filter object, then you need only 400,100 bytes of memory at most.

This assumes that neither the load-all-at-once nor the load-one-at-a-time approach allocates memory in huge chunks, several multiples of what is expected to be required. I don't know how the streaming and non-streaming implementations of fetch() behave, how they allocate memory, and so on. The only way to find out is to try it out.
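The arithmetic above can be restated as a quick back-of-the-envelope model (the byte sizes are the assumptions from this comment, not measurements):

```javascript
// Back-of-the-envelope peak-memory model using the sizes assumed
// above; these are illustrative numbers, not measurements.
const FILTER_COUNT = 1000;
const RAW_BYTES = 100;     // per raw filter line
const PARSED_BYTES = 400;  // per parsed filter object

// Load the whole list, then parse: raw text and objects coexist.
const peakAllAtOnce = FILTER_COUNT * (RAW_BYTES + PARSED_BYTES);

// Parse each line as it arrives: all objects plus one raw line.
const peakStreaming = FILTER_COUNT * PARSED_BYTES + RAW_BYTES;

console.log(peakAllAtOnce, peakStreaming);  // 500000 400100
```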

comment:9 Changed 2 months ago by sebastian

Well spotted! Regardless of whether there is anything to gain here, I'd rather not drop support for Chrome 49-51 and Firefox 51-64 yet.

comment:10 Changed 8 weeks ago by mjethani

  • Description modified (diff)