Opened on 03/07/2016 at 07:13:00 PM
Closed on 03/09/2016 at 04:26:29 PM
#3750 closed change (fixed)
Cache result of web.adblockplus.org's get_subscriptions filter
Reported by: | matze | Assignee: | kzar |
---|---|---|---|
Priority: | P1 | Milestone: | |
Module: | Websites | Keywords: | |
Cc: | saroyanm, juliandoucette, fhd, fred, kvas, kzar, trev | Blocked By: | |
Blocking: | Platform: | Unknown / Cross platform | |
Ready: | yes | Confidential: | no |
Tester: | Unknown | Verified working: | no |
Review URL(s): |
Description (last modified by kzar)
Generation of web.adblockplus.org is extremely slow when the subscriptions list isn't skipped. To generate the subscription list the get_subscriptions filter is called and that in turn fetches and processes some files to generate the list.
It turns out that the get_subscriptions filter takes some time to run and is also called once per locale. To improve performance we should cache its result the first time, as to avoid repeatedly fetching and processing the same files.
Note: The performance problem here is so severe that Matze has had to suspend the cron-job to automatically regenerate the website, hence P1.
Attachments (0)
Change History (9)
comment:1 Changed on 03/08/2016 at 10:43:46 AM by saroyanm
- Cc kzar trev added
comment:2 Changed on 03/08/2016 at 11:01:35 AM by matze
With that function accessing two resources at https://hg.adblockplus.org/ I thought the bottleneck may be easy to find, but neither of those downloads takes long enough to explain the issue:
$ time wget https://hg.adblockplus.org/subscriptionlist/archive/default.tar.gz --2016-03-08 11:58:41-- https://hg.adblockplus.org/subscriptionlist/archive/default.tar.gz Resolving hg.adblockplus.org (hg.adblockplus.org)... 46.4.107.141, 2a01:4f8:141:406::2 Connecting to hg.adblockplus.org (hg.adblockplus.org)|46.4.107.141|:443... connected. HTTP request sent, awaiting response... 200 Script output follows Length: unspecified [application/x-gzip] Saving to: ‘default.tar.gz’ [ <=> ] 35,874 35.0KB/s in 1.0s 2016-03-08 11:59:14 (35.0 KB/s) - ‘default.tar.gz’ saved [35874] real 0m33.158s user 0m0.016s sys 0m0.000s
$ time wget https://hg.adblockplus.org/subscriptionlist/rawfile/default/settings --2016-03-08 11:59:37-- https://hg.adblockplus.org/subscriptionlist/rawfile/default/settings Resolving hg.adblockplus.org (hg.adblockplus.org)... 46.4.107.141, 2a01:4f8:141:406::2 Connecting to hg.adblockplus.org (hg.adblockplus.org)|46.4.107.141|:443... connected. HTTP request sent, awaiting response... 200 Script output follows Length: 549 [application/binary] Saving to: ‘settings’ 100%[===============================================================================================================================>] 549 --.-K/s in 0s 2016-03-08 11:59:42 (22.6 MB/s) - ‘settings’ saved [549/549] real 0m4.990s user 0m0.012s sys 0m0.000s
The performance of cms.bin.generate_static_pages is unchanged though.
comment:3 Changed on 03/08/2016 at 03:16:15 PM by kzar
- Component changed from Unknown to Websites
- Owner set to kzar
I can also reproduce this and agree that the get_subscriptions filter is the culprit. The problem appears to be that the filter is called once for each locale, and so those two files are downloaded over and over again.
comment:4 Changed on 03/08/2016 at 03:28:44 PM by kzar
- Review URL(s) modified (diff)
- Status changed from new to reviewing
comment:5 Changed on 03/08/2016 at 03:45:26 PM by kzar
- Summary changed from Fix CMS update performance to Improve performance of web.adblockplus.org's get_subscriptions filter
comment:6 Changed on 03/08/2016 at 03:54:56 PM by kzar
- Description modified (diff)
- Ready set
- Type changed from defect to change
comment:7 Changed on 03/08/2016 at 03:56:09 PM by kzar
- Summary changed from Improve performance of web.adblockplus.org's get_subscriptions filter to Cache result of web.adblockplus.org's get_subscriptions filter
comment:8 Changed on 03/09/2016 at 04:24:18 PM by abpbot
A commit referencing this issue has landed:
https://hg.adblockplus.org/web.adblockplus.org/rev/067d8736274c
comment:9 Changed on 03/09/2016 at 04:26:29 PM by kzar
- Resolution set to fixed
- Status changed from reviewing to closed
I can confirm that same happens when I trying to generate the content locally, I have assumption that it's somehow connected to "get_subscriptions" function after removing the call to the function, generation takes like 1 minute.
@Wladimir, @Dave seems like this function was created during the migration, can you please have a look, would it be possible to optimize the function, I'm not really sure what exactly cause the function to work that long, but still investigating the issue (will keep you updated).
I do remember we had that problem even before, not sure if it was 30 minutes, but could also be related to this function.