Opened 4 years ago

Closed 4 years ago

#3750 closed change (fixed)

Cache result of web.adblockplus.org's get_subscriptions filter

Reported by: matze Assignee: kzar
Priority: P1 Milestone:
Module: Websites Keywords:
Cc: saroyanm, juliandoucette, fhd, fred, kvas, kzar, trev Blocked By:
Blocking: Platform: Unknown / Cross platform
Ready: yes Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

https://codereview.adblockplus.org/29337941/

Description (last modified by kzar)

Generation of web.adblockplus.org is extremely slow when the subscriptions list isn't skipped. To generate the subscription list the get_subscriptions filter is called and that in turn fetches and processes some files to generate the list.

It turns out that the get_subscriptions filter takes some time to run and is also called once per locale. To improve performance we should cache its result the first time, as to avoid repeatedly fetching and processing the same files.

Note: The performance problem here is so severe that Matze has had to suspend the cron-job to automatically regenerate the website, hence P1.

Change History (9)

comment:1 Changed 4 years ago by saroyanm

  • Cc kzar trev added

I can confirm that same happens when I trying to generate the content locally, I have assumption that it's somehow connected to "get_subscriptions" function after removing the call to the function, generation takes like 1 minute.
@Wladimir, @Dave seems like this function was created during the migration, can you please have a look, would it be possible to optimize the function, I'm not really sure what exactly cause the function to work that long, but still investigating the issue (will keep you updated).
I do remember we had that problem even before, not sure if it was 30 minutes, but could also be related to this function.

comment:2 Changed 4 years ago by matze

With that function accessing two resources at https://hg.adblockplus.org/ I thought the bottleneck may be easy to find, but neither of those downloads takes long enough to explain the issue:

$ time wget https://hg.adblockplus.org/subscriptionlist/archive/default.tar.gz
--2016-03-08 11:58:41--  https://hg.adblockplus.org/subscriptionlist/archive/default.tar.gz
Resolving hg.adblockplus.org (hg.adblockplus.org)... 46.4.107.141, 2a01:4f8:141:406::2
Connecting to hg.adblockplus.org (hg.adblockplus.org)|46.4.107.141|:443... connected.
HTTP request sent, awaiting response... 200 Script output follows
Length: unspecified [application/x-gzip]
Saving to: ‘default.tar.gz’

    [  <=>                                                                                                                           ] 35,874      35.0KB/s   in 1.0s

2016-03-08 11:59:14 (35.0 KB/s) - ‘default.tar.gz’ saved [35874]


real    0m33.158s
user    0m0.016s
sys     0m0.000s
$ time wget https://hg.adblockplus.org/subscriptionlist/rawfile/default/settings
--2016-03-08 11:59:37--  https://hg.adblockplus.org/subscriptionlist/rawfile/default/settings
Resolving hg.adblockplus.org (hg.adblockplus.org)... 46.4.107.141, 2a01:4f8:141:406::2
Connecting to hg.adblockplus.org (hg.adblockplus.org)|46.4.107.141|:443... connected.
HTTP request sent, awaiting response... 200 Script output follows
Length: 549 [application/binary]
Saving to: ‘settings’

100%[===============================================================================================================================>] 549         --.-K/s   in 0s

2016-03-08 11:59:42 (22.6 MB/s) - ‘settings’ saved [549/549]


real    0m4.990s
user    0m0.012s
sys     0m0.000s

The performance of cms.bin.generate_static_pages is unchanged though.

comment:3 Changed 4 years ago by kzar

  • Component changed from Unknown to Websites
  • Owner set to kzar

I can also reproduce this and agree that the get_subscriptions filter is the culprit. The problem appears to be that the filter is called once for each locale, and so those two files are downloaded over and over again.

comment:4 Changed 4 years ago by kzar

  • Review URL(s) modified (diff)
  • Status changed from new to reviewing

comment:5 Changed 4 years ago by kzar

  • Summary changed from Fix CMS update performance to Improve performance of web.adblockplus.org's get_subscriptions filter

comment:6 Changed 4 years ago by kzar

  • Description modified (diff)
  • Ready set
  • Type changed from defect to change

comment:7 Changed 4 years ago by kzar

  • Summary changed from Improve performance of web.adblockplus.org's get_subscriptions filter to Cache result of web.adblockplus.org's get_subscriptions filter

comment:8 Changed 4 years ago by abpbot

A commit referencing this issue has landed:
https://hg.adblockplus.org/web.adblockplus.org/rev/067d8736274c

comment:9 Changed 4 years ago by kzar

  • Resolution set to fixed
  • Status changed from reviewing to closed
Note: See TracTickets for help on using tickets.