Opened on 08/30/2017 at 05:57:40 PM
Closed on 09/12/2019 at 12:11:25 PM
#5601 closed defect (fixed)
CMS takes too long to generate web.adblockplus.org
Reported by: | juliandoucette | Assignee: | kvas |
---|---|---|---|
Priority: | P3 | Milestone: | |
Module: | Sitescripts | Keywords: | |
Cc: | wspee, kvas, jsonesen | Blocked By: | |
Blocking: | Platform: | Unknown / Cross platform | |
Ready: | yes | Confidential: | no |
Tester: | Unknown | Verified working: | no |
Review URL(s): |
Description
Environment
Macbook Pro
How to reproduce
- clone web.adblockplus.org
- run cms.bin.generate_static_pages
Observed behaviour
This script takes a long time to run.
Expected behaviour
This script should not take a long time to run.
Attachments (2)
Change History (12)
comment:1 Changed on 08/30/2017 at 05:58:11 PM by juliandoucette
comment:2 Changed on 08/31/2017 at 08:57:25 AM by kvas
I agree that it takes too long. I wonder if it's the CMS or the global functions of web.adblockplus.org (that download stuff from the internet) that make it slow. Have you looked into it?
comment:3 Changed on 08/31/2017 at 09:35:52 AM by juliandoucette
Have you looked into it?
No.
comment:4 Changed on 08/31/2017 at 09:51:36 AM by jsonesen
I haven't looked into this either but I have noticed it is rather slow. I reckon it is the network requests holding up the execution since as far as operations go they tend to be the slowest, then if you have lots of them running sequentially it adds up. There is also the mercurial calls.
Changed on 08/31/2017 at 02:38:47 PM by kvas
Profile: generate_static_pages()
Changed on 08/31/2017 at 02:50:35 PM by kvas
Profile: converters.py: call
comment:5 Changed on 08/31/2017 at 03:05:30 PM by kvas
I've spent some hours profiling the CMS today. First of all, networks requests are not the main reason for slow rendering in case of www.adblockplus.org. Mocking them out brings the time down to 1m50s (from 1m54s), which is fairly small. So what can we do?
First easy option is running the rendering job with PyPy (I used version 5.8.0) instead of regular Python. It finishes in 1m20s, which is about 30% faster. This is still fairly slow, so I've run the rendering under a profiler and started looking at what took most of the time. Here's the visualisation of the top level:
Here we can see that the time is split between utils.py:process_pageand utils.py:get_page_params. process_page also calls get_page_params inside of it and each of them calls converters.py:Converter:__call__, which is where most of the time is spent. What's going on here?
What happens is that CMS calls Converter:__call__ twice for each page-locale combination: first time to determine if there's enough translation material for it and then again to render it (the second call doesn't happen if the first one determined that the page is not translated enough). In addition to that, the second pass is also followed by Converter:__call__ for the template of the page. As we can see from the picture above, most of the time is spent in Converter:__call__. So what's going on inside of it?
Of the time spent inside __call__ about 20% is spent inside of resolve_includes (purple block), which is mostly regexp substitution with some functions called inside. The rest is get_html, which is different for different converters. The biggest chunk of time is get_html of TemplateConverter (>50%), which mostly compiles Jinja templates. The remaining blocks are get_html of RawConverter and MarkdownConverter, which mostly spend time on localization.
From this investigation it seems like significant optimization is possible by avoiding recompilation of templates and possibly caching some other things here and there to avoid recomputing the same values multiple times. This is not going to be too easy, but right now my estimate is that it's possible to at least double the performance of CMS by spending less than a week on more detailed investigation / coding.
comment:6 Changed on 02/21/2018 at 04:59:03 PM by juliandoucette
Thanks kvas :) - I look forward to it.
comment:7 Changed on 09/25/2018 at 12:35:22 PM by kvas
- Owner set to kvas
- Priority changed from Unknown to P3
- Ready set
- Review URL(s) modified (diff)
comment:8 Changed on 09/25/2018 at 12:35:46 PM by kvas
- Status changed from new to reviewing
comment:9 Changed on 09/28/2018 at 10:56:42 AM by abpbot
A commit referencing this issue has landed:
Issue 5601 - Speed up static generation
comment:10 Changed on 09/12/2019 at 12:11:25 PM by kvas
- Resolution set to fixed
- Status changed from reviewing to closed
I know this opinionated... But I also think it's obvious and should be documented somewhere :D