Opened 21 months ago

Last modified 8 months ago

#5601 reviewing defect

CMS takes too long to generate web.adblockplus.org

Reported by: juliandoucette Assignee: kvas
Priority: P3 Milestone:
Module: Sitescripts Keywords:
Cc: wspee, kvas, jsonesen Blocked By:
Blocking: Platform: Unknown / Cross platform
Ready: yes Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

https://codereview.adblockplus.org/29891645

Description

Environment

Macbook Pro

How to reproduce

  1. clone web.adblockplus.org
  2. run cms.bin.generate_static_pages

Observed behaviour

This script takes a long time to run.

Expected behaviour

This script should not take a long time to run.

Attachments (2)

profile-generate_static_pages.png (53.6 KB) - added by kvas 21 months ago.
Profile: generate_static_pages()
profile-converters-call.png (38.7 KB) - added by kvas 21 months ago.
Profile: converters.py: call

Download all attachments as: .zip

Change History (11)

comment:1 Changed 21 months ago by juliandoucette

I know this opinionated... But I also think it's obvious and should be documented somewhere :D

comment:2 Changed 21 months ago by kvas

I agree that it takes too long. I wonder if it's the CMS or the global functions of web.adblockplus.org (that download stuff from the internet) that make it slow. Have you looked into it?

comment:3 Changed 21 months ago by juliandoucette

Have you looked into it?

No.

comment:4 Changed 21 months ago by jsonesen

I haven't looked into this either but I have noticed it is rather slow. I reckon it is the network requests holding up the execution since as far as operations go they tend to be the slowest, then if you have lots of them running sequentially it adds up. There is also the mercurial calls.

Last edited 21 months ago by jsonesen (previous) (diff)

Changed 21 months ago by kvas

Profile: generate_static_pages()

Changed 21 months ago by kvas

Profile: converters.py: call

comment:5 Changed 21 months ago by kvas

I've spent some hours profiling the CMS today. First of all, networks requests are not the main reason for slow rendering in case of www.adblockplus.org. Mocking them out brings the time down to 1m50s (from 1m54s), which is fairly small. So what can we do?

First easy option is running the rendering job with PyPy (I used version 5.8.0) instead of regular Python. It finishes in 1m20s, which is about 30% faster. This is still fairly slow, so I've run the rendering under a profiler and started looking at what took most of the time. Here's the visualisation of the top level:

Profile: generate_static_pages()

Here we can see that the time is split between utils.py:process_pageand utils.py:get_page_params. process_page also calls get_page_params inside of it and each of them calls converters.py:Converter:__call__, which is where most of the time is spent. What's going on here?

What happens is that CMS calls Converter:__call__ twice for each page-locale combination: first time to determine if there's enough translation material for it and then again to render it (the second call doesn't happen if the first one determined that the page is not translated enough). In addition to that, the second pass is also followed by Converter:__call__ for the template of the page. As we can see from the picture above, most of the time is spent in Converter:__call__. So what's going on inside of it?

Profile: converters.py: __call__

Of the time spent inside __call__ about 20% is spent inside of resolve_includes (purple block), which is mostly regexp substitution with some functions called inside. The rest is get_html, which is different for different converters. The biggest chunk of time is get_html of TemplateConverter (>50%), which mostly compiles Jinja templates. The remaining blocks are get_html of RawConverter and MarkdownConverter, which mostly spend time on localization.

From this investigation it seems like significant optimization is possible by avoiding recompilation of templates and possibly caching some other things here and there to avoid recomputing the same values multiple times. This is not going to be too easy, but right now my estimate is that it's possible to at least double the performance of CMS by spending less than a week on more detailed investigation / coding.

comment:6 Changed 15 months ago by juliandoucette

Thanks kvas :) - I look forward to it.

comment:7 Changed 8 months ago by kvas

  • Owner set to kvas
  • Priority changed from Unknown to P3
  • Ready set
  • Review URL(s) modified (diff)

comment:8 Changed 8 months ago by kvas

  • Status changed from new to reviewing

comment:9 Changed 8 months ago by abpbot

A commit referencing this issue has landed:
Issue 5601 - Speed up static generation

Note: See TracTickets for help on using tickets.