Opened 2 years ago

Last modified 11 months ago

#5601 reviewing defect

CMS takes too long to generate web.adblockplus.org

Reported by: juliandoucette Assignee: kvas
Priority: P3 Milestone:
Module: Sitescripts Keywords:
Cc: wspee, kvas, jsonesen Blocked By:
Blocking: Platform: Unknown / Cross platform
Ready: yes Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

https://codereview.adblockplus.org/29891645

Description

Environment

Macbook Pro

How to reproduce

  1. clone web.adblockplus.org
  2. run cms.bin.generate_static_pages

Observed behaviour

This script takes a long time to run.

Expected behaviour

This script should not take a long time to run.

Attachments (2)

profile-generate_static_pages.png (53.6 KB) - added by kvas 2 years ago.
Profile: generate_static_pages()
profile-converters-call.png (38.7 KB) - added by kvas 2 years ago.
Profile: converters.py: call

Download all attachments as: .zip

Change History (11)

comment:1 Changed 2 years ago by juliandoucette

I know this opinionated... But I also think it's obvious and should be documented somewhere :D

comment:2 Changed 2 years ago by kvas

I agree that it takes too long. I wonder if it's the CMS or the global functions of web.adblockplus.org (that download stuff from the internet) that make it slow. Have you looked into it?

comment:3 Changed 2 years ago by juliandoucette

Have you looked into it?

No.

comment:4 Changed 2 years ago by jsonesen

I haven't looked into this either but I have noticed it is rather slow. I reckon it is the network requests holding up the execution since as far as operations go they tend to be the slowest, then if you have lots of them running sequentially it adds up. There is also the mercurial calls.

Last edited 2 years ago by jsonesen (previous) (diff)

Changed 2 years ago by kvas

Profile: generate_static_pages()

Changed 2 years ago by kvas

Profile: converters.py: call

comment:5 Changed 2 years ago by kvas

I've spent some hours profiling the CMS today. First of all, networks requests are not the main reason for slow rendering in case of www.adblockplus.org. Mocking them out brings the time down to 1m50s (from 1m54s), which is fairly small. So what can we do?

First easy option is running the rendering job with PyPy (I used version 5.8.0) instead of regular Python. It finishes in 1m20s, which is about 30% faster. This is still fairly slow, so I've run the rendering under a profiler and started looking at what took most of the time. Here's the visualisation of the top level:

Profile: generate_static_pages()

Here we can see that the time is split between utils.py:process_pageand utils.py:get_page_params. process_page also calls get_page_params inside of it and each of them calls converters.py:Converter:__call__, which is where most of the time is spent. What's going on here?

What happens is that CMS calls Converter:__call__ twice for each page-locale combination: first time to determine if there's enough translation material for it and then again to render it (the second call doesn't happen if the first one determined that the page is not translated enough). In addition to that, the second pass is also followed by Converter:__call__ for the template of the page. As we can see from the picture above, most of the time is spent in Converter:__call__. So what's going on inside of it?

Profile: converters.py: __call__

Of the time spent inside __call__ about 20% is spent inside of resolve_includes (purple block), which is mostly regexp substitution with some functions called inside. The rest is get_html, which is different for different converters. The biggest chunk of time is get_html of TemplateConverter (>50%), which mostly compiles Jinja templates. The remaining blocks are get_html of RawConverter and MarkdownConverter, which mostly spend time on localization.

From this investigation it seems like significant optimization is possible by avoiding recompilation of templates and possibly caching some other things here and there to avoid recomputing the same values multiple times. This is not going to be too easy, but right now my estimate is that it's possible to at least double the performance of CMS by spending less than a week on more detailed investigation / coding.

comment:6 Changed 18 months ago by juliandoucette

Thanks kvas :) - I look forward to it.

comment:7 Changed 11 months ago by kvas

  • Owner set to kvas
  • Priority changed from Unknown to P3
  • Ready set
  • Review URL(s) modified (diff)

comment:8 Changed 11 months ago by kvas

  • Status changed from new to reviewing

comment:9 Changed 11 months ago by abpbot

A commit referencing this issue has landed:
Issue 5601 - Speed up static generation

Note: See TracTickets for help on using tickets.