Opened on 08/30/2017 at 05:57:40 PM

Closed on 09/12/2019 at 12:11:25 PM

#5601 closed defect (fixed)

CMS takes too long to generate web.adblockplus.org

Reported by: juliandoucette Assignee: kvas
Priority: P3 Milestone:
Module: Sitescripts Keywords:
Cc: wspee, kvas, jsonesen Blocked By:
Blocking: Platform: Unknown / Cross platform
Ready: yes Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

https://codereview.adblockplus.org/29891645

Description

Environment

Macbook Pro

How to reproduce

  1. clone web.adblockplus.org
  2. run cms.bin.generate_static_pages

Observed behaviour

This script takes a long time to run.

Expected behaviour

This script should not take a long time to run.

Attachments (2)

profile-generate_static_pages.png (53.6 KB) - added by kvas on 08/31/2017 at 02:38:47 PM.
Profile: generate_static_pages()
profile-converters-call.png (38.7 KB) - added by kvas on 08/31/2017 at 02:50:35 PM.
Profile: converters.py: call

Download all attachments as: .zip

Change History (12)

comment:1 Changed on 08/30/2017 at 05:58:11 PM by juliandoucette

I know this opinionated... But I also think it's obvious and should be documented somewhere :D

comment:2 Changed on 08/31/2017 at 08:57:25 AM by kvas

I agree that it takes too long. I wonder if it's the CMS or the global functions of web.adblockplus.org (that download stuff from the internet) that make it slow. Have you looked into it?

comment:3 Changed on 08/31/2017 at 09:35:52 AM by juliandoucette

Have you looked into it?

No.

comment:4 Changed on 08/31/2017 at 09:51:36 AM by jsonesen

I haven't looked into this either but I have noticed it is rather slow. I reckon it is the network requests holding up the execution since as far as operations go they tend to be the slowest, then if you have lots of them running sequentially it adds up. There is also the mercurial calls.

Last edited on 08/31/2017 at 10:41:57 AM by jsonesen

Changed on 08/31/2017 at 02:38:47 PM by kvas

Profile: generate_static_pages()

Changed on 08/31/2017 at 02:50:35 PM by kvas

Profile: converters.py: call

comment:5 Changed on 08/31/2017 at 03:05:30 PM by kvas

I've spent some hours profiling the CMS today. First of all, networks requests are not the main reason for slow rendering in case of www.adblockplus.org. Mocking them out brings the time down to 1m50s (from 1m54s), which is fairly small. So what can we do?

First easy option is running the rendering job with PyPy (I used version 5.8.0) instead of regular Python. It finishes in 1m20s, which is about 30% faster. This is still fairly slow, so I've run the rendering under a profiler and started looking at what took most of the time. Here's the visualisation of the top level:

Profile: generate_static_pages()

Here we can see that the time is split between utils.py:process_pageand utils.py:get_page_params. process_page also calls get_page_params inside of it and each of them calls converters.py:Converter:__call__, which is where most of the time is spent. What's going on here?

What happens is that CMS calls Converter:__call__ twice for each page-locale combination: first time to determine if there's enough translation material for it and then again to render it (the second call doesn't happen if the first one determined that the page is not translated enough). In addition to that, the second pass is also followed by Converter:__call__ for the template of the page. As we can see from the picture above, most of the time is spent in Converter:__call__. So what's going on inside of it?

Profile: converters.py: __call__

Of the time spent inside __call__ about 20% is spent inside of resolve_includes (purple block), which is mostly regexp substitution with some functions called inside. The rest is get_html, which is different for different converters. The biggest chunk of time is get_html of TemplateConverter (>50%), which mostly compiles Jinja templates. The remaining blocks are get_html of RawConverter and MarkdownConverter, which mostly spend time on localization.

From this investigation it seems like significant optimization is possible by avoiding recompilation of templates and possibly caching some other things here and there to avoid recomputing the same values multiple times. This is not going to be too easy, but right now my estimate is that it's possible to at least double the performance of CMS by spending less than a week on more detailed investigation / coding.

comment:6 Changed on 02/21/2018 at 04:59:03 PM by juliandoucette

Thanks kvas :) - I look forward to it.

comment:7 Changed on 09/25/2018 at 12:35:22 PM by kvas

  • Owner set to kvas
  • Priority changed from Unknown to P3
  • Ready set
  • Review URL(s) modified (diff)

comment:8 Changed on 09/25/2018 at 12:35:46 PM by kvas

  • Status changed from new to reviewing

comment:9 Changed on 09/28/2018 at 10:56:42 AM by abpbot

A commit referencing this issue has landed:
Issue 5601 - Speed up static generation

comment:10 Changed on 09/12/2019 at 12:11:25 PM by kvas

  • Resolution set to fixed
  • Status changed from reviewing to closed

Add Comment

Modify Ticket

Change Properties
Action
as closed .
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from kvas.
 
Note: See TracTickets for help on using tickets.