Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#3056 closed defect (fixed)

[CMS] cms.bin.test_server intermittently pauses for several seconds when serving static files

Reported by: kzar Assignee: sebastian
Priority: P4 Milestone:
Module: Sitescripts Keywords: cms
Cc: trev, Ross, sebastian, saroyanm Blocked By:
Blocking: Platform: Unknown / Cross platform
Ready: yes Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

https://codereview.adblockplus.org/29327611

Description (last modified by kzar)

Environment

  • Latest revision of CMS (f9fd64fbc629)
  • Ubuntu Linux 15.04 running on a Thinkpad T450
  • Serving the contents of the cms-problem-example repository.
  • Python version 2.7.9
  • Werkzeug 0.10.4 (But the problem still occurs when using wsgiref.simple_server.make_server instead of Werkzeug.)

How to reproduce

  1. Clone https://github.com/kzar/cms-problem-example and the cms.
  2. In the cms-problem-example directory launch the test server like ../cms/runserver.py.
  3. Browse to this exact URL http://localhost:5000/
  4. Keep refreshing the page until the delay problem occurs.

Observed behaviour

There are seemingly random large pauses of several seconds when the server is serving static files like images and scripts. With the example above it happens more than 50% of the time for me.

Expected behaviour

The files should be consistently served quickly.

Notes

I have also noticed a very similar / the same problem with the run_tests.py script in the adblockpluscore repository. The script acts as web server serving the pages and static files required to run the unit tests. The cause of this issue could be related.

The problem appears to happen when a connection is opened and then closed with no data sent. The request following will then be blocked for 10 seconds which is presumably a time-out somewhere in the web-server.

Change History (18)

comment:1 Changed 4 years ago by kzar

  • Cc saroyanm added

@saroyanm Have you observed this behaviour as well? So far me and Ross have.

comment:2 Changed 4 years ago by kzar

  • Description modified (diff)

comment:3 Changed 4 years ago by sebastian

  • Cc sebastian added; snoack removed

comment:4 Changed 4 years ago by kzar

  • Description modified (diff)

comment:5 Changed 4 years ago by kzar

  • Description modified (diff)

comment:6 Changed 4 years ago by Ross

This also occurred before the generic hide tests were added, but they do seem to cause it more.

comment:7 Changed 4 years ago by kzar

Passing threaded=True through to werkzeug.serving.run_simple in cms/bin/test_server.py seems to work around the problem. I still don't understand what's happening however.

Last edited 4 years ago by kzar (previous) (diff)

comment:8 Changed 4 years ago by kzar

  • Description modified (diff)

comment:9 Changed 4 years ago by sebastian

  • Priority changed from Unknown to P4
  • Ready set

Reproduced.

It seems that this happening when the browser closes the connection. Here is the strace output:

20:37:47 recvfrom(4, "", 8192, 0, NULL, NULL) = 0
20:37:57 shutdown(4, 1 /* send */)      = 0
20:37:57 close(4)                       = 0
20:37:57 select(4, [3], [], [], {0, 500000}) = 1 (in [3], left {0, 499995})
20:37:57 accept(3, {sa_family=AF_INET, sin_port=htons(36510), sin_addr=inet_addr("127.0.0.1")}, [16]) = 4

Always when recvfrom returns zero bytes, there is a delay of 10 seconds until the following shutdown call returns. When following the steps from this issue's description, this happens quite a lot. But when serving web.eyeo.com for example with cms/runserver.py, recvfrom never returns zero bytes.

comment:10 Changed 4 years ago by kzar

I can also reproduce this when using wsgiref.simple_server.make_server instead of Werkzeug. It's possibly related to this issue http://bugs.python.org/issue21878 ?

comment:11 Changed 4 years ago by kzar

  • Description modified (diff)

I've updated the description with a much simpler way to reproduce this problem.

comment:12 Changed 4 years ago by kzar

  • Description modified (diff)

comment:13 Changed 4 years ago by sebastian

I got the timestamp in the strace log wrong. It's the recvfrom, not the shutdown call, that blocks for 10 seconds. That happens when the browser opens a second connection to the server. The server then waits for another request on the first connection. After 10 seconds the browser closes the first connection. And the server responds to the request on the second connection. This is reproducible in the scenario here, because content is served from two different domains, forcing the browser to open one connection per domain.

Enabling multi-threading seems to be the appropriate solution here. That way the server can handle multiple connections at the same time.

Alternatively, we could add the Connection: close header to force the browser to close the connection after every request. But Chrome seems to ignore the Connection: close header, leaving the connection open until its closed by the server or until there are no more requests to send. However, disabling keep-alive would be another difference to the production server that we should avoid anyway.

Last edited 4 years ago by sebastian (previous) (diff)

comment:14 Changed 4 years ago by sebastian

  • Owner set to sebastian

comment:15 Changed 4 years ago by sebastian

  • Review URL(s) modified (diff)
  • Status changed from new to reviewing

comment:16 Changed 4 years ago by sebastian

  • Resolution set to fixed
  • Status changed from reviewing to closed

comment:17 Changed 4 years ago by sebastian

@kzar: As you experienced the issue as well with run_tests.py from adblockpluscore, mind filing a follow up issue to fix it there as well?

comment:18 Changed 4 years ago by kzar

Sure, I've done that - #3081.

Note: See TracTickets for help on using tickets.