Opened on 09/10/2015 at 04:36:02 PM
Closed on 09/16/2015 at 06:32:11 AM
Last modified on 09/16/2015 at 09:34:16 AM
#3056 closed defect (fixed)
[CMS] cms.bin.test_server intermittently pauses for several seconds when serving static files
Reported by: | kzar | Assignee: | sebastian |
---|---|---|---|
Priority: | P4 | Milestone: | |
Module: | Sitescripts | Keywords: | cms |
Cc: | trev, Ross, sebastian, saroyanm | Blocked By: | |
Blocking: | Platform: | Unknown / Cross platform | |
Ready: | yes | Confidential: | no |
Tester: | Unknown | Verified working: | no |
Review URL(s): |
Description (last modified by kzar)
Environment
- Latest revision of CMS (f9fd64fbc629)
- Ubuntu Linux 15.04 running on a Thinkpad T450
- Serving the contents of the cms-problem-example repository.
- Python version 2.7.9
- Werkzeug 0.10.4 (But the problem still occurs when using wsgiref.simple_server.make_server instead of Werkzeug.)
How to reproduce
- Clone https://github.com/kzar/cms-problem-example and the cms.
- In the cms-problem-example directory launch the test server like ../cms/runserver.py.
- Browse to this exact URL http://localhost:5000/
- Keep refreshing the page until the delay problem occurs.
Observed behaviour
There are seemingly random large pauses of several seconds when the server is serving static files like images and scripts. With the example above it happens more than 50% of the time for me.
Expected behaviour
The files should be consistently served quickly.
Notes
I have also noticed a very similar / the same problem with the run_tests.py script in the adblockpluscore repository. The script acts as web server serving the pages and static files required to run the unit tests. The cause of this issue could be related.
The problem appears to happen when a connection is opened and then closed with no data sent. The request following will then be blocked for 10 seconds which is presumably a time-out somewhere in the web-server.
Attachments (0)
Change History (18)
comment:1 Changed on 09/10/2015 at 04:37:55 PM by kzar
- Cc saroyanm added
comment:3 Changed on 09/10/2015 at 05:32:44 PM by sebastian
- Cc sebastian added; snoack removed
comment:6 Changed on 09/10/2015 at 08:13:36 PM by Ross
This also occurred before the generic hide tests were added, but they do seem to cause it more.
comment:7 Changed on 09/10/2015 at 10:03:15 PM by kzar
Passing threaded=True through to werkzeug.serving.run_simple in cms/bin/test_server.py seems to work around the problem. I still don't understand what's happening however.
comment:9 Changed on 09/11/2015 at 11:12:29 AM by sebastian
- Priority changed from Unknown to P4
- Ready set
Reproduced.
It seems that this happening when the browser closes the connection. Here is the strace output:
20:37:47 recvfrom(4, "", 8192, 0, NULL, NULL) = 0 20:37:57 shutdown(4, 1 /* send */) = 0 20:37:57 close(4) = 0 20:37:57 select(4, [3], [], [], {0, 500000}) = 1 (in [3], left {0, 499995}) 20:37:57 accept(3, {sa_family=AF_INET, sin_port=htons(36510), sin_addr=inet_addr("127.0.0.1")}, [16]) = 4
Always when recvfrom returns zero bytes, there is a delay of 10 seconds until the following shutdown call returns. When following the steps from this issue's description, this happens quite a lot. But when serving web.eyeo.com for example with cms/runserver.py, recvfrom never returns zero bytes.
comment:10 Changed on 09/11/2015 at 11:43:47 AM by kzar
I can also reproduce this when using wsgiref.simple_server.make_server instead of Werkzeug. It's possibly related to this issue http://bugs.python.org/issue21878 ?
comment:11 Changed on 09/11/2015 at 12:21:49 PM by kzar
- Description modified (diff)
I've updated the description with a much simpler way to reproduce this problem.
comment:12 Changed on 09/11/2015 at 12:24:23 PM by kzar
- Description modified (diff)
comment:13 Changed on 09/14/2015 at 03:31:03 PM by sebastian
I got the timestamp in the strace log wrong. It's the recvfrom, not the shutdown call, that blocks for 10 seconds. That happens when the browser opens a second connection to the server. The server then waits for another request on the first connection. After 10 seconds the browser closes the first connection. And the server responds to the request on the second connection. This is reproducible in the scenario here, because content is served from two different domains, forcing the browser to open one connection per domain.
Enabling multi-threading seems to be the appropriate solution here. That way the server can handle multiple connections at the same time.
Alternatively, we could add the Connection: close header to force the browser to close the connection after every request. But Chrome seems to ignore the Connection: close header, leaving the connection open until its closed by the server or until there are no more requests to send. However, disabling keep-alive would be another difference to the production server that we should avoid anyway.
comment:14 Changed on 09/15/2015 at 06:30:57 AM by sebastian
- Owner set to sebastian
comment:15 Changed on 09/15/2015 at 06:32:34 AM by sebastian
- Review URL(s) modified (diff)
- Status changed from new to reviewing
comment:16 Changed on 09/16/2015 at 06:32:11 AM by sebastian
- Resolution set to fixed
- Status changed from reviewing to closed
comment:17 Changed on 09/16/2015 at 06:35:26 AM by sebastian
@kzar: As you experienced the issue as well with run_tests.py from adblockpluscore, mind filing a follow up issue to fix it there as well?
comment:18 Changed on 09/16/2015 at 09:19:04 AM by kzar
Sure, I've done that - #3081.
@saroyanm Have you observed this behaviour as well? So far me and Ross have.