Opened on 09/23/2015 at 09:00:07 PM

Closed on 11/02/2015 at 11:37:17 AM

#3115 closed defect (fixed) intermittent global 404

Reported by: barbaz
Assignee: matze
Priority: P1
Milestone:
Module: Infrastructure
Keywords:
Cc: fred, fhd, greiner
Blocked By: #2909, #3211
Blocking:
Platform: Unknown / Cross platform
Ready: yes
Confidential: no
Tester: Unknown
Verified working: no
Review URL(s):



SeaMonkey 2.35, Linux x86_64

How to reproduce

(see )
access at around 04:00 UTC

Observed behaviour

That page, as well as any other page on, returns a 404. Accessing it hours earlier, it works.

Expected behaviour

Displays a listing of Mercurial repositories

Attachments (0)

Change History (12)

comment:1 Changed on 09/23/2015 at 09:19:25 PM by mapx

  • Cc matze added
  • Component changed from Unknown to Infrastructure

comment:2 Changed on 09/23/2015 at 09:58:29 PM by matze

  • Cc fred fhd added; matze removed
  • Owner set to matze
  • Priority changed from Unknown to P1
  • Ready set

comment:3 Changed on 09/24/2015 at 02:21:55 AM by barbaz

Been checking up on it in sort-of regular intervals, to get a better sense of what time this starts. I'll add comments as I find out more.

The issue was already occurring 4 minutes before I posted this comment...

EDIT Oh, and it was *not* happening 4 hours ago.

Last edited on 09/24/2015 at 02:24:45 AM by barbaz
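
The manual checks described in this comment could be automated. The following is a minimal sketch, not something used in this ticket: the function names, the interval and the number of checks are all assumptions, and the URL has to be supplied by the caller because it is missing from the report above.

```python
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the code is still what we want.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout and similar.
        return None


def poll(url, checks, interval_seconds, fetch=fetch_status, sleep=time.sleep):
    """Probe url `checks` times and return a list of (UTC timestamp, status)."""
    samples = []
    for i in range(checks):
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        samples.append((stamp, fetch(url)))
        if i < checks - 1:
            sleep(interval_seconds)
    return samples


def first_failure(samples, bad_status=404):
    """Return the timestamp of the first sample with bad_status, else None."""
    for stamp, status in samples:
        if status == bad_status:
            return stamp
    return None
```

Calling `poll(url, checks=24, interval_seconds=600)` would take one sample every ten minutes for about four hours, and `first_failure` then gives the UTC time at which the 404 first showed up.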

comment:4 Changed on 09/24/2015 at 05:55:14 AM by matze

The symptoms are actually caused by two distinct issues:

  1. The memory consumption of the hgweb.fcgi script has increased dramatically, causing it to break down on various occasions. This was already improved earlier this week, but it has not been completely resolved yet.
  2. The new Nginx package in use does not provide all the default error pages that were included with the package we used before. When the backend fails, Nginx tries to serve the error page for the 500 INTERNAL SERVER ERROR it intends to report, cannot find that page, and returns a 404 NOT FOUND instead.

The cause of the first issue has not been found yet.
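
To make the second point easier to follow, here is a simplified model of how a missing error page can turn a backend failure into a 404. It is only an illustration of the behaviour described above, not Nginx's actual implementation; the function name and the `/50x.html` path are assumptions.

```python
def resolve_status(backend_status, error_pages, static_files):
    """Simplified model of an error-page lookup in front of a backend.

    backend_status: status produced by the backend (e.g. 500 when it fails)
    error_pages:    maps a status code to the URI of its custom error page
    static_files:   set of URIs that actually exist on disk
    """
    page = error_pages.get(backend_status)
    if page is None:
        # No custom page configured: the original status is reported as-is.
        return backend_status
    if page in static_files:
        # The custom page exists and is served with the original status.
        return backend_status
    # The custom page is missing, so looking it up fails with a 404,
    # which masks the original backend error.
    return 404
```

With `error_pages={500: "/50x.html"}`, a failing backend yields 500 as long as `/50x.html` is in `static_files`, and 404 once that file is gone, which matches the symptom reported in this ticket.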

comment:5 Changed on 09/25/2015 at 02:29:26 AM by matze

  • Blocked By 2909 added
  • Review URL(s) modified (diff)
  • Status changed from new to reviewing

It seems like 1. is caused by the various log views (esp. /*/atom-log, /*/rss-log and the HTML changelog interfaces at /*/log), which have changed in implementation between the Mercurial/hgweb versions 2.0.2 (used before #273) and 3.5.

Our current approach for tackling this issue (aside from 2.):

  • Apply explicit caching for those resources via Nginx; currently in review
  • Upgrade to the most recent bugfix release of Mercurial; currently 3.5.1 is in testing
  • Test whether a forking WSGI/FCGI server can improve scaling here
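
The idea behind the first item can be sketched in a few lines. The caching itself is done in Nginx according to this comment; the Python class below is only a hypothetical illustration of time-based caching in front of an expensive log view, and its name, the TTL value and the `render` callback are assumptions.

```python
import time


class TTLCache:
    """Cache rendered responses per path for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}  # path -> (expiry time, body)

    def get(self, path, render):
        """Return the cached body for path, calling render(path) if stale."""
        now = self.clock()
        entry = self._entries.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the expensive backend call
        body = render(path)
        self._entries[path] = (now + self.ttl_seconds, body)
        return body
```

With a TTL of, say, 60 seconds, repeated requests for the same feed reach the backend at most once per minute, which is what takes the load off the log views.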

This again shows a growing need to integrate proper throttling mechanisms, in line with e.g. #2487, but especially for HTTP/HTTPS.

comment:6 Changed on 09/25/2015 at 01:56:35 PM by matze

  • Cc greiner added
  • Review URL(s) modified (diff)

comment:7 Changed on 10/01/2015 at 12:47:10 PM by matze

Caching (1.) is active for both RSS and ATOM logs now. The 50X error page (2.) has been fixed in that roll-out as well. While this will not completely fix the issue, it should at least increase the stability significantly. Our tests confirm that assumption so far.

comment:8 Changed on 10/13/2015 at 12:00:12 PM by matze

  • Review URL(s) modified (diff)

The hgweb.fcgi file imported from the legacy server during #2906 contains a patch to work around a possible race condition. That patch is now obsolete and actually decreases the stability of the service.

comment:9 Changed on 10/13/2015 at 04:49:49 PM by matze

The aforementioned workaround has been removed.

comment:10 Changed on 10/19/2015 at 02:14:58 PM by matze

  • Blocked By 3211 added

comment:11 Changed on 10/20/2015 at 11:08:41 PM by matze

  • Review URL(s) modified (diff)

comment:12 Changed on 11/02/2015 at 11:37:17 AM by matze

  • Resolution set to fixed
  • Status changed from reviewing to closed
