Opened 4 years ago

Closed 4 years ago

#3115 closed defect (fixed)

hg.adblockplus.org intermittent global 404

Reported by: barbaz Assignee: matze
Priority: P1 Milestone:
Module: Infrastructure Keywords:
Cc: fred, fhd, greiner Blocked By: #2909, #3211
Blocking: Platform: Unknown / Cross platform
Ready: yes Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

https://codereview.adblockplus.org/29328647/
https://codereview.adblockplus.org/29328668/
https://codereview.adblockplus.org/29329092/
https://codereview.adblockplus.org/29329312/

Description

Environment

SeaMonkey 2.35, Linux x86_64

How to reproduce

(see https://forums.lanik.us/viewtopic.php?f=88&t=25478 )
access https://hg.adblockplus.org/ at around 04:00 UTC

Observed behaviour

That page, as well as any other page on hg.adblockplus.org, returns a 404. Accessing it hours earlier, it works.

Expected behaviour

Displays a listing of Mercurial repositories

Change History (12)

comment:1 Changed 4 years ago by mapx

  • Cc matze added
  • Component changed from Unknown to Infrastructure

comment:2 Changed 4 years ago by matze

  • Cc fred fhd added; matze removed
  • Owner set to matze
  • Priority changed from Unknown to P1
  • Ready set

comment:3 Changed 4 years ago by barbaz

Been checking up on it in sort-of regular intervals, to get a better sense of what time this starts. I'll add comments as I find out more.

I already saw this issue 4 minutes before posting this comment...

EDIT Oh, and it was *not* happening 4 hours ago.

Last edited 4 years ago by barbaz (previous) (diff)

comment:4 Changed 4 years ago by matze

The symptoms are actually caused by two distinct issues:

  1. The memory consumption of the hgweb.fcgi script has increased dramatically, causing it to break down on various occasions. This was already improved earlier this week, but it has not been completely resolved.
  2. The new Nginx package in use does not provide all the default error pages included with the one we used before. When the backend fails, Nginx intends to report a 500 INTERNAL SERVER ERROR, but the error page for that status is missing, so the client receives a 404 NOT FOUND that actually refers to the missing error page.

The cause of the first issue has not been found yet.
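
To illustrate the second issue, here is a minimal Python sketch of how a missing error page can turn a backend 500 into a client-visible 404. It is only a model of the behaviour described in this comment, not Nginx code; the function name `resolve_response`, its parameters and the `/50x.html` path are hypothetical.

```python
def resolve_response(backend_status, error_pages, existing_files):
    """Return the status code the client ends up seeing (illustrative model).

    backend_status -- status produced by the backend, e.g. 500 when it fails
    error_pages    -- mapping of status code to the URI of its error page
    existing_files -- set of URIs that actually exist on the server
    """
    if backend_status < 400:
        # Successful responses are passed through unchanged.
        return backend_status

    page = error_pages.get(backend_status)
    if page is None:
        # No custom error page configured: the original status is reported.
        return backend_status

    if page in existing_files:
        # The error page exists and is served with the original status.
        return backend_status

    # The configured error page is missing, so the lookup of that page
    # fails and the client sees 404 instead of the original error.
    return 404


# Hypothetical configuration: 500 is mapped to a page that was not installed.
print(resolve_response(500, {500: "/50x.html"}, set()))          # 404
print(resolve_response(500, {500: "/50x.html"}, {"/50x.html"}))  # 500
```

In this model, installing the missing page is enough to make the real error visible again, while the failing backend still has to be fixed separately.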

comment:5 Changed 4 years ago by matze

  • Blocked By 2909 added
  • Review URL(s) modified (diff)
  • Status changed from new to reviewing

Issue 1. appears to be caused by the various log views (esp. /*/atom-log, /*/rss-log and the HTML changelog interfaces at /*/log), whose implementation has changed between the Mercurial/hgweb versions 2.0.2 (used before #273) and 3.5.

Our current approach for tackling this issue (aside from 2.):

  • Apply explicit caching for those resources via Nginx; currently in review
  • Upgrade to the most recent bugfix release of Mercurial; currently 3.5.1 is in testing
  • Test whether a forking WSGI/FCGI server can improve scaling here

This has also shown, once again, the growing need for proper throttling mechanisms, in line with e.g. #2487, but especially for HTTP/HTTPS.
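
As a rough illustration of why explicit caching of the log views should help, here is a minimal time-based cache sketched in Python. The caching in review is done in Nginx, not in Python; the class `TTLCache`, the method `get_or_render`, the 60-second lifetime and the example path are all hypothetical and only show the idea of rendering an expensive resource once per interval.

```python
import time


class TTLCache:
    """Keep rendered responses for a fixed number of seconds (illustrative)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the cache can be tested
        self._entries = {}       # key -> (expiry time, cached value)

    def get_or_render(self, key, render):
        """Return the cached value for key, calling render() only if needed."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            # Still fresh: the expensive backend is not touched.
            return entry[1]
        # Missing or expired: render once and remember the result.
        value = render()
        self._entries[key] = (now + self._ttl, value)
        return value


# Hypothetical usage: many requests for the same feed within 60 seconds
# trigger a single call to the expensive renderer.
cache = TTLCache(ttl_seconds=60)
feed = cache.get_or_render("/example/atom-log", lambda: "<feed>...</feed>")
```

With such a cache in front of the log views, repeated feed requests within the lifetime no longer reach the backend, which is the effect the Nginx change aims for.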

comment:6 Changed 4 years ago by matze

  • Cc greiner added
  • Review URL(s) modified (diff)

comment:7 Changed 4 years ago by matze

Caching (1.) is now active for both the RSS and Atom logs. The 50X error page (2.) was fixed in the same roll-out. While this will not completely fix the issue, it should at least increase stability significantly. Our tests confirm that assumption so far.

comment:8 Changed 4 years ago by matze

  • Review URL(s) modified (diff)

The hgweb.fcgi file imported from the legacy server during #2906 contains a patch to work around a possible race condition. That patch is now obsolete and actually decreases the stability of the service.

comment:9 Changed 4 years ago by matze

The aforementioned workaround has been removed.

comment:10 Changed 4 years ago by matze

  • Blocked By 3211 added

comment:11 Changed 4 years ago by matze

  • Review URL(s) modified (diff)

comment:12 Changed 4 years ago by matze

  • Resolution set to fixed
  • Status changed from reviewing to closed