Opened on 07/10/2014 at 09:34:30 AM

Closed on 07/12/2014 at 08:08:33 PM

Last modified on 07/15/2014 at 05:06:21 PM

#766 closed defect (fixed)

Nagios Gateway Timeout

Reported by: matze
Assignee: matze
Priority: Unknown
Milestone:
Module: Infrastructure
Keywords:
Cc:
Blocked By:
Blocking: #760
Platform: Unknown
Ready: no
Confidential: no
Tester:
Verified working: no
Review URL(s): http://codereview.adblockplus.org/4910365653598208

Description

Even after the fixes made in the context of #765, the Nagios monitoring setup is still not working properly: at least in the development environment, HTTP requests to the UI return 502 (Bad Gateway) errors.

How to reproduce

Create a new instance of the server4 box in a development environment:

mhennig@kali:~/AdBlockPlus/infrastructure$ ( vagrant destroy -f server4; vagrant up server4 ) 2>&1 | tee .protocol | grep -i 'f\(ast\)cgi\|spawn'
notice: /Stage[main]/Spawn-fcgi/Package[spawn-fcgi]/ensure: ensure changed 'purged' to 'present'
notice: /Stage[main]/Spawn-fcgi/File[/etc/spawn-fcgi]/ensure: created
notice: /Stage[main]/Spawn-fcgi/File[/etc/init.d/spawn-fcgi]/ensure: created
notice: /Stage[main]/Nagios::Server/Spawn-fcgi::Php-pool[global]/Spawn-fcgi::Pool[global]/File[/etc/spawn-fcgi/500-global]/ensure: created
notice: /Stage[main]/Spawn-fcgi/Service[spawn-fcgi]/enable: enable changed 'false' to 'true'
notice: /Stage[main]/Spawn-fcgi/Service[spawn-fcgi]: Triggered 'refresh' from 1 events

Attempt to access the Nagios UI:

mhennig@kali:~/AdBlockPlus/infrastructure$ wget -O/dev/null --no-check-certificate https://nagiosadmin:nagiosadmin@10.8.0.99/
--2014-07-10 11:09:54--  https://nagiosadmin:*password*@10.8.0.99/
Connecting to 10.8.0.99:443... connected.
WARNING: The certificate of `10.8.0.99' is not trusted.
WARNING: The certificate of `10.8.0.99' hasn't got a known issuer.
The certificate's owner does not match hostname `10.8.0.99'
HTTP request sent, awaiting response... 401 Unauthorized
Reusing existing connection to 10.8.0.99:443.
HTTP request sent, awaiting response... 502 Bad Gateway
2014-07-10 11:09:54 ERROR 502: Bad Gateway.

At this point there's also no trace of any FCGI processes at all:

mhennig@kali:~/AdBlockPlus/infrastructure$ vagrant ssh server4 -- sudo lsof /tmp/php-fastcgi.sock
mhennig@kali:~/AdBlockPlus/infrastructure$ vagrant ssh server4 -- ps aux | grep -i 'php\|f\(ast\)cgi'
mhennig@kali:~/AdBlockPlus/infrastructure$ 

Even an additional provisioning run does not change the behavior:

mhennig@kali:~/AdBlockPlus/infrastructure$ vagrant provision server4 2>&1 | tee .protocol2 | grep -i 'f\(ast\)cgi\|spawn'
mhennig@kali:~/AdBlockPlus/infrastructure$ wget -O/dev/null --no-check-certificate https://nagiosadmin:nagiosadmin@10.8.0.99/
--2014-07-10 11:30:59--  https://nagiosadmin:*password*@10.8.0.99/
Connecting to 10.8.0.99:443... connected.
WARNING: The certificate of `10.8.0.99' is not trusted.
WARNING: The certificate of `10.8.0.99' hasn't got a known issuer.
The certificate's owner does not match hostname `10.8.0.99'
HTTP request sent, awaiting response... 401 Unauthorized
Reusing existing connection to 10.8.0.99:443.
HTTP request sent, awaiting response... 502 Bad Gateway
2014-07-10 11:30:59 ERROR 502: Bad Gateway.

mhennig@kali:~/AdBlockPlus/infrastructure$ vagrant ssh server4 -- ps aux | grep -i 'php\|f\(ast\)cgi'
mhennig@kali:~/AdBlockPlus/infrastructure$ vagrant ssh server4 -- sudo lsof /tmp/php-fastcgi.sock
mhennig@kali:~/AdBlockPlus/infrastructure$ 

Observed behaviour

The PHP processes are either not spawning at all or dying immediately. (Probably the latter, since the UNIX domain socket gets created.)
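
To tell the two cases apart, something like the following could be run on the box. This is only a minimal sketch: the function name `fcgi_socket_state` is hypothetical, it assumes `lsof` is installed, and it assumes sufficient privileges (e.g. via sudo, as in the transcripts above) to see processes owned by other users. Without `lsof` or without those privileges it would report "stale" even for a live socket.

```shell
#!/bin/sh
# Hypothetical helper: classify the state of a FastCGI UNIX domain socket.
#   missing - no socket file at the given path (nothing ever bound it)
#   in-use  - socket file exists and lsof finds a process holding it
#   stale   - socket file exists, but lsof finds no process holding it
fcgi_socket_state() {
    sock=$1
    if [ ! -S "$sock" ]; then
        echo "missing"
    elif lsof "$sock" >/dev/null 2>&1; then
        echo "in-use"
    else
        echo "stale"
    fi
}
```

Called as `fcgi_socket_state /tmp/php-fastcgi.sock`, a result of "stale" would be consistent with processes that created the socket and then died, while "missing" would suggest they were never spawned.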

Expected behaviour

The processes are spawned either at provision time (and from then on at boot time) or on demand by nginx.

Attachments (2)

protocol-initial-provision.txt (21.2 KB) - added by matze on 07/10/2014 at 09:52:08 AM.
protocol-secondary-provision.txt (825 bytes) - added by matze on 07/10/2014 at 09:53:34 AM.
2nd

Change History (12)

comment:1 Changed on 07/10/2014 at 09:48:27 AM by matze

  • Blocking 760 added

Changed on 07/10/2014 at 09:52:08 AM by matze

Changed on 07/10/2014 at 09:53:34 AM by matze

2nd

comment:2 Changed on 07/11/2014 at 04:13:04 PM by matze

  • Owner set to matze

comment:3 Changed on 07/11/2014 at 04:14:23 PM by matze

Seems like yet another issue caused by a missing explicit ordering, which results in randomly missing dependencies. This is especially likely because everything works as expected after a box restart.

comment:4 Changed on 07/11/2014 at 04:33:45 PM by matze

  • Review URL(s) modified (diff)

comment:5 Changed on 07/11/2014 at 04:35:42 PM by matze

Strike that; it was caused by the php-cgi package being installed after spawn-fcgi had been invoked. A fix has been applied and a review has been requested (see above).
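
For illustration only, the same ordering constraint can be expressed as a shell-level guard. This is a rough sketch and not the fix submitted for review: the helper names `require_cmd` and `start_pool_if_ready` are hypothetical, and it assumes the interpreter is reachable on the PATH under the name passed in (presumably `php-cgi`).

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the named command is on the PATH.
require_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical guard: report whether the pool could be started, based on
# whether the interpreter it depends on is already installed.
start_pool_if_ready() {
    if require_cmd "$1"; then
        echo "ready: $1 found"
        return 0
    fi
    echo "not ready: $1 is not installed" >&2
    return 1
}
```

A wrapper could call `start_pool_if_ready php-cgi` before invoking spawn-fcgi and skip the start when it fails, instead of silently ending up with no FCGI processes.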

comment:6 Changed on 07/11/2014 at 05:15:30 PM by matze

  • Status changed from new to reviewing

comment:7 Changed on 07/12/2014 at 08:08:33 PM by matze

  • Resolution set to fixed
  • Status changed from reviewing to closed

comment:8 Changed on 07/14/2014 at 03:13:27 AM by matze

  • Blocking 760 removed

comment:9 Changed on 07/14/2014 at 06:16:33 AM by trev

  • Blocking 760 added

comment:10 Changed on 07/15/2014 at 05:06:21 PM by trev

  • Component changed from Unknown to Infrastructure
