Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#766 closed defect (fixed)

Nagios Gateway Timeout

Reported by: matze
Assignee: matze
Priority: Unknown
Milestone:
Module: Infrastructure
Keywords:
Cc:
Blocked By:
Blocking: #760
Platform: Unknown
Ready: no
Confidential: no
Tester:
Verified working: no
Review URL(s): http://codereview.adblockplus.org/4910365653598208

Description

Even after the fixes made for #765, the Nagios monitoring setup is still not working properly: at least in the development environment, requests to the Nagios UI return HTTP 502 (Bad Gateway).

How to reproduce

Create a new instance of the server4 box in a development environment:

mhennig@kali:~/AdBlockPlus/infrastructure$ ( vagrant destroy -f server4; vagrant up server4 ) 2>&1 | tee .protocol | grep -i 'f\(ast\)cgi\|spawn'
notice: /Stage[main]/Spawn-fcgi/Package[spawn-fcgi]/ensure: ensure changed 'purged' to 'present'
notice: /Stage[main]/Spawn-fcgi/File[/etc/spawn-fcgi]/ensure: created
notice: /Stage[main]/Spawn-fcgi/File[/etc/init.d/spawn-fcgi]/ensure: created
notice: /Stage[main]/Nagios::Server/Spawn-fcgi::Php-pool[global]/Spawn-fcgi::Pool[global]/File[/etc/spawn-fcgi/500-global]/ensure: created
notice: /Stage[main]/Spawn-fcgi/Service[spawn-fcgi]/enable: enable changed 'false' to 'true'
notice: /Stage[main]/Spawn-fcgi/Service[spawn-fcgi]: Triggered 'refresh' from 1 events

Attempt to access the Nagios UI:

mhennig@kali:~/AdBlockPlus/infrastructure$ wget -O/dev/null --no-check-certificate https://nagiosadmin:nagiosadmin@10.8.0.99/
--2014-07-10 11:09:54--  https://nagiosadmin:*password*@10.8.0.99/
Connecting to 10.8.0.99:443... connected.
WARNING: The certificate of `10.8.0.99' is not trusted.
WARNING: The certificate of `10.8.0.99' hasn't got a known issuer.
The certificate's owner does not match hostname `10.8.0.99'
HTTP request sent, awaiting response... 401 Unauthorized
Reusing existing connection to 10.8.0.99:443.
HTTP request sent, awaiting response... 502 Bad Gateway
2014-07-10 11:09:54 ERROR 502: Bad Gateway.

At this point there's also no trace of any FCGI processes at all:

mhennig@kali:~/AdBlockPlus/infrastructure$ vagrant ssh server4 -- sudo lsof /tmp/php-fastcgi.sock
mhennig@kali:~/AdBlockPlus/infrastructure$ vagrant ssh server4 -- ps aux | grep -i 'php\|f\(ast\)cgi'
mhennig@kali:~/AdBlockPlus/infrastructure$ 

Even an additional provisioning run does not change the behavior:

mhennig@kali:~/AdBlockPlus/infrastructure$ vagrant provision server4 2>&1 | tee .protocol2 | grep -i 'f\(ast\)cgi\|spawn'
mhennig@kali:~/AdBlockPlus/infrastructure$ wget -O/dev/null --no-check-certificate https://nagiosadmin:nagiosadmin@10.8.0.99/
--2014-07-10 11:30:59--  https://nagiosadmin:*password*@10.8.0.99/
Connecting to 10.8.0.99:443... connected.
WARNING: The certificate of `10.8.0.99' is not trusted.
WARNING: The certificate of `10.8.0.99' hasn't got a known issuer.
The certificate's owner does not match hostname `10.8.0.99'
HTTP request sent, awaiting response... 401 Unauthorized
Reusing existing connection to 10.8.0.99:443.
HTTP request sent, awaiting response... 502 Bad Gateway
2014-07-10 11:30:59 ERROR 502: Bad Gateway.

mhennig@kali:~/AdBlockPlus/infrastructure$ vagrant ssh server4 -- ps aux | grep -i 'php\|f\(ast\)cgi'
mhennig@kali:~/AdBlockPlus/infrastructure$ vagrant ssh server4 -- sudo lsof /tmp/php-fastcgi.sock
mhennig@kali:~/AdBlockPlus/infrastructure$ 

Observed behaviour

The PHP processes are either not spawning at all or dying immediately. (Probably the latter, since the UNIX domain socket gets created.)

Expected behaviour

The processes are spawned either at provision time (and at boot time from then on) or on demand by nginx.

Attachments (2)

protocol-initial-provision.txt (21.2 KB) - added by matze 5 years ago.
protocol-secondary-provision.txt (825 bytes) - added by matze 5 years ago.
2nd

Change History (12)

comment:1 Changed 5 years ago by matze

  • Blocking 760 added

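Changed 5 years ago by matze

  • Attachment protocol-initial-provision.txt added

Changed 5 years ago by matze

  • Attachment protocol-secondary-provision.txt added

2nd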

comment:2 Changed 5 years ago by matze

  • Owner set to matze

comment:3 Changed 5 years ago by matze

This looks like yet another issue caused by a missing explicit resource order, which results in randomly unmet dependencies. That would also explain why everything works as expected after a box restart.

comment:4 Changed 5 years ago by matze

  • Review URL(s) modified (diff)

comment:5 Changed 5 years ago by matze

Strike; it was caused by the php-cgi package being installed after spawn-fcgi had already been invoked. A fix has been applied and a review has been requested (see the review URL above).
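
For illustration, the kind of explicit ordering that avoids this race could look roughly like the Puppet sketch below. It is not the patch under review. The resource titles Package[spawn-fcgi] and Service[spawn-fcgi] appear in the provisioning log above, but the package name php5-cgi and the overall resource layout are assumptions made for this example.

```puppet
# Hypothetical sketch, not the reviewed change.
# 'php5-cgi' is an assumed package name; adjust it to the distribution in use.
package { 'php5-cgi':
  ensure => present,
}

package { 'spawn-fcgi':
  ensure => present,
}

service { 'spawn-fcgi':
  ensure  => running,
  enable  => true,
  # Explicit ordering: both packages must be installed before the service
  # is started or refreshed, so the PHP pool has a php-cgi binary to spawn.
  require => [Package['php5-cgi'], Package['spawn-fcgi']],
}
```

With a require like this, Puppet applies the package resources before it starts or refreshes the service, regardless of the order in which the resources are declared.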

comment:6 Changed 5 years ago by matze

  • Status changed from new to reviewing

comment:7 Changed 5 years ago by matze

  • Resolution set to fixed
  • Status changed from reviewing to closed

comment:8 Changed 5 years ago by matze

  • Blocking 760 removed

comment:9 Changed 5 years ago by trev

  • Blocking 760 added

comment:10 Changed 5 years ago by trev

  • Component changed from Unknown to Infrastructure