Opened on 12/07/2015 at 10:14:25 AM
Closed on 01/21/2016 at 07:31:45 PM
#3385 closed defect (fixed)
Revive filter2[34].adblockplus.org
Reported by: | matze | Assignee: | fred, matze |
---|---|---|---|
Priority: | P1 | Milestone: | |
Module: | Infrastructure | Keywords: | |
Cc: | fhd, Kirill, sporz, trev | Blocked By: | |
Blocking: | | Platform: | Unknown / Cross platform |
Ready: | no | Confidential: | no |
Tester: | Unknown | Verified working: | no |
Review URL(s): |
Description
***** Nagios *****
Notification Type: PROBLEM
Host: filter23.adblockplus.org
State: DOWN
Address: filter23.adblockplus.org
Info: PING CRITICAL - Packet loss = 100%
Date/Time: Mon Dec 7 08:18:09 UTC 2015
(Last issue tracked in #3157, though without any obvious cause either.)
Attachments (0)
Change History (7)
comment:1 Changed on 12/07/2015 at 10:25:12 AM by matze
- Cc Kirill sporz added
comment:2 Changed on 12/08/2015 at 10:13:37 AM by trev
- Cc trev added
comment:3 Changed on 12/09/2015 at 08:47:11 AM by sporz
Connections to filter24 currently time out. Should that be a ticket of its own?
comment:4 Changed on 12/09/2015 at 08:53:13 AM by matze
- Summary changed from Revive filter23.adblockplus.org to Revive filter2[34].adblockplus.org
<palant> matze: it's two servers [...], we would try it on one first of course
<matze> palant: very well, let's go with 24 and use 23 for comparison
@sporz Sorry, forgot to update the ticket yesterday.
comment:5 Changed on 12/14/2015 at 09:05:30 AM by matze
Both hosts are back in balancing, filter24 with the latest (driver) packages, and monitored for significant differences in behavior.
comment:6 Changed on 01/21/2016 at 07:05:06 PM by matze
No. 23 just crashed:
Notification Type: PROBLEM
Host: filter23.adblockplus.org
State: DOWN
Address: filter23.adblockplus.org
Info: PING CRITICAL - Packet loss = 100%
Date/Time: Thu Jan 21 18:50:39 UTC 2016
After 5 weeks this is the first incident with any of the servers in question. Analysis pending.
comment:7 Changed on 01/21/2016 at 07:31:45 PM by matze
- Resolution set to fixed
- Status changed from new to closed
As expected, the server (or rather its uplink) went down with the exact same symptoms as before.
I consider this partial confirmation of our earlier theory, so I updated the drivers and put the host back in balancing. We'll update the remaining ones next week; @fred and I have scheduled another complete deployment of all servers anyway.
When filter23 and filter24 go down, their kern.log is full of "eth0: link up" messages. Judging by my googling, this indicates an issue with the network adapter driver.
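To compare the updated host against the others, the link messages could be counted per interface. The following is only a minimal sketch, not something that was run for this ticket: the function name count_link_up is hypothetical, and the pattern assumes the message appears literally as "<interface>: link up" somewhere in each kernel log line, which may differ between drivers.

```python
import re
from collections import Counter

# Assumption: the kernel logs the event as "<iface>: link up", e.g. "eth0: link up".
# Other drivers may word this differently, so adjust the pattern as needed.
LINK_UP = re.compile(r"\b(eth\d+): link up\b")


def count_link_up(lines):
    """Return a Counter mapping interface name to its number of 'link up' lines."""
    counts = Counter()
    for line in lines:
        match = LINK_UP.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical usage (the path is an assumption about the host's syslog setup):
#
#     with open("/var/log/kern.log", errors="replace") as handle:
#         for iface, n in count_link_up(handle).most_common():
#             print(f"{iface}: {n} link up messages")
```

A count that is much higher on the hosts without the driver update than on the updated one would be consistent with the driver theory, though it would not prove it.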