Opened on 12/07/2015 at 10:14:25 AM

Closed on 01/21/2016 at 07:31:45 PM

#3385 closed defect (fixed)

Revive filter2[34].adblockplus.org

Reported by: matze
Assignee: fred, matze
Priority: P1
Milestone:
Module: Infrastructure
Keywords:
Cc: fhd, Kirill, sporz, trev
Blocked By:
Blocking:
Platform: Unknown / Cross platform
Ready: no
Confidential: no
Tester: Unknown
Verified working: no
Review URL(s):

Description

***** Nagios *****

Notification Type: PROBLEM
Host: filter23.adblockplus.org
State: DOWN
Address: filter23.adblockplus.org
Info: PING CRITICAL - Packet loss = 100%

Date/Time: Mon Dec 7 08:18:09 UTC 2015

(Last issue tracked in #3157, though without any obvious cause either.)

Attachments (0)

Change History (7)

comment:1 Changed on 12/07/2015 at 10:25:12 AM by matze

  • Cc Kirill sporz added

comment:2 Changed on 12/08/2015 at 10:13:37 AM by trev

  • Cc trev added

When filter23 and filter24 go down, their kern.log is full of "eth0: link up" messages. Judging by my googling, this indicates an issue with the network adapter driver.
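
For anyone who wants to quantify this rather than eyeball the log, here is a minimal sketch that counts "link up" messages per interface. The regular expression, the function name `count_link_up` and the sample lines are assumptions made up for illustration; the exact wording in kern.log depends on the driver, so the pattern may need adjusting.

```python
import re
from collections import Counter

# Assumed pattern: an interface name such as "eth0", followed by ": link up".
# The real message format depends on the network driver.
LINK_UP = re.compile(r"\b(eth\d+): link up\b", re.IGNORECASE)


def count_link_up(lines):
    """Return a Counter mapping interface name to its number of 'link up' lines."""
    counts = Counter()
    for line in lines:
        match = LINK_UP.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Invented sample lines, not taken from the actual servers.
sample = [
    "Dec  7 08:17:58 host kernel: eth0: link up",
    "Dec  7 08:17:59 host kernel: eth0: link up",
    "Dec  7 08:18:00 host kernel: something unrelated",
]

if __name__ == "__main__":
    for interface, count in count_link_up(sample).items():
        print(interface, count)
```

Any iterable of lines works, so an open file handle for the log (commonly /var/log/kern.log, though the path is an assumption here) can be passed in place of `sample`.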

comment:3 Changed on 12/09/2015 at 08:47:11 AM by sporz

Connections to filter24 currently time out. Should that be a ticket of its own?

comment:4 Changed on 12/09/2015 at 08:53:13 AM by matze

  • Summary changed from Revive filter23.adblockplus.org to Revive filter2[34].adblockplus.org

<palant> matze: it's two servers [...], we would try it on one first of course
<matze> palant: very well, let's go with 24 and use 23 for comparison

@sporz Sorry, forgot to update the ticket yesterday.

comment:5 Changed on 12/14/2015 at 09:05:30 AM by matze

Both hosts are back in balancing, filter24 with the latest (driver) packages, and both are being monitored for significant differences in behavior.

comment:6 Changed on 01/21/2016 at 07:05:06 PM by matze

No. 23 just crashed:

Notification Type: PROBLEM
Host: filter23.adblockplus.org
State: DOWN
Address: filter23.adblockplus.org
Info: PING CRITICAL - Packet loss = 100%

Date/Time: Thu Jan 21 18:50:39 UTC 2016

After 5 weeks this is the first incident with any of the servers in question. Analysis pending.

comment:7 Changed on 01/21/2016 at 07:31:45 PM by matze

  • Resolution set to fixed
  • Status changed from new to closed

As expected, the server (or rather its uplink) went down with the exact same symptoms as before.

I consider this partial confirmation of our earlier theory, so I updated the drivers and put the host back in balancing. We'll update the remaining ones next week; @fred and I have scheduled another complete deployment of all servers anyway.
