Opened 4 years ago

Closed 4 years ago

#3385 closed defect (fixed)

Revive filter2[34].adblockplus.org

Reported by: matze
Assignee: fred, matze
Priority: P1
Milestone:
Module: Infrastructure
Keywords:
Cc: fhd, Kirill, sporz, trev
Blocked By:
Blocking:
Platform: Unknown / Cross platform
Ready: no
Confidential: no
Tester: Unknown
Verified working: no
Review URL(s):

Description

***** Nagios *****

Notification Type: PROBLEM
Host: filter23.adblockplus.org
State: DOWN
Address: filter23.adblockplus.org
Info: PING CRITICAL - Packet loss = 100%

Date/Time: Mon Dec 7 08:18:09 UTC 2015

(Last issue tracked in #3157, though without any obvious cause either.)

Change History (7)

comment:1 Changed 4 years ago by matze

  • Cc Kirill sporz added

comment:2 Changed 4 years ago by trev

  • Cc trev added

When filter23 and filter24 go down, their kern.log is full of "eth0: link up" messages. Judging by my googling, this indicates an issue with the network adapter driver.
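
For anyone wanting to quantify this, here is a minimal sketch that counts such messages in a log. The function name `count_link_up` and the sample lines are hypothetical, and the exact message format depends on the driver, so the pattern is an assumption that only matches the literal "eth0: link up" text mentioned above.

```python
import re
from typing import Iterable


def count_link_up(lines: Iterable[str], iface: str = "eth0") -> int:
    """Count log lines that contain '<iface>: link up' (case-insensitive)."""
    # Assumption: the driver logs the interface name followed by ': link up'.
    pattern = re.compile(rf"\b{re.escape(iface)}: link up\b", re.IGNORECASE)
    return sum(1 for line in lines if pattern.search(line))


# Hypothetical sample lines, not taken from the affected hosts.
SAMPLE_LINES = [
    "Dec  7 08:17:01 host kernel: [100.000000] eth0: link up",
    "Dec  7 08:17:02 host kernel: [101.000000] eth0: link down",
    "Dec  7 08:17:03 host kernel: [102.000000] eth0: link up",
    "Dec  7 08:17:04 host kernel: [103.000000] eth1: link up",
]

sample_count = count_link_up(SAMPLE_LINES)
```

To run it against a real log, pass an open file object instead of the sample list, for example `count_link_up(open("/var/log/kern.log", errors="replace"))`. The path is the usual Debian/Ubuntu location and is also an assumption here.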

comment:3 Changed 4 years ago by sporz

Connections to filter24 currently time out - should that be a ticket of its own?

comment:4 Changed 4 years ago by matze

  • Summary changed from Revive filter23.adblockplus.org to Revive filter2[34].adblockplus.org

<palant> matze: it's two servers [...], we would try it on one first of course
<matze> palant: very well, let's go with 24 and use 23 for comparison

@sporz Sorry, forgot to update the ticket yesterday.

comment:5 Changed 4 years ago by matze

Both hosts are back in balancing, filter24 with the latest (driver) packages, and monitored for significant differences in behavior.

comment:6 Changed 4 years ago by matze

No. 23 just crashed:

Notification Type: PROBLEM
Host: filter23.adblockplus.org
State: DOWN
Address: filter23.adblockplus.org
Info: PING CRITICAL - Packet loss = 100%

Date/Time: Thu Jan 21 18:50:39 UTC 2016

After 5 weeks, this is the first incident with either of the servers in question. Analysis pending.

comment:7 Changed 4 years ago by matze

  • Resolution set to fixed
  • Status changed from new to closed

As expected, the server (or rather its uplink) went down with the exact same symptoms as before.

I consider this partial confirmation of our earlier theory, so I updated the drivers and put the host back in balancing. We'll update the remaining ones next week; @fred and I have scheduled another complete deployment of all servers anyway.
