Opened on 04/17/2015 at 04:12:45 PM

Closed on 04/28/2015 at 02:01:41 PM

#2341 closed defect (fixed)

Configure RAID on server_15.adblockplus.org

Reported by: matze Assignee: matze
Priority: P3 Milestone:
Module: Infrastructure Keywords:
Cc: fred, trev, fhd Blocked By:
Blocking: #2338 Platform: Unknown
Ready: yes Confidential: no
Tester: Verified working: no
Review URL(s):

Description (last modified by matze)

The only (filter-) server that is missing a (software) RAID is server_15.adblockplus.org. Its second hard-drive is not even in use.

ToDo

Configure the software RAID.

Attachments (0)

Change History (15)

comment:1 Changed on 04/17/2015 at 04:15:40 PM by matze

  • Blocking 2338 added
  • Cc trev fhd added

comment:2 Changed on 04/17/2015 at 04:23:06 PM by matze

This ticket will be done next week, together with Fred who also wants to look into the installation process.

Nevertheless, I have already removed the server from balancing. The logs have been backed up, but we will do that again just before we trigger the re-installation, since some requests are still coming in due to the TTL of the DNS records. (As a side note: there were actually two identical IPv4/IPv6 records each for the balancer. I have no idea why, especially because there do not seem to have been any hardware issues in the past. The duplicates have been deleted.)

@trev Please have a look at the ToDo list in the ticket description; maybe you can suggest some updates here.

comment:3 Changed on 04/17/2015 at 05:32:19 PM by trev

  • Description modified (diff)
  • Ready set

comment:4 Changed on 04/17/2015 at 05:57:41 PM by matze

@trev Thank you!

comment:5 Changed on 04/18/2015 at 03:33:20 AM by matze

  • Description modified (diff)
  • Sensitive unset
  • Summary changed from Reinstall server_15.adblockplus.org to Configure RAID on server_15.adblockplus.org

Because we now have a server that actually needs to be re-installed (which was not really necessary here, but was scheduled for training purposes), I've created a new ticket, copied the instructions, and changed the title and description of this one to require only a working RAID setup.

comment:6 Changed on 04/21/2015 at 06:54:32 AM by matze

Before setting up the RAID today, I've started an automated hardware check to confirm the server's current state and to avoid any mysterious issues during the operation.

comment:7 Changed on 04/21/2015 at 10:40:17 AM by matze

The so-far idle drive is defective. This is quite common for a drive that has been powered up but not used for months or years (and this very circumstance was the reason why we thought a prior hwcheck might turn out to be useful). Hetzner has been instructed to replace the faulty drive.

Last edited on 04/21/2015 at 10:40:40 AM by matze

comment:8 Changed on 04/21/2015 at 11:00:40 AM by matze

The hard drive has been replaced; the RAID setup can begin.

comment:9 Changed on 04/28/2015 at 03:24:50 AM by matze

Great; after re-syncing the RAID, the "new" drive has just failed again:

$ sudo cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md2 : active raid1 sdb3[1](F) sda3[0]
      307717312 blocks super 1.2 [2/1] [U_]
      
md1 : active raid1 sdb2[1] sda2[0]
      523968 blocks super 1.2 [2/2] [UU]
      
md0 : active raid1 sdb1[1] sda1[0]
      4192192 blocks super 1.2 [2/2] [UU]
      
unused devices: <none>
$ dmesg | tail
[48152.644936] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 00 90 18 00 00 00 01 00
[48152.644951] end_request: I/O error, dev sdb, sector 9443328
[48152.644985] sd 1:0:0:0: [sdb] Unhandled error code
[48152.644990] sd 1:0:0:0: [sdb]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[48152.644997] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 00 90 18 01 00 00 01 00
[48152.645012] end_request: I/O error, dev sdb, sector 9443329
[48152.645042] sd 1:0:0:0: [sdb] Unhandled error code
[48152.645047] sd 1:0:0:0: [sdb]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[48152.645053] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 00 90 18 02 00 00 06 00
[48152.645068] end_request: I/O error, dev sdb, sector 9443330
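
The /proc/mdstat output above marks the failed member with (F) and shows the missing mirror half as an underscore in the [U_] status field. As a minimal sketch of how that check could be automated, the following Python snippet extracts degraded arrays from mdstat text. The function name degraded_arrays and the sample constant are hypothetical and not part of the original setup; the parser only assumes the layout shown in this comment.

```python
import re

# Sample copied from the /proc/mdstat output quoted in this comment.
SAMPLE_MDSTAT = """\
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 sdb3[1](F) sda3[0]
      307717312 blocks super 1.2 [2/1] [U_]

md1 : active raid1 sdb2[1] sda2[0]
      523968 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sdb1[1] sda1[0]
      4192192 blocks super 1.2 [2/2] [UU]

unused devices: <none>
"""

HEADER = re.compile(r"^(md\d+)\s*:\s*(.*)$")
FAILED_MEMBER = re.compile(r"(\w+)\[\d+\]\(F\)")
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Map each degraded array name to its members flagged (F).

    An array counts as degraded when its status field (e.g. [U_])
    contains at least one underscore.
    """
    result = {}
    current = None
    current_failed = []
    for line in mdstat_text.splitlines():
        header = HEADER.match(line)
        if header:
            # Array header line: remember the name and any (F) members.
            current = header.group(1)
            current_failed = FAILED_MEMBER.findall(header.group(2))
            continue
        status = STATUS.search(line)
        if status and current:
            if "_" in status.group(3):
                result[current] = current_failed
            current = None
    return result


print(degraded_arrays(SAMPLE_MDSTAT))  # prints {'md2': ['sdb3']}
```

On a live system the text would come from reading /proc/mdstat instead of the sample string.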

comment:10 Changed on 04/28/2015 at 03:28:12 AM by matze

Hetzner has been instructed to replace the device.

comment:11 Changed on 04/28/2015 at 03:54:12 AM by matze

A new hard drive has been installed; RAID sync is in progress.

comment:12 Changed on 04/28/2015 at 04:09:38 AM by matze

Failed again:

[  881.108505] end_request: I/O error, dev sdb, sector 41402368
[  881.108581] md/raid1:md2: Disk failure on sdb3, disabling device.
[  882.434057]  disk 1, wo:1, o:0, dev:sdb3
[  883.488521] end_request: I/O error, dev sdb, sector 8393216
[  883.488606] md/raid1:md1: Disk failure on sdb2, disabling device.
[  884.056212]  disk 1, wo:1, o:0, dev:sdb2

Suggested replacing the entire server (the other device seems OK), waiting for feedback.
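
The kernel messages quoted in this comment follow two recognizable patterns: block-layer I/O errors and md "Disk failure" notices. A rough sketch of how such lines could be tallied per device is given below. The function summarize_kernel_log and the sample list are hypothetical, and the regular expressions are based only on the message wording that appears in this ticket, so other kernel versions may phrase these lines differently.

```python
import re
from collections import Counter

# Lines copied from the kernel log excerpt quoted in this comment.
SAMPLE_LOG = [
    "[  881.108505] end_request: I/O error, dev sdb, sector 41402368",
    "[  881.108581] md/raid1:md2: Disk failure on sdb3, disabling device.",
    "[  882.434057]  disk 1, wo:1, o:0, dev:sdb3",
    "[  883.488521] end_request: I/O error, dev sdb, sector 8393216",
    "[  883.488606] md/raid1:md1: Disk failure on sdb2, disabling device.",
    "[  884.056212]  disk 1, wo:1, o:0, dev:sdb2",
]

IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
MD_FAILURE = re.compile(r"md/raid\d+:(md\d+): Disk failure on (\w+), disabling device")


def summarize_kernel_log(lines):
    """Return (I/O error count per device, list of (array, member) failures)."""
    io_errors = Counter()
    md_failures = []
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            io_errors[match.group(1)] += 1
            continue
        match = MD_FAILURE.search(line)
        if match:
            md_failures.append((match.group(1), match.group(2)))
    return io_errors, md_failures


errors, failures = summarize_kernel_log(SAMPLE_LOG)
print(dict(errors))  # prints {'sdb': 2}
print(failures)      # prints [('md2', 'sdb3'), ('md1', 'sdb2')]
```

Seeing every error attributed to the same device slot across several replacement drives is the kind of pattern that points at the controller, cabling, or backplane rather than the disks themselves.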

comment:13 Changed on 04/28/2015 at 07:28:56 AM by matze

After burning through yet another hard drive, Hetzner agreed and replaced the server.
The re-sync and re-installation have begun and should finish today.

comment:14 Changed on 04/28/2015 at 12:54:18 PM by matze

Done except for re-syncing the logs, which is currently in progress. The new installation seems to work as expected, no further hardware issues have been encountered.

comment:15 Changed on 04/28/2015 at 02:01:41 PM by matze

  • Resolution set to fixed
  • Status changed from new to closed

Back in balancing.
