Opened on 04/17/2015 at 04:12:45 PM
Closed on 04/28/2015 at 02:01:41 PM
#2341 closed defect (fixed)
Configure RAID on server_15.adblockplus.org
Reported by: | matze | Assignee: | matze |
---|---|---|---|
Priority: | P3 | Milestone: | |
Module: | Infrastructure | Keywords: | |
Cc: | fred, trev, fhd | Blocked By: | |
Blocking: | #2338 | Platform: | Unknown |
Ready: | yes | Confidential: | no |
Tester: | | Verified working: | no |
Review URL(s): | | | |
Description (last modified by matze)
The only filter server still missing a software RAID is server_15.adblockplus.org. Its second hard drive is not even in use.
ToDo
Configure the software RAID.
Attachments (0)
Change History (15)
comment:1 Changed on 04/17/2015 at 04:15:40 PM by matze
- Blocking 2338 added
- Cc trev fhd added
comment:2 Changed on 04/17/2015 at 04:23:06 PM by matze
comment:4 Changed on 04/17/2015 at 05:57:41 PM by matze
@trev Thank you!
comment:5 Changed on 04/18/2015 at 03:33:20 AM by matze
- Description modified (diff)
- Sensitive unset
- Summary changed from Reinstall server_15.adblockplus.org to Configure RAID on server_15.adblockplus.org
Because we now have a server that actually needs to be re-installed (which was not really necessary here, but scheduled for training purposes), I've created a new ticket, copied the instructions over, and changed the title and description of this one to require only a working RAID setup.
comment:6 Changed on 04/21/2015 at 06:54:32 AM by matze
Before setting up the RAID today, I started an automated hardware check to confirm the current state of the machine and to avoid any mysterious issues during the operation.
comment:7 Changed on 04/21/2015 at 10:40:17 AM by matze
The previously idle drive is defective. This is quite common for a drive that has been powered up but not used for months or years (which is exactly why we thought a prior hardware check might turn out to be useful). Hetzner has been instructed to replace the faulty drive.
comment:8 Changed on 04/21/2015 at 11:00:40 AM by matze
The hard-drive has been replaced, the RAID setup can begin.
comment:9 Changed on 04/28/2015 at 03:24:50 AM by matze
Great; after re-syncing the RAID, the "new" drive has just failed again:
$ sudo cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 sdb3[1](F) sda3[0]
      307717312 blocks super 1.2 [2/1] [U_]
md1 : active raid1 sdb2[1] sda2[0]
      523968 blocks super 1.2 [2/2] [UU]
md0 : active raid1 sdb1[1] sda1[0]
      4192192 blocks super 1.2 [2/2] [UU]
unused devices: <none>

$ dmesg | tail
[48152.644936] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 00 90 18 00 00 00 01 00
[48152.644951] end_request: I/O error, dev sdb, sector 9443328
[48152.644985] sd 1:0:0:0: [sdb] Unhandled error code
[48152.644990] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[48152.644997] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 00 90 18 01 00 00 01 00
[48152.645012] end_request: I/O error, dev sdb, sector 9443329
[48152.645042] sd 1:0:0:0: [sdb] Unhandled error code
[48152.645047] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[48152.645053] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 00 90 18 02 00 00 06 00
[48152.645068] end_request: I/O error, dev sdb, sector 9443330
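For anyone who wants to spot this state automatically rather than by eye, here is a minimal sketch of how the /proc/mdstat output quoted above could be checked for degraded arrays. The function name `degraded_arrays` and the `SAMPLE` constant are made up for illustration and are not part of any existing tooling here. The parsing assumes the usual layout of one "mdN : ..." header line followed by a status line such as "[2/1] [U_]".

```python
import re

# Sample copied from the /proc/mdstat output quoted in this comment.
SAMPLE = """\
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 sdb3[1](F) sda3[0]
      307717312 blocks super 1.2 [2/1] [U_]
md1 : active raid1 sdb2[1] sda2[0]
      523968 blocks super 1.2 [2/2] [UU]
md0 : active raid1 sdb1[1] sda1[0]
      4192192 blocks super 1.2 [2/2] [UU]
unused devices: <none>
"""

HEADER = re.compile(r"^(md\d+)\s*:\s*(.*)$")      # e.g. "md2 : active raid1 ..."
FAILED = re.compile(r"(\w+)\[\d+\]\(F\)")         # members flagged faulty, e.g. "sdb3[1](F)"
STATUS = re.compile(r"\[\d+/\d+\]\s+\[([U_]+)\]")  # e.g. "[2/1] [U_]"


def degraded_arrays(mdstat_text):
    """Map each degraded array name to the members the kernel flagged with (F)."""
    degraded = {}
    current, failed = None, []
    for line in mdstat_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            failed = FAILED.findall(header.group(2))
            continue
        status = STATUS.search(line)
        # An underscore in the status string marks a missing or failed member.
        if current and status and "_" in status.group(1):
            degraded[current] = failed
    return degraded


if __name__ == "__main__":
    # On a live host, read the text from /proc/mdstat instead of SAMPLE.
    for array, members in degraded_arrays(SAMPLE).items():
        print(array, "is degraded; failed members:", ", ".join(members) or "none listed")
```

For the output above this reports only md2, with sdb3 as the failed member; md0 and md1 are still fully in sync at that point.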
comment:10 Changed on 04/28/2015 at 03:28:12 AM by matze
Hetzner has been instructed to replace the device.
comment:11 Changed on 04/28/2015 at 03:54:12 AM by matze
A new hard drive has been installed; the RAID sync is in progress.
comment:12 Changed on 04/28/2015 at 04:09:38 AM by matze
Failed again:
[ 881.108505] end_request: I/O error, dev sdb, sector 41402368
[ 881.108581] md/raid1:md2: Disk failure on sdb3, disabling device.
[ 882.434057] disk 1, wo:1, o:0, dev:sdb3
[ 883.488521] end_request: I/O error, dev sdb, sector 8393216
[ 883.488606] md/raid1:md1: Disk failure on sdb2, disabling device.
[ 884.056212] disk 1, wo:1, o:0, dev:sdb2
I suggested replacing the entire server (the other device seems OK) and am waiting for feedback.
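Since the same pattern has now shown up with several drives, a rough sketch of how such kernel log excerpts could be summarized may be handy for comparing runs. The names `summarize` and `LOG` are hypothetical. The regular expressions only cover the message wording quoted in this ticket, and other kernel versions may phrase these messages differently.

```python
import re
from collections import Counter

# Excerpt copied from the kernel messages quoted in this comment.
LOG = """\
[ 881.108505] end_request: I/O error, dev sdb, sector 41402368
[ 881.108581] md/raid1:md2: Disk failure on sdb3, disabling device.
[ 883.488521] end_request: I/O error, dev sdb, sector 8393216
[ 883.488606] md/raid1:md1: Disk failure on sdb2, disabling device.
"""

IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
MD_FAILURE = re.compile(r"md/raid\d+:(md\d+): Disk failure on (\w+), disabling device")


def summarize(log_text):
    """Return (I/O error count per device, {array: member disabled by md})."""
    io_errors = Counter(m.group(1) for m in IO_ERROR.finditer(log_text))
    disabled = {m.group(1): m.group(2) for m in MD_FAILURE.finditer(log_text)}
    return io_errors, disabled


if __name__ == "__main__":
    io_errors, disabled = summarize(LOG)
    for device, count in io_errors.items():
        print(device, "I/O errors:", count)
    for array, member in disabled.items():
        print(array, "dropped member", member)
```

All errors landing on the same device name across replacement drives, as they do here, points away from the individual disks and toward the port, cable, or controller, which matches the suggestion to replace the server.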
comment:13 Changed on 04/28/2015 at 07:28:56 AM by matze
After burning through yet another hard-drive, Hetzner agreed and replaced the server.
The re-sync and re-installation have begun and should finish today.
comment:14 Changed on 04/28/2015 at 12:54:18 PM by matze
Done except for re-syncing the logs, which is currently in progress. The new installation seems to work as expected, no further hardware issues have been encountered.
comment:15 Changed on 04/28/2015 at 02:01:41 PM by matze
- Resolution set to fixed
- Status changed from new to closed
Back in balancing.
This ticket will be done next week, together with Fred, who also wants to look into the installation process.
Nevertheless, I have already removed the server from balancing. The logs have been backed up, but we will do that again just before we trigger the re-installation, since some requests are still coming in due to the TTL of the DNS records. (As a side note: there were actually two identical IPv4/IPv6 records each for the balancer. I have no idea why, especially because it seems there haven't been any hardware issues in the past. The duplicates have been deleted.)
@trev Please have a look at the ToDo list in the ticket description; maybe you can suggest some updates here.