Once the disk has failed (or is actually defective), mdadm will automatically remove it from the RAID. After that, you'll have to add the replacement back, either as a regular data disk or as a hot-spare (the latter in my case). Here's what the array looked like after the rebuild for the failed disk had started:

root:(charon.ka.heimdaheim.de) PWD:~
Sun Jul 27, 23:40:35 [0] > cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md127 : active raid5 sdi1[0] sdh1[6] sdj1[7] sdk1[8] sdf1[9] sdb1[5] sde1[4] sdd1[2] sdc1[1]
      15626121216 blocks super 1.2 level 5, 512k chunk, algorithm 2 [9/8] [UUUUU_UUU]
      [===================>.]  recovery = 97.9% (1913176308/1953265152) finish=12.3min speed=53936K/sec

unused devices: <none>
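If you want to poll the rebuild progress from a script instead of eyeballing /proc/mdstat, the recovery line can be scraped with standard tools. A small sketch, assuming the "recovery = NN.N%" format shown above:

```shell
#!/bin/sh
# Pull the current recovery percentage out of an mdstat-formatted file.
# Sketch only: assumes a "recovery = NN.N%" line as in the output above.
recovery_pct() {
    grep -o 'recovery = [0-9.]*%' "$1" | awk '{print $3}'
}

# Usage on a live system:
#   recovery_pct /proc/mdstat
```

Handy for a cron job or a loop with `sleep` while you wait for the array to come back to a clean state.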

In order to add the replaced disk back to the RAID, you’ll have to prepare a partition for it (see this post for more details). After that, it’s a simple call with mdadm to re-add the hot-spare:

root:(charon.ka.heimdaheim.de) PWD:~
Sun Jul 27, 23:44:19 [0] > mdadm --add /dev/md127 /dev/sdg1
mdadm: added /dev/sdg1
root:(charon.ka.heimdaheim.de) PWD:~
Sun Jul 27, 23:44:37 [0] > cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md127 : active raid5 sdg1[10](S) sdi1[0] sdh1[6] sdj1[7] sdk1[8] sdf1[9] sdb1[5] sde1[4] sdd1[2] sdc1[1]
      15626121216 blocks super 1.2 level 5, 512k chunk, algorithm 2 [9/8] [UUUUU_UUU]
      [===================>.]  recovery = 98.6% (1926141684/1953265152) finish=8.6min speed=52224K/sec

unused devices: <none>

As you can see, disk 10 (sdg1 in this example) has been added and tagged as a hot-spare (note the "(S)" suffix in the mdstat output) … mdadm --detail shows that a bit better:

root:(charon.ka.heimdaheim.de) PWD:~
Sun Jul 27, 23:44:46 [0] > mdadm --detail /dev/md127
/dev/md127:
        Version : 1.2
  Creation Time : Sat Jan 26 18:35:19 2013
     Raid Level : raid5
     Array Size : 15626121216 (14902.23 GiB 16001.15 GB)
  Used Dev Size : 1953265152 (1862.78 GiB 2000.14 GB)
   Raid Devices : 9
  Total Devices : 10
    Persistence : Superblock is persistent

    Update Time : Sun Jul 27 23:44:37 2014
          State : clean, degraded, recovering
 Active Devices : 8
Working Devices : 10
 Failed Devices : 0
  Spare Devices : 2

         Layout : left-symmetric
     Chunk Size : 512K

 Rebuild Status : 98% complete

           Name : charon:aggr1  (local to host charon)
           UUID : 6d11820f:04847070:2725c434:9ee39718
         Events : 11221

    Number   Major   Minor   RaidDevice State
       0       8      129        0      active sync   /dev/sdi1
       1       8       33        1      active sync   /dev/sdc1
       2       8       49        2      active sync   /dev/sdd1
       4       8       65        3      active sync   /dev/sde1
       5       8       17        4      active sync   /dev/sdb1
       6       8      113        5      spare rebuilding   /dev/sdh1
       9       8       81        6      active sync   /dev/sdf1
       8       8      161        7      active sync   /dev/sdk1
       7       8      145        8      active sync   /dev/sdj1

      10       8       97        -      spare   /dev/sdg1
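For reference, the whole replacement procedure condensed into one sketch. The device names below are placeholders, not the ones from this box, and the sfdisk step assumes a reasonably recent util-linux (older versions only handle MBR tables):

```shell
#!/bin/sh
# Sketch of replacing a failed RAID member. Device names are
# placeholders; run as root against your actual array and disks.
ARRAY=/dev/md127
FAILED=/dev/sdX1   # the defective member (placeholder)
HEALTHY=/dev/sdb   # any surviving member disk (placeholder)
NEWDISK=/dev/sdg   # the freshly installed replacement disk

# 1. Mark the member as failed and remove it from the array
#    (mdadm usually does this automatically when the drive errors out):
mdadm --fail "$ARRAY" "$FAILED"
mdadm --remove "$ARRAY" "$FAILED"

# 2. Clone the partition layout from a healthy member onto the new disk:
sfdisk -d "$HEALTHY" | sfdisk "$NEWDISK"

# 3. Add the new partition back; while another rebuild is still running
#    it will sit in the array as a hot-spare, as shown above:
mdadm --add "$ARRAY" "${NEWDISK}1"
```

Once the running rebuild finishes, the spare stays attached to the array and takes over automatically the next time a member fails.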