How to replace a defective drive in an Ubuntu RAID 10 Array
Almost 2 years have gone by since I set up a RAID 10 array with Ubuntu 10.04. It is amazing how solidly software RAID is implemented in Ubuntu. I have had no problems with Ubuntu, and the same installation from September 2010 still runs very well on the Acer h340 Home Server hardware. The only thing that failed was one of the drives in the Sans Digital TowerRAID external enclosure. The 1TB Seagate Barracuda 7200.11 drive failed at the age of 4 years. Luckily, the drive was covered by a 5-year warranty.
After testing the defective drive with SeaTools in Windows, a drive failure report was generated and an RMA was accepted by Seagate. A week later, my replacement drive arrived. Unlike Western Digital, Seagate sent me a “Certified Repaired HDD”, not a new drive. I once had a “Certified Repaired HDD” fail after only 1 month of use, but your mileage may vary. Also, to get an advance replacement drive from Seagate, I had to pay US$9.99 to have the “Certified Repaired HDD” shipped to me before I sent the defective drive back. Western Digital, on the other hand, would ship me a brand new drive to replace a reported defective drive without charging me anything. This will definitely affect my purchase choices between these two manufacturers, since the pricing of their new drives is very similar these days.
I replaced the dead drive with the replacement, reattached the array to Ubuntu and rebooted.
From Disk Utility, I partitioned the drive using a GUID partition table, then formatted it with an ext4 file system.
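If you prefer the terminal over Disk Utility, the same partitioning and formatting can be done with parted and mkfs.ext4. This is just a minimal sketch assuming the replacement drive shows up as /dev/sdj (the device name on your system will likely differ), and it needs root privileges:
> parted --script /dev/sdj mklabel gpt
> parted --script -a optimal /dev/sdj mkpart primary 0% 100%
> mkfs.ext4 /dev/sdj1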
After firing up the terminal app, I raised my privileges to super user with the “su” command. Once I had root-level access, I ran the following command:
> cat /proc/mdstat
This gave me the following output:
Personalities : [raid10] [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4]
md1 : inactive sdh1[1](S) sdg1[0](S) sdi1[2](S)
      2930287296 blocks

md2 : active raid10 sdc1[2] sdd1[3] sda1[0] sdb1[1]
      1953524864 blocks 64K chunks 2 near-copies [4/4] [UUUU]

unused devices: <none>
This shows that my RAID array “md1” has only 3 drives attached and is inactive.
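Before making any changes, you can also dig a bit deeper into why the array is inactive. mdadm can report on the array as a whole and on the RAID superblock of each member partition; the device names below are from my setup, so substitute your own:
> mdadm --detail /dev/md1
> mdadm --examine /dev/sdg1
--detail shows the overall array state and which members it currently knows about, while --examine reads the metadata stored on an individual partition.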
To be on the safe side, I stopped the RAID array:
> mdadm --manage --stop /dev/md1
Terminal output of the above command is:
mdadm: stopped /dev/md1
To start the array, I used this command:
> mdadm --assemble /dev/md1
Terminal output of the above command is:
mdadm: /dev/md1 has been started with 3 drives (out of 4).
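In my case mdadm already knew which partitions belonged to /dev/md1, so a bare --assemble was enough. If it cannot locate the members on its own (for example, when /etc/mdadm/mdadm.conf is out of date), you can either name the surviving partitions explicitly or let mdadm scan for them; the partition names here are just from my setup:
> mdadm --assemble /dev/md1 /dev/sdg1 /dev/sdh1 /dev/sdi1
> mdadm --assemble --scan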
Finally, I added the replacement drive back to the RAID array:
> mdadm /dev/md1 --manage --add /dev/sdj1
Terminal output of the above command is:
mdadm: added /dev/sdj1
Once the drive is added back, the array automatically recovers by replicating the data from its mirror onto the newly formatted drive. You can watch the progress of the recovery with:
> watch -n 1 cat /proc/mdstat
The terminal window will refresh once every second to display the rebuilding process:
Every 1.0s: cat /proc/mdstat                     Tue Jun 5 23:29:27 2012

Personalities : [raid10] [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4]
md1 : active raid10 sdj1[4] sdg1[0] sdi1[2] sdh1[1]
      1953524864 blocks 64K chunks 2 near-copies [4/3] [UUU_]
      [====>................]  recovery = 21.5% (210768064/976762432) finish=197.2min speed=64726K/sec

md2 : active raid10 sdc1[2] sdd1[3] sda1[0] sdb1[1]
      1953524864 blocks 64K chunks 2 near-copies [4/4] [UUUU]

unused devices: <none>
My RAID array took 4 hours to recover (replicate data to the new drive).
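Once the rebuild finishes, it is worth double-checking that all four members are back in sync before trusting the array again:
> mdadm --detail /dev/md1
> cat /proc/mdstat
The detail output should report the array state as clean with every device listed as “active sync”, and /proc/mdstat should show [4/4] [UUUU] for md1 again.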
Intense read! This is over my head. I’m thinking of just getting a Drobo… what do you think? Nice to see you back blogging, Kam! Don’t work too hard!
Hey Howie, from what I heard Drobo is great and it has crossed my mind to get one as well. If you get one, make sure you don’t fill it up all the way. I heard Drobo needs some “working space” to expand and replicate files.
Sorry, work has taken up a lot of my personal time, and I hope I will be able to get back to blogging from now on :)