Silently malfunctioning harddrive

Some hardware problems are easy enough to diagnose. If your computer stops or reboots every once in a while, check the PSU or look for faulty memory. Or maybe your harddrive is making the click of death. It might even be as obvious as a bulging, leaking capacitor on the motherboard.

But when you start discovering new ways for a computer to fail silently, you’re in for an interesting, though frustrating, night without sleep.

The other night the server for Asian DVD Club stopped responding. Sure enough, I hopped on my bike and got to the server, only to find that it still responded to Num Lock toggling but gave no VGA output.

Bah! Humbug.

Having gone through a memory check, a PSU change, a Live USB boot and even a motherboard swap, I had narrowed it down to the harddrives. Granted, I had my suspicions from the start, because the disk activity LED stayed lit whenever the computer crashed or stopped responding.

But there were no DMA errors. S.M.A.R.T. didn’t report anything peculiar. It was as fast as usual and wouldn’t immediately crash under high load. The server _seemed_ to be running ok (for my tests) when I unplugged the disks from my secondary IDE controller, but lo, I was fooled. I just didn’t test it hard enough.

A couple of hours later, I cloned the system drive to a different disk with dd if=/dev/sda of=/dev/sdb (fortunately it was only a 20GB system drive) and popped the new one in. Smeg me sideways, it “didn’t work”. I booted from the new disk and all, even fscked all the filesystems and stuff.
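For anyone repeating the clone step, here is a rough sketch of how I would wrap it today so the copy gets checked afterwards. The helper name clone_and_verify is made up for illustration, and it assumes the source and target are the same size (with a larger target disk, cmp will report an EOF on the source instead of a clean match). It is not what I actually ran that night, which was just the plain dd above.

```shell
#!/bin/sh
# clone_and_verify SRC DST
# Hypothetical helper: copy SRC onto DST with dd, then compare the two.
# SRC and DST are placeholders, e.g. /dev/sda and /dev/sdb.
clone_and_verify() {
    src=$1
    dst=$2

    # Refuse to run when source and target are the same path.
    if [ "$src" = "$dst" ]; then
        echo "source and target must differ" >&2
        return 2
    fi

    # 1 MiB blocks are much faster than dd's 512-byte default.
    dd if="$src" of="$dst" bs=1048576 2>/dev/null || return 1

    # Flush buffered writes to the target before comparing.
    sync

    # cmp exits non-zero at the first differing byte (or on a size mismatch).
    cmp "$src" "$dst"
}
```

On real disks this would be run as root from a live system with nothing mounted, along the lines of clone_and_verify /dev/sda /dev/sdb. Double-check which device is which first, because dd will happily overwrite the wrong one.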

So I was confused. I took my mind off things and came back after a while. What I didn’t think could have been a problem apparently was. The old system drive was still connected to the secondary IDE controller, and the new system disk to the primary. Then why on Earth would Linux freeze up when accessing the old drive?

Sigh. By this time I had already reinstalled Ubuntu Server 9.04 (Jaunty) to make sure it wasn’t ReiserFS acting up. Now everything is ext4 and cleanly installed, aka barely configured.

And it all seems to be working fine. The only problem I actually had was a harddrive that would lock up the IDE controller (both of them, even!), even though it seemed perfectly healthy.