Hard Drive Troubleshooting
Problems that occur when you have just installed a hard drive are almost always a simple matter of a bad or incorrectly connected cable, incorrect jumper settings, or some similar trivial problem. If a newly installed drive isn't recognized by the system, turn off the system. Check the cables to make sure that they're aligned properly and seated completely, and replace them if necessary. Make sure that the drive has power, and restart the system.
Once a hard drive is configured properly and recognized by the system, it generally continues working properly until it fails. If you're fortunate, you may get some warning of impending drive failure, such as odd noises coming from the drive, dialogs warning of read or write failures, or a SMART drive failure warning when you start the system. Unfortunately, hard drives often fail like light bulbs: perfect one moment and dead the next.
Any News Is Bad News
If a hard drive shows even the slightest sign that it has problems, immediately copy the data from that drive to another hard drive or optical disc and replace that drive. Drive makers and third-party utility vendors offer software that claims to repair hard drive problems. Don't believe it. Once a drive has shown signs of impending failure, it will inevitably fail completely, and probably sooner rather than later. When a hard drive exhibits problems, your only goal should be to rescue your data from that drive. Don't even think about continuing to use a hard drive that has had problems, even if repair utilities swear that the drive is now in perfect condition. It isn't.
Isolating the problem
When a functioning drive fails or begins returning read or write errors, there are many possible causes. Take the following steps to isolate the cause of the problem:
1. Before proceeding, note that a failing drive can become a failed drive at any moment. Insofar as is possible, while the drive is still functioning, copy the important files to another hard drive or an optical disc. If you succeed in copying all of the files you need, copy them again. A particular file may be corrupted on one copy but readable from another. If you get a read failure error while copying a file, choose the Retry option several times until you are sure it won't succeed. At that point, choose the Ignore option to continue copying other files. Sometimes, a file that refuses to copy despite repeated retries on one pass will copy successfully on a subsequent pass, so don't give up too early.
WHERE THERE'S LIFE THERE'S HOPE
Never turn off the system until you have taken every possible step to recover data from a failing drive. A drive that kind of works may stop working entirely and forever if you restart the system.
2. If read/write errors occur only after the system has been running for a while (particularly during warm weather, or if you have recently added a fast video card or other heat-producing component), it's possible that the drive is overheating. Remove the case access panel and use your finger as a temperature probe. The hard drive should feel warm (perhaps quite warm) to the touch, but not so hot that it's uncomfortable to press your finger against it for several seconds. If the drive is very hot, leave the side panel off and point a standard house fan directly into the case to cool the drive. If the read/write errors disappear, it's very likely that overheating is causing the problem. Install a hard drive cooler (available from any online or brick-and-mortar computer store) and/or add supplemental cooling fans to the case.
3. One of the most common but little-known causes of hard drive read/write errors is a marginal power supply. Power supplies may begin failing spontaneously and non-obviously, so this problem is always possible, but it's even more likely if you've recently added components to your system, particularly a hot new video adapter or some other component that draws a lot of power. You can eliminate the power supply as the cause of the problem by temporarily (or permanently) replacing it with a high-quality, high-capacity unit. Alternatively, you can try reducing the load on the current power supply by removing components temporarily; for example, by reverting from that hot new video adapter to the embedded video or an older, slower video adapter. This test doesn't completely eliminate the power supply as the cause, because a power supply can be failing rather than simply being marginal for the load.
4. If the hard drive temperature seems reasonable and the power supply is not the problem, you may have a cable problem. Power down the system and replace the data cable with a new or known-good cable. Also, remove the current power cable and use a different one. (Power cables seldom fail, but we have seen it happen.)
5. Connect the drive to a different interface. Although it's uncommon for a motherboard interface to fail spontaneously, it does happen occasionally. If the drive is the PATA primary master, leave it configured as master, disable the primary ATA interface in BIOS Setup, and connect the drive to the secondary interface. For an SATA drive, disable the current SATA interface in BIOS Setup, and connect the drive to another SATA interface. (Don't forget to change the boot device priority.)
Cheesy Power Supplies
In particular, the mass-market computers you find at big-box stores and from online vendors such as Dell are often equipped with power supplies that are barely adequate to start with. Something as simple as adding more memory may be the straw that breaks the camel's back. Replacing the cheap, inadequate power supply should be the first upgrade you make on such systems. PC Power and Cooling (http://www.pcpowercooling.com) and other vendors offer power supplies designed specifically to replace proprietary units that are incompatible with the standard power supplies available from big-box stores and other local sources.
6. The drive circuit board may have failed, partially or completely.
- For a PATA drive configured as master, the circuit board serves two independent functions: acting as the disk controller for all devices connected to that interface, and communicating data between that specific drive and the motherboard. The disk controller function may fail while the data communication function continues to work. To test for this possibility, reconfigure the drive from master to slave, and connect the drive to an interface that already has a master device present, on the same or another computer. If only the disk controller function of the circuit board has failed, you will be able to access the drive as a slave device and copy the data from it to another drive or optical disc. If the problem drive still cannot be accessed, it's possible that its circuitry has failed completely or that the head-disk assembly (HDA) is physically damaged.
- For an SATA drive, any circuit board failure makes it difficult to access the drive, because every SATA drive acts as its own disk controller. Even if the data communication function of the controller is working, the drive cannot be accessed if the disk controller function has failed.
7. If you have not already done so, remove the problem drive from the current system and install it in another system. It is possible, although unlikely, that all of the motherboard interfaces have failed in the original system. If so, the drive is not the problem, and it should function normally in the second system.
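Step 1 recommends copying important files a second time, because a file that comes off a failing drive corrupted on one pass may copy cleanly on another. One way to find the files where two passes disagree is to hash both copies and compare them. The sketch below assumes each pass was copied into its own directory tree and uses SHA-256 from Python's standard library; the function names are hypothetical. A mismatch only tells you that the two copies differ, not which one is correct, so treat those files as candidates for yet another pass.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_copies(first: Path, second: Path) -> dict:
    """Compare two rescued copies of the same directory tree.

    Returns sorted lists of relative paths:
      'mismatched'  - present in both copies, but the contents differ
      'only_first'  - present only in the first copy
      'only_second' - present only in the second copy
    """
    first_files = {p.relative_to(first) for p in first.rglob("*") if p.is_file()}
    second_files = {p.relative_to(second) for p in second.rglob("*") if p.is_file()}
    mismatched = [
        rel for rel in sorted(first_files & second_files)
        if sha256_of(first / rel) != sha256_of(second / rel)
    ]
    return {
        "mismatched": mismatched,
        "only_first": sorted(first_files - second_files),
        "only_second": sorted(second_files - first_files),
    }
```

Files listed under `only_first` or `only_second` are ones that copied on one pass but not the other, which is exactly the situation step 1 warns about.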
If none of these nondestructive testing steps allows you to access the drive, it's likely that the drive is physically damaged, which does not bode well for data recovery.
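The distinction in step 3 between a power supply that is merely marginal for its load and one that is actually failing comes down to simple arithmetic. The sketch below adds up estimated per-component draws and compares the total against the supply's rated output, keeping some capacity in reserve. Every wattage figure in it, and the 30 percent reserve, is a hypothetical placeholder rather than a measured or recommended value; substitute figures from your own components' specifications. Real supplies also have per-rail limits (especially on +12V) that this sketch ignores, and passing the check rules out only an overloaded supply, not a failing one.

```python
def check_power_budget(rated_watts, loads, headroom_pct=30):
    """Return (total_draw, budget, fits) for a set of estimated loads.

    headroom_pct is the share of the rated output kept in reserve;
    the default of 30 is an illustrative assumption, not a standard.
    """
    total = sum(loads.values())
    budget = rated_watts * (100 - headroom_pct) / 100
    return total, budget, total <= budget


# Hypothetical per-component draws in watts; replace with your own figures.
system = {
    "CPU": 95,
    "new video adapter": 110,
    "motherboard and chipset": 35,
    "memory": 10,
    "hard drives (2)": 20,
    "optical drive": 20,
    "fans": 10,
}

print(check_power_budget(350, system))        # (300, 245.0, False)

# Reverting to embedded video: drop the add-in card's draw and recheck.
without_card = {k: v for k, v in system.items() if k != "new video adapter"}
print(check_power_budget(350, without_card))  # (190, 245.0, True)
```

If removing the card makes the problem disappear and the arithmetic shows the original configuration was over budget, a higher-capacity supply is the likely fix.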
Recovering data from a failed or failing drive
A hard drive failure is annoying, but hard drives are inexpensive and easy to replace. What matters are the files on the drive. The first rule of data recovery is that a microgram of prevention is worth a megaton of cure. The best way to secure your files against loss is to back them up regularly. If you find yourself trying to recover files from a failing or failed hard drive, someone has screwed up.
Still, even if you implement an airtight backup scheme and follow it religiously, excrement happens. Those backup discs you so carefully wrote and verified may turn out to be unreadable, or you may have added or changed critical files since your last backup.
When a drive fails with files on it that haven't been backed up, decide how important those files are and how much you're willing to pay to recover them. If the answer is "not very and not much," you can take steps to recover the files yourself. But if the files are critical and you are willing to pay someone to recover them for you, the rule is "don't just do something; stand there." Any steps you take yourself to recover the files, such as installing a data recovery program, may make it more difficult, or even impossible, for a professional data recovery firm to retrieve your files.
DON'T GIVE UP TOO SOON
Although we have no direct experience with data recovery firms, our readers have recommended CBL Data Recovery (http://www.cbltech.com) and Ontrack (http://www.ontrack.com). Check prices and terms carefully before you decide to send your failed drive to a data recovery firm. Policies vary. Some data recovery firms charge a testing fee even if recovery is impossible. Others charge only if data can be recovered. A successful data recovery may cost hundreds or even thousands of dollars, depending on the amount of data and the difficulty involved.
If you decide to attempt to retrieve the data yourself, the steps to take depend on whether the drive is failing or failed:
- If the drive still functions but returns read errors, attempt to copy the data from the drive before proceeding, as described earlier. After you have done as much as possible to copy files from the failing drive, run SpinRite (http://www.grc.com) on the drive and let it finish. It may take a day or more to do a deep analysis and recovery on the drive, but doing so may recover files that are completely unreadable using the standard copy utilities. Copy any files that SpinRite recovers to another hard drive or an optical disc.
- If you determine that the problem is a failed circuit board, and you have or are willing to buy an identical drive, you can replace the failed circuit board with the circuit board from the donor drive. Obviously, if the donor is an existing drive that holds data, be sure to back up the contents of that drive before you proceed.
- If the drive is not accessible and the steps described earlier do not make it so, disconnect the drive from your system and place it in the freezer for at least an hour. (Take steps to avoid condensation; we wrap the drive in plastic with all of the air exhausted and only the data and power connectors exposed, and connect them quickly when we remove the drive from the freezer.) Once the drive is thoroughly chilled, reconnect it to the system immediately and try to read the data from it while it is still cold. The drive warms quickly as it runs, so if this method succeeds, you may need several freezer sessions to recover all of your data.
- Finally, as a last resort (although this sounds bizarre), give the drive a good hard knock against a padded hard surface, or strike it with a rubber mallet, just as the drive begins to spin up. Hard drives occasionally fail due to stiction, in which the heads stick to the platter surfaces so firmly that the drive motor can no longer start the platters spinning. Sometimes a hard knock will free things up enough to allow the drive motor to spin the platters. This procedure, of course, risks doing severe damage to the drive, and should be employed only if all other measures fail.
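The retry-then-ignore copying described in step 1 of the troubleshooting procedure, and the need to repeat it across several freezer sessions, can be automated with a short script. The sketch below walks a source tree, retries each file a few times when the copy raises `OSError`, records files that still fail, and skips files already present at the destination with the same size, so a later session resumes rather than starting over. The function name, retry count, delay, and size-based skip test are assumptions for illustration. Copying through the operating system's normal file APIs cannot recover sectors that only a tool such as SpinRite can coax off the platters, and a file that reads back silently corrupted will still be skipped on later passes.

```python
import os
import shutil
import time
from pathlib import Path


def rescue_tree(src: Path, dst: Path, retries: int = 5, delay: float = 1.0) -> list[Path]:
    """Copy every file under src to dst, retrying read errors.

    Files already at dst with the same size are skipped, so the function
    can be rerun after each session to pick up where it left off.
    Returns the relative paths that still could not be copied.
    """
    failed = []
    for path in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = path.relative_to(src)
        target = dst / rel
        if target.exists() and target.stat().st_size == path.stat().st_size:
            continue  # recovered in an earlier pass
        target.parent.mkdir(parents=True, exist_ok=True)
        partial = target.with_name(target.name + ".partial")
        for _ in range(retries):
            try:
                shutil.copyfile(path, partial)
                os.replace(partial, target)  # only complete copies get the real name
                break
            except OSError:
                time.sleep(delay)  # "Retry": pause, then try the same file again
        else:
            failed.append(rel)  # "Ignore": give up on this file for this pass
            if partial.exists():
                partial.unlink()
    return failed
```

For example, `rescue_tree(Path("/mnt/failing"), Path("/mnt/rescue"))` (both mount points are hypothetical) returns the list of files to try again in the next pass or after the next freezer session.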
If you are replacing a failing hard drive or simply discarding a drive that is no longer large enough, you may worry about someone recovering your data from it. If the drive is functional, the best solution is to use a drive wiping utility before you discard the drive. Our favorite wiping utility is the free Darik's Boot and Nuke (DBAN), which you can download from http://dban.sourceforge.net. DBAN offers various wiping methods, the strongest of which meet Department of Defense standards and may require a full day to run. (A basic single-pass wipe runs much faster, and is good enough for anyone but the truly paranoid.)
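A basic single-pass wipe simply overwrites every byte with zeros. The sketch below illustrates that idea on an ordinary file; it is not a substitute for DBAN, which boots on its own and overwrites the entire drive. Overwriting a file through the file system does not guarantee that the original sectors are overwritten (journaling, copy-on-write file systems, and sectors the drive has remapped can all leave old data behind), and the function name and chunk size are illustrative assumptions. Do not point it at a device node unless you intend to destroy everything on that device.

```python
import os
from pathlib import Path


def zero_fill(path: Path, chunk_size: int = 1 << 20) -> int:
    """Overwrite a regular file's existing bytes with zeros, in place.

    The file keeps its original size. Returns the number of bytes written.
    """
    size = path.stat().st_size
    zeros = bytes(chunk_size)  # a reusable buffer of zero bytes
    written = 0
    with path.open("r+b") as f:  # r+b: write in place without truncating
        while written < size:
            n = min(chunk_size, size - written)
            f.write(zeros[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the zeros out to the drive
    return written
```

The stronger DBAN methods differ mainly in making several passes with different patterns rather than one pass of zeros, which is why they take so much longer.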
If the drive is not functional or if you want one more layer of security after running DBAN, disassemble the drive to expose the platters. Use a screwdriver or a similar tool to make deep scratches in the platters, and then use a hammer to wreak further destruction on the platters. (We know one person who uses a vise to crunch old drives down to twisted hulks and then uses an oxyacetylene torch to melt them into piles of smoking rubble.)