In one sense, there's not much troubleshooting to be done for a processor. A properly installed processor simply works. If it stops working, it's dead and needs to be replaced. That seldom happens (we're tempted to say "never") unless the processor incurs lightning damage, is the victim of a catastrophic motherboard failure, or overheats severely (usually from misguided attempts at overclocking, which means running the processor faster than its design speed). A processor in a system with a high-quality motherboard and power supply that is protected by a UPS or a good surge protector is likely to outlast the useful life of the system.
OLD PROCESSORS NEVER DIE
In our more than 20 years' experience with many hundreds of systems, we can count on one finger the number of processors that have failed other than as a result of power spikes, overheating, or other abuse. And we suspect that one was killed by a power glitch.
In recognition of the primary danger, modern processors incorporate thermal protection, which slows down the processor or stops it completely if the temperature rises too high. Even if the processor isn't throttling throughput, operating it at a high temperature can reduce its life. Accordingly, it's important to monitor processor temperature, at least periodically, and if necessary, to take steps to improve processor cooling. If your system slows down for no apparent reason or hangs completely, particularly in a warm environment or when the processor is working hard, it's quite possible that overheating is responsible. Here are the most important steps you can take to avoid overheating:
Use the motherboard monitoring program, or reboot the system, run BIOS Setup, and view the temperature and fan speed section. Take these measurements when the system has been idle as well as when it has been running under heavy load. It's important to do this initially to establish a "baseline" temperature for the processor when it is idle and under load. You can't recognize abnormally high temperatures if you don't know what the normal temperature should be. If you run the motherboard monitoring program, set reasonable tripwire values for temperatures and configure the program to notify you when those temperatures are exceeded.
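If your monitoring program can export its readings, the baseline-and-tripwire idea can be sketched in a few lines of Python. This is only an illustration, not part of any monitoring program: the function names and the 10 C margin are our own assumptions, and you should choose a margin that suits your processor and cooler.

```python
from statistics import mean


def baseline(readings_c):
    """Average several temperature readings (degrees C) into one baseline."""
    if not readings_c:
        raise ValueError("need at least one reading")
    return mean(readings_c)


def exceeds_tripwire(current_c, baseline_c, margin_c=10.0):
    """Return True if the current reading is above baseline plus a margin.

    margin_c is a hypothetical default chosen for illustration only.
    """
    return current_c > baseline_c + margin_c


# Readings taken by hand at idle on three different days (example values).
idle_baseline = baseline([36, 38, 37])
print(exceeds_tripwire(52, idle_baseline))  # well above the idle baseline
print(exceeds_tripwire(40, idle_baseline))  # within the margin
```

Keep one baseline for idle and another for heavy load, because the two can differ by 20 C or more.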
HOW COOL IS COOL ENOUGH?
As our editor pointed out, if you've installed the CPU cooler or thermal compound improperly, the baseline temperature you measure may be much too high and you may not realize that. It's impossible to say what a "normal" temperature is, because so much depends on the particular processor and CPU cooler, the case and cooling fans you use, the ambient room temperature, and so on. As a rule of thumb, with the processor at idle in a standard mini-tower case, we consider a processor temperature under 35 C to be good; the 35 C to 40 C range to be acceptable; and anything over 40 C to be good reason to improve the cooling by using a better CPU cooler and/or better case fans. If you're using a small form factor case or a hot-running processor, such as a Prescott-core Pentium 4, normal idle temperatures may be 5 C to 10 C warmer. Under heavy load, the processor temperature may increase 20 C or more. We consider anything up to 60 C normal. At 65 C we are concerned. At 70 C, we shut the system down and determine what's causing the high temperatures. Some serious gamers routinely run their processors at 80 C or even 85 C, but doing that may shorten processor lifetime dramatically.
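For readers who log temperatures, the rule of thumb above can be written as a small Python sketch. The function names are ours, and two details are assumptions: the text gives no rating for load temperatures between 60 C and 65 C, so this sketch treats everything above 60 C and below 70 C as a concern, and the `allowance_c` parameter is a hypothetical way to apply the extra 5 C to 10 C for small form factor cases or hot-running processors.

```python
def rate_idle(temp_c, allowance_c=0):
    """Rate an idle CPU temperature (degrees C) using the rule of thumb.

    allowance_c shifts the limits upward, for example by 5 to 10 C for a
    small form factor case or a hot-running processor (an assumption).
    """
    if temp_c < 35 + allowance_c:
        return "good"
    if temp_c <= 40 + allowance_c:
        return "acceptable"
    return "improve cooling"


def rate_load(temp_c):
    """Rate a CPU temperature (degrees C) measured under heavy load."""
    if temp_c <= 60:
        return "normal"
    if temp_c < 70:
        # Assumption: the 60 to 65 C gap is treated the same as 65 C.
        return "concern"
    return "shut down and investigate"


print(rate_idle(38))
print(rate_idle(43, allowance_c=5))
print(rate_load(72))
```

These limits are only our rule of thumb for a standard mini-tower case; your own baseline readings remain the better guide.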
Blocked air vents can increase processor temperature by 20 C (36 F) or more. Clean the system as often as is necessary to maintain free air flow. If your case has an inlet air filter, check that filter frequently and clean it as often as necessary.
CPU coolers vary greatly in efficiency (and noise level). Although the CPU cooler bundled with a retail-boxed processor is reasonably efficient, replacing it with a good aftermarket CPU cooler can reduce CPU temperature by 5 to 10 C (9 to 18 F). Make sure that the processor surface is clean before you install the CPU cooler, use the right amount of a good thermal compound, and make sure that the heatsink is clamped tightly against the processor.
In particular, if you've upgraded the processor or installed a high-performance video adapter, it's possible that you've added more heat load than the case was designed to handle. Adding a supplemental fan, or replacing an existing fan with one that provides higher air flow, can reduce interior case temperatures dramatically, which in turn reduces processor temperature.
TAKE YOUR SYSTEM'S TEMPERATURE
You can use an ordinary thermometer to test the adequacy of your system fans. Measure the ambient room temperature first. Then put the thermometer close to the outputs of the power supply fan and the supplemental case fan(s). If the temperature difference is 5 C (9 F) or less, adding or upgrading fans probably won't help.
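The arithmetic behind this test is simple enough to sketch in Python. Note that a temperature difference converts from Celsius to Fahrenheit by the 9/5 scale factor alone, without the 32-degree offset used for absolute temperatures. The function names below are our own, chosen for illustration.

```python
def delta_c_to_f(delta_c):
    """Convert a temperature difference from Celsius to Fahrenheit degrees.

    A difference uses only the 9/5 scale factor; no 32-degree offset.
    """
    return delta_c * 9 / 5


def fans_may_help(ambient_c, exhaust_c, threshold_c=5.0):
    """Return True if exhaust air is more than threshold_c above ambient."""
    return (exhaust_c - ambient_c) > threshold_c


ambient, exhaust = 22.0, 30.0  # example thermometer readings in degrees C
rise = exhaust - ambient
print(f"Rise: {rise:.1f} C ({delta_c_to_f(rise):.1f} F)")
print("Consider more airflow" if fans_may_help(ambient, exhaust) else "Fans look adequate")
```

With the example readings, the exhaust air is 8 C warmer than the room, which is above the 5 C threshold and suggests that better fans could help.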
In most systems, the processor is the major heat source. A TAC (Thermally Advantaged Chassis) case provides a duct (and sometimes a dedicated fan) to route waste CPU heat directly to the outside of the case, rather than exhausting it inside the case. In our testing, using a TAC-compliant case routinely lowered CPU temperatures by 5 to 10 C (9 to 18 F) relative to running that CPU in a non-TAC case.
You can buy a TAC case, or, if you're handy with tools, turn your old case into a TAC case. To do so, simply use a 2" to 3" hole saw to cut a hole in the case side panel directly over the CPU. Make a duct of the appropriate length using cardboard or plastic tubing, and secure the duct to the case with screws or adhesive. If you want to be fancy, you can install a standard case fan between the interior panel wall and the duct.
As amazing as it sounds, changing the position of the case by only a few inches, and in some pretty non-obvious ways, can make a major difference in system and processor temperature. For example, Robert's main office system sits on the floor next to his desk, directly in front of a heating vent. During the summer, when the air conditioning is running, that processor routinely operates 5 C cooler than during the winter months, when Robert closes the vent to prevent hot air from blowing on the system. That might seem reasonable, until you realize that the cool air from the vent is blowing on the back of the system, which has only exhaust fans. The ambient room temperature is actually lower during winter months, and the ambient air is what's being drawn into the system, so we'd have expected the system temperature also to be lower in winter.
During the writing of this section, Barbara commented to Robert that his den system was much louder than usual. Sure enough, it was. That system sits between one side of a love seat and the side of a corner table. It had somehow been moved, slid back by only four inches or so, but that was enough to reduce the air flow significantly. The CPU fan was screaming (running at about 5,700 RPM) and the CPU temperature was 52 C with the system idling. Robert slid the system out a few inches, and within minutes the CPU fan speed had dropped to about 1,800 RPM (becoming nearly silent) and the idle CPU temperature had fallen to 38 C. Inches can make a huge difference.
Despite the odds, processors do sometimes fail. If you are reasonably certain that your processor has failed, the only practical way to troubleshoot it is to install the problem processor in another system or to install a known-good processor in the problem system. The former is the safer choice. We have never heard of a failed processor harming a good motherboard, but a catastrophically failed motherboard that has killed one processor could easily kill another. For that reason, if we're convinced that a processor is bad, we always pull it and test it in another system.