For the past two weeks, I have been locked in a serious battle with my computer. Several hundred forum and blog posts helped me along the way, and I feel obligated to give back to the community. I hope that by sharing my story here, someone else will be better able to solve their own problems.
It all started when I decided to make a lateral OS shift from Windows Server 2008 R2 to Windows 7. I had been using Windows Server 2008 R2 as a workstation, but for several reasons (including occasional weird problems, and the fact that the Xbox 360 can only act as a Media Center Extender for Windows 7) I decided to switch to Windows 7. Before continuing, I think a technical reader will want to see my computer’s specifications:
Motherboard: ASUS Maximus Formula II
CPU: Intel Core 2 Duo E8400 @ 3.00 GHz
RAM: 8 GB (4 × 2 GB) Corsair DDR2
Graphics Card 1: NVIDIA GeForce 9800 GTX+ (using dual DVI)
Graphics Card 2: NVIDIA GeForce 6800 (using single VGA)
HDD: 250 GB Seagate (C:), 500 GB Seagate (RAID 1, data), 500 GB Maxtor (RAID 1, data), 750 GB Seagate (spanned, junk), 750 GB Seagate (spanned, junk). The two 500 GB drives formed a Windows dynamic disk RAID 1 (mirrored) volume, and the two 750 GB drives formed a spanned dynamic disk volume. All drives are SATA.
OS: formerly Windows Server 2008 R2 x64
OK, so I already had a Windows 7 DVD on hand (from an installation on a different machine). After backing up the contents of my two spanned drives (but not the RAID 1 pair, since even if one of those disks failed, the other should still be fine), I started the process. I booted from the Windows 7 DVD, quick-formatted my C: drive and selected it, and the installer got about 70% of the way through “Expanding files” before failing. I don’t remember the error code, but two more attempts failed the same way, so I knew I would need to try something new.
I wasn’t too worried, because I also had the Windows 7 ISO on hand, so I burned another copy on a different computer. When I popped that disc in, the installation wouldn’t even start, failing with a “volsnap.sys file is corrupt or missing” error.
Apparently having a bad ISO, I downloaded another, unofficial copy of the Windows 7 ISO. A few hours later, when I tried that one, I was back to the same 70% “Expanding files” failure.
Not believing that three different discs could all be bad, I started to suspect a hardware problem. The most common advice out there for weird errors is bad RAM, so I tested that first. Booting the Ultimate Boot CD (UBCD), I ran Memtest86 for several passes, and it found no errors. I then booted the Parted Magic v4.10 Linux distro from the UBCD and checked the SMART status of each hard disk drive. All of them were healthy too. At this point I hypothesized that something was wrong with the motherboard or CPU, and I searched online for similar problems reported with this motherboard. Although there were some complaints, there weren’t enough to justify a $200 motherboard replacement just yet.
Seeing that the Linux distro loaded fine, and since I needed to use the internet and my computer right away (it had already been a few days and I had work to do), I decided to install Ubuntu 10.10. That installation was not smooth either (which I attributed to an underlying motherboard problem), but that isn’t the point of this post, so I won’t go into it. Eventually I did get it installed and running.
Ubuntu worked well for me for a few days (the improvements since I last used it are quite impressive), but knowing that I would inevitably have to switch back to Windows (because of networking with the rest of my machines and the fact that I’m primarily a .NET developer), I decided to start the Windows 7 process again.
After three failed attempts at the Windows 7 installation, I decided to start fresh and get another official ISO. Thankfully, that one burned fine and installed fully, and I thought my horrible experience was finally over.
After booting into the new Windows 7, one of the first things I noticed was that both of my original data disks (the RAID 1 pair) had been overwritten with Linux data; in fact, they had been reformatted by Linux (either Ubuntu or Parted Magic, I don’t know which) into a non-NTFS file system. This was obviously a serious problem, but thankfully getting the data back was fairly straightforward with EASEUS Data Recovery.
Over the next few hours I ran several rounds of Windows Update and installed the standard products that I use, and everything seemed fine. However, various subtle issues began to surface in unpredictable and unreproducible ways.
At random times, my three monitors would flicker briefly and come back. I had seen this happen before on Windows Server 2008 R2 (usually while gaming), and nothing terrible came of it, so I decided it wasn’t too serious.
Also, very occasionally, the computer would freeze completely while I was doing something not particularly complex but involving multiple processes. For example, if I was installing a program while copying unrelated files, it would grind to a halt. Or, if I was scanning a disk for errors (using chkdsk) while dragging a web browser window from one monitor to another, CPU usage would hit 100% and the machine would freeze.
I also noticed that every single time I logged in, the “Microsoft Intellitype Pro 8.0” installation box would come up. This was strange because I would expect the installation to complete on the first login after installing Windows, not to run on every login. It always appeared, showed a message like “downloading Intellitype Pro”, and then quickly disappeared.
Interestingly, since Microsoft Intellitype Pro is delivered as a Windows Update package, I would have expected it to appear in either the “not installed” or the “installed” list; however, it was in neither. It seemed to be in limbo. In addition, three other updates were stuck in the “not installed” state: a required .NET Framework security update and two optional NVIDIA drivers. Every time I selected these three for installation, they failed.
Another strange thing was that whenever I used Firefox to download something, the Downloads window would appear but be completely blank. The download still completed normally (and the status bar of the main Firefox window showed its progress), but the Downloads window itself was empty.
Another issue was that every time I started a massive file transfer with TeraCopy (for example, from my backup to my local disks) and let it run overnight or for a few unattended hours, I would come back to a fresh login screen and a notification of a recent BSOD. This happened on every long, unattended transfer of many files. I did look into why, and found several entries in the Event Log, but they had different stop codes, so there was no clear explanation of what was happening.
The last symptom was the strangest and most serious of all. During program installations, I started to notice that many of my downloads were corrupt. Normally, maybe one download in several thousand is corrupt (thank the inventors of TCP for that). Here, however, roughly one out of every two or three downloads (service packs, updates, anything) was corrupt, and most remarkably, certain downloaded files were *always* corrupt.
Now stop here and think about how you would diagnose this issue, given the symptoms. What do you think is causing the problem? I ask about diagnosis rather than what you would do, because at this point (or earlier) it might have been better to just buy a whole new pre-built computer; but let’s say that wasn’t an option.
----
My first thought was an irreparable motherboard problem. However, before investing $400 (to replace both the motherboard and the CPU, since a failure in one frequently takes out the other), a week of shipping time, and a full reinstallation of everything, I had to prove to myself that nothing else was the cause. This mattered all the more because if the cause were something else, such as two incompatible software packages, the new motherboard would be wasted money and the problem might simply resurface.
Of all the symptoms mentioned, the only two that were fairly reproducible were the Intellitype pop-up and the corrupt downloads. In fact, I figured they might be related, because the Intellitype pop-up said “downloading…” before disappearing. I checked the event logs and found that part of Windows Update was indeed failing during a download. When I canceled the Intellitype installation and told it never to install, those event log messages stopped appearing.
So the next question was: why were the downloads corrupt? Earlier, while updating the computer’s software, I had found that even a manually downloaded NVIDIA driver would fail to install (7-Zip reported that the file was corrupt). So I re-downloaded the same file from the same location and tried again. Again, it was corrupt. The interesting thing was that when I took SHA-1 checksums of the two downloads, they were *different*! Not only that, but every single time I downloaded the file, it had a different checksum!
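The checksum comparison above can be reproduced with a short script. This is a minimal sketch in Python using the standard `hashlib` module; the post does not say which checksum tool was actually used, and the helper names `sha1_of_file` and `compare_downloads` are hypothetical.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of the file at `path`."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large installers need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_downloads(paths):
    """Hash every file in `paths` (hypothetical helper).

    Returns (digests, identical): `digests` maps each path to its SHA-1,
    and `identical` is True only if there are at least two distinct paths
    and all of them hash to the same value.
    """
    digests = {str(p): sha1_of_file(p) for p in paths}
    identical = len(digests) >= 2 and len(set(digests.values())) == 1
    return digests, identical
```

For example, `compare_downloads(["driver_download_1.exe", "driver_download_2.exe"])` (hypothetical file names) should report `identical` as True on a healthy machine; on the machine described here, every pair of downloads would have differed.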
I suspected that the network card might be causing the corruption, so I moved the Ethernet cable from one port to the other. However, new downloads still came in with different checksums. So if every download of the file on this computer produced a different file, how was I supposed to get the correct one?
I went to another computer and downloaded the same file twice, verifying that the checksums matched on that machine (they did, so that was the correct file), and then put it on a USB flash drive to bring back to mine. Now here’s the kicker: when I copied the file from the USB drive onto my desktop, every single copy had a new checksum! This meant the issue was not network related; something was corrupting every file transfer, whether over the network or from flash.
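The USB experiment is essentially a repeated-copy integrity test: copy a known-good file several times and check each copy against a checksum taken on a trusted machine. Below is a minimal Python sketch of that idea. It assumes `shutil.copyfile` is a fair stand-in for an ordinary Explorer copy to the desktop, and the names `repeated_copy_test` and `trials` are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of the file at `path`, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def repeated_copy_test(source, dest_dir, reference_sha1, trials=5):
    """Copy `source` into `dest_dir` `trials` times (hypothetical helper).

    `reference_sha1` should come from a machine known to be healthy.
    Returns the list of trial numbers whose copy did NOT match it;
    an empty list means every copy was intact.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    bad = []
    for i in range(trials):
        # Keep each copy under its own name so they can be inspected later.
        target = dest_dir / f"copy_{i}{Path(source).suffix}"
        shutil.copyfile(source, target)
        if sha1_of_file(target) != reference_sha1.lower():
            bad.append(i)
    return bad
```

One caveat: a freshly written copy may be read back from the operating system’s file cache rather than from the disk, so a clean result does not rule out a disk-level problem. It is a quick check, not a replacement for Memtest86 or SMART diagnostics.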
So, instead of copying the file to my desktop (as I normally do with large files), I ran it directly from the USB drive. I was afraid that Windows might internally copy the file to a temp directory (corrupting it in the process) and run it from there, but thankfully that did not happen. Finally, the NVIDIA driver installed completely without any issues.
After a restart, all was well. My monitors no longer flickered, the computer no longer froze, Intellitype installed fine, the .NET Framework Windows Update package installed fine (the two NVIDIA driver updates disappeared because I had installed the driver manually), Firefox showed downloads in its Downloads window (including the ones already completed), file transfers no longer caused blue screens, and all new downloads (including the original NVIDIA driver) were uncorrupted.
Thank goodness everything is working now, but overall the many hours I put into repairing the computer really tested the limits of my patience. Below are some lessons learned.
Lessons Learned:
- Don’t just throw money at a problem, understand the cause of the problem first. (Referring to the option of buying a new motherboard/cpu)
- Technology problems can be extremely time consuming… if my time doing this were billable hours, it could have been cheaper to just buy a new computer (depending on the new computer’s specs, considering depreciation etc.).
- Don’t trust software RAID, or at least don’t trust Dynamic Disks
- Physically disconnect any hard disk drive holding important data if you’re already in trouble
- Files can be recovered from a hard disk drive even if the disk is formatted to a different file system
- A bad video card driver can screw up a lot more than just the video card, such as Firefox’s downloads window…