Horrible errors on my Linux box.

This happened when trying to boot up. I mean, eventually it did boot, but this was not good:

Feb 27 15:17:00 lauequad kernel: [21057.921922] ata2.00: cmd 25/00:08:00:08:c3/00:00:16:00:00/e0 tag 0 dma 4096 in
Feb 27 15:17:00 lauequad kernel: [21057.921923]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
Feb 27 15:17:00 lauequad kernel: [21057.921924] ata2.00: status: { DRDY }
Feb 27 15:17:00 lauequad kernel: [21057.921932] ata2.00: hard resetting link
Feb 27 15:17:01 lauequad kernel: [21058.643829] ata2.01: hard resetting link
Feb 27 15:17:01 lauequad /USR/SBIN/CRON[6290]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Feb 27 15:17:01 lauequad kernel: [21059.118482] ata2.00: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Feb 27 15:17:01 lauequad kernel: [21059.118493] ata2.01: SATA link down (SStatus 0 SControl 300)
Feb 27 15:17:01 lauequad kernel: [21059.226065] ata2.00: configured for UDMA/33
Feb 27 15:17:01 lauequad kernel: [21059.250462] ata2.00: device reported invalid CHS sector 0
Feb 27 15:17:01 lauequad kernel: [21059.250466] ata2: EH complete
Feb 27 15:17:32 lauequad kernel: [21089.830651] ata2: lost interrupt (Status 0x50)
Feb 27 15:17:32 lauequad kernel: [21089.830669] ata2.00: exception Emask 0x52 SAct 0x0 SErr 0x58d0c02 action 0xe frozen
Feb 27 15:17:32 lauequad kernel: [21089.830672] ata2.00: SError: { RecovComm Proto HostInt PHYRdyChg CommWake 10B8B LinkSeq TrStaTrns DevExch }
Feb 27 15:17:32 lauequad kernel: [21089.830674] ata2.00: failed command: READ DMA EXT
Feb 27 15:17:32 lauequad kernel: [21089.830677] ata2.00: cmd 25/00:08:00:08:c3/00:00:16:00:00/e0 tag 0 dma 4096 in
Feb 27 15:17:32 lauequad kernel: [21089.830678]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x56 (ATA bus error)
Feb 27 15:17:32 lauequad kernel: [21089.830679] ata2.00: status: { DRDY }

Except they were all nicely coloured and arranged by vim’s syntax highlighting.

This was in the file /var/log/syslog
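If you want a quick tally instead of scrolling through the file, something like the following might do. It is only a rough sketch: the function name `summarise` and the list of phrases are my own choices, the phrases are lifted from the excerpt above and are certainly not exhaustive, and the regular expression assumes syslog lines shaped like the ones shown here.

```python
import re
from collections import Counter

# Matches kernel lines such as
#   "... kernel: [21057.921932] ata2.00: hard resetting link"
# and captures the port/device ("ata2.00" or "ata2") and the message.
ATA_LINE = re.compile(r"kernel: \[\s*\d+\.\d+\]\s+(ata\d+(?:\.\d+)?): (.*)")

# Phrases copied from the log excerpt above (an assumption about what
# counts as "trouble", not a complete list of libata messages).
TROUBLE = (
    "hard resetting link",
    "lost interrupt",
    "failed command",
    "exception Emask",
    "SATA link down",
)


def summarise(lines):
    """Count trouble phrases per ATA port/device in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = ATA_LINE.search(line)
        if match is None:
            continue  # not an ataN kernel message (cron lines, etc.)
        port, message = match.groups()
        for phrase in TROUBLE:
            if phrase in message:
                counts[(port, phrase)] += 1
    return counts
```

To use it, open the log (you may need root to read it) and pass the file object in, for example `summarise(open("/var/log/syslog", errors="replace"))`, then print `counts.most_common()`. A pile of resets all on one port at least tells you which cable or drive to look at first.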

Freakin’ scary. I thought one of my hard drives was on the way out. Fortunately it’s not my main drive, the one that houses / and /home, but it’s the second drive which mounts at /home/username/Music.

So, I thought maybe the drive was on the way out. My backups were up to date, but I noticed that when I went to have a look in ~/Music, there were some files that were corrupt. On boot the messages included one telling me to run fsck on /dev/sdb1 (the Music partition) and then dropping me into a shell, and then fsck told me it could not fix the drive…

Hmm…

Double-checked my backups were current, then unmounted the partition and used gparted to reformat it freshly as ext4. Started to copy the files across from the backup.

Stopped. Could not access the drive.

Hmm…

Remembered an old post of my own.

SATA cable plugs sure do wiggle in their sockets. A lot more than old IDE ribbon cables.

Powered down. Removed power cable. Opened case. Noted which SATA cables went from which socket on the motherboard to which drive. Removed them all, blew some dry air into the plugs and cable ends. Replaced the cables and gave them a good wiggle, then left them, making sure they were not getting tugged out or sideways by tension but were square in the sockets. This involved rerouting some cables so they were more comfortable, and tying a bunch of unused power plugs up out of the way.

Reboot. No error messages. Mount backup drive. Copy 240+ GB of backups onto blank drive. All faultless. Seems to work perfectly.

Take-home message: SATA cables are fussy and can cause problems that might look like something worse.

Something worse.
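In hindsight the log was hinting at this. The `SErr 0x58d0c02` value is a bit field, and the kernel spells out the set bits on the next line: things like PHYRdyChg, CommWake and 10B8B, which as far as I understand describe the SATA link itself and not the disk surface. Here is a small sketch that decodes such a value. The bit-to-name table is my recollection of how the Linux libata driver labels the SError register, so treat it as an assumption; only the nine names that appear in my log above are confirmed by this post. The function name `decode_serror` is made up for the example.

```python
# Bit positions in the SATA SError register and the short names the
# kernel prints for them. ASSUMPTION: recalled from the libata driver;
# only the names seen in the log above are verified here.
SERROR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value):
    """Return the names of the bits set in an SError value, lowest bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items())
            if value & (1 << bit)]


if __name__ == "__main__":
    # The value from the log line "SErr 0x58d0c02".
    print(" ".join(decode_serror(0x58D0C02)))
```

Run on the value from my log, this gives back the same nine names the kernel printed inside the braces. It proves nothing on its own, but lots of link-level flags and a drive that otherwise checks out is a good reason to reseat the cables before reformatting anything.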

About Darren

I'm a scientist by training, based in Australia.

5 responses to “Horrible errors on my Linux box.”

  1. dotkgc says :

    I have experience with this horror error. It occurred after using LuckyBackup and/or GrSync. I could repair the drive (which was the target drive of the 2 mentioned above) by applying Western Digital DOSDLG’s extensive test with only that drive attached. Sometimes I had to apply Seatools first. Then Lazesoft quick Partition recovery, then rebuild MBR, then CHKDSK. The repaired drive is one of 2 Hitachi drives of the same model. Before, a WD 1TB drive had the same errors and has now died.

    • Darren says :

      Thanks! Sounds like it’s a lot of work to recover them, and that it does relate to the disk being on the way out…

      • dotkgc says :

        I don’t think it means a dead drive. And the work is not extreme once you know the cure. In a few weeks I might reattach the drive and see if it’s cured.

  2. dotkgc says :

    Good news! 80 cm SATA cables had haunted me for months. I reattached the “chs sector 0” drive and haven’t experienced any severe problems so far.

  3. Darren says :

    Hardware does all kinds of weird stuff. There’s only so much time to spend on these issues… Recently Ethernet stopped working. Computer did not see router, lights on Ethernet port not flashing. Looked dead. Fix? Remove Ethernet cable, remove computer power cable, leave over night. Next day, and since, works fine. I dunno why. Cheerio!
