How can I find out if my disk is actually about to fail?

Discussion in 'Mac Pro' started by mattmower, Apr 28, 2012.

  1. mattmower macrumors member

    Joined:
    Aug 12, 2010
    Location:
    Berkshire, UK
    #1
    TL;DR: How can I find out whether I've just been unlucky and corrupted my disk somehow (e.g. we've had several storms recently, so perhaps a bad write), or whether there's something seriously wrong with it and it's going to fail completely?

    I've run out of disk space in my Mac Pro. I decided to buy a new 3TB disk and move Time Machine to that, freeing up the 1.5TB disk it was using.

    I installed the new disk, formatted it, then tried to copy the Backups.backupdb folder across. About halfway through, the copy died with an error in Finder. I confess I wasn't paying too much attention at the time.

    So I started looking into how to do this without using Finder and came up with the suggestion to use Disk Utility to 'restore' the old disk to the new one. That failed with an error 254 "Could not validate source".

    At that point I tried to use Disk Utility to verify the old disk. It came up with some "invalid node" errors in the catalog, said it was unrecoverable, and wouldn't remount the disk.

    This morning I restarted the MP. It mounted the disk read-only and I was able to copy off some files I wanted. I'm going to jettison the Time Machine backup and start again (coincidentally, Backblaze just finished my first full backup).

    But before I reformat the disk and start using it again, I wondered what I could do to actually test it (e.g. a full surface scan), to see whether there's something more seriously wrong with it and I should be thinking about getting it replaced.

    FWIW the SMART status is verified.

    Any suggestions?

    Kind regards,

    Matt
     
  2. Bear macrumors G3

    Joined:
    Jul 23, 2002
    Location:
    Sol III - Terra
    #2
    First, the SMART status isn't as smart as it should be. There appear to be issues that aren't covered by the indicator.

    Did you just use Verify in Disk Utility? Or did you actually try the Repair option?
    If you just did verify, try the Repair option.

    What brand and model is the disk and how old is it?
     
  3. mattmower, Apr 28, 2012
    Last edited: Apr 28, 2012

    mattmower thread starter macrumors member

    Joined:
    Aug 12, 2010
    Location:
    Berkshire, UK
    #3
    Interesting. I installed smartmontools via Homebrew and it reports that SMART is supported but not turned on, for all 3 of my disks. I'm a little puzzled by that, and I'm not sure what to make of this:

    Code:
    Shaddam:~ matt$ smartctl -a /dev/disk2
    smartctl 5.42 2011-10-20 r3458 [x86_64-apple-darwin11.3.0] (local build)
    Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
    
    === START OF INFORMATION SECTION ===
    Model Family:     Western Digital Caviar Green (Adv. Format)
    Device Model:     WDC WD15EARS-00S8B1
    Serial Number:    WD-WCAVY3675819
    LU WWN Device Id: 5 0014ee 259cb0a99
    Firmware Version: 80.00A80
    User Capacity:    1,500,301,910,016 bytes [1.50 TB]
    Sector Size:      512 bytes logical/physical
    Device is:        In smartctl database [for details use: -P show]
    ATA Version is:   8
    ATA Standard is:  Exact ATA specification draft version not indicated
    Local Time is:    Sat Apr 28 14:16:36 2012 BST
    SMART support is: Available - device has SMART capability.
    SMART support is: Disabled
    
    SMART Disabled. Use option -s with argument 'on' to enable it.
    Shaddam:~ matt$ man smartctl
    Shaddam:~ matt$ smartctl -H /dev/disk0
    smartctl 5.42 2011-10-20 r3458 [x86_64-apple-darwin11.3.0] (local build)
    Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
    
    SMART Disabled. Use option -s with argument 'on' to enable it.
    Shaddam:~ matt$ 
    I wonder why SMART is turned off, and whether there is any reason not to turn it on.
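
    For reference, the smartctl message above names the fix itself: `-s on`. Below is a minimal sketch of turning SMART on and then asking the drive for a verdict. /dev/disk2 is taken from the session above; the `run` helper and the RUN variable are made up for this example so that nothing touches the disk unless you opt in. The extended self-test (`-t long`) is a read scan carried out by the drive's own firmware, and nothing in this thread confirms how it behaves on this particular drive.

```shell
#!/bin/sh
# Sketch only: DISK comes from the session above; run() and RUN are
# hypothetical helpers for this example, not part of smartmontools.
DISK="${DISK:-/dev/disk2}"

# Print each command; execute it only when RUN=1 (dry run by default).
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

run smartctl -s on "$DISK"        # enable SMART, as the message suggests
run smartctl -H "$DISK"           # overall health self-assessment
run smartctl -t long "$DISK"      # start the drive's extended self-test
run smartctl -l selftest "$DISK"  # read the self-test log once the test is done
```

    Run as-is, this only prints the commands. Setting RUN=1 (most likely under sudo) would execute them; the self-test runs in the background on the drive, so the log is only meaningful after it has finished.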

    Disk Utility was not able to repair the disk. It reported that there were invalid nodes in the catalog that could not be repaired, after which it wouldn't mount the disk either. As I mentioned, though, OS X mounted it read-only (with a warning about disk problems) after a restart.

    I've since reformatted the disk, so at this point I am really trying to ascertain whether there's a fault with the disk itself (it's 18 months old and should still be under warranty, I think) or whether I just got unlucky with a corrupted file system.

    Kind regards,

    Matt
     
  4. gnasher729 macrumors P6

    Joined:
    Nov 25, 2005
    #4
    Just assume that it is going to fail and replace it. Replacing a drive before it fails is quite cheap. If you wait until it fails it gets a lot more expensive. If you replace it now, you can copy all your data overnight when it is most convenient. If it fails, it might happen just after you started work on some really urgent project.
     
  5. mattmower thread starter macrumors member

    Joined:
    Aug 12, 2010
    Location:
    Berkshire, UK
    #5
    Thanks, I do appreciate my options. It is worth it to me, if I can find a tool that will do it, to scan the disk before going through removing, RMA'ing, and replacing it.

    That's the question: is there such a tool available?

    Kind regards,

    Matt
     
  6. derbothaus macrumors 601

    Joined:
    Jul 17, 2010
    #6
    Not really. You can't scan in software for hardware issues if they are not present or not being reported. I usually use DiskWarrior and just rebuild the directory; if DW starts having slowdowns and malfunctions, I replace the drive. Other than that, you replace the drive when it stops working :(
    Gnasher is right. Assume failure and use it as an (extra) backup or for media files until it breaks and you can RMA it, or forget the RMA and put it in a closet. Not worth it.
     
  7. Bear macrumors G3

    Joined:
    Jul 23, 2002
    Location:
    Sol III - Terra
    #7
    You could always try a 3-pass erase on the drive and see whether the writes produce any errors.

    All things considered, this is rather good advice.
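
    The 3-pass erase suggested above can also be started from Terminal. This is only a sketch: it assumes the command-line route (`diskutil secureErase`, where level 4 is the US DoE 3-pass erase) is an acceptable stand-in for Disk Utility's security options, and that the disk is /dev/disk2. The `run` helper and RUN variable are made up for this example. It is destructive, so by default it just prints what it would do.

```shell
#!/bin/sh
# DESTRUCTIVE sketch: secureErase wipes everything on DISK.
# DISK is an assumed device node; run() and RUN are hypothetical helpers.
DISK="${DISK:-/dev/disk2}"

# Print each command; execute it only when RUN=1 (dry run by default).
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

run diskutil info "$DISK"           # double-check this really is the right disk
run diskutil secureErase 4 "$DISK"  # level 4: US DoE 3-pass erase
```

    Any write errors should show up in diskutil's output while the erase runs. On a 1.5TB drive, three passes will probably take the better part of a day.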
     
  8. wonderspark macrumors 68030

    Joined:
    Feb 4, 2010
    Location:
    Oregon
    #8
    Medium-length story about RMA:

    I have eight Western Digital RE-4 disks in a RAID, and last week one of the drives failed upon startup. I was able to get it to rejoin the RAID after three or four attempts to pull and reinsert it, and it finally rebuilt and showed up as normal again. I shut down later and restarted the next day... and it failed again. I pulled that drive and put it in an external drive caddy (Voyager Q) where I could listen to it closely, and I heard a sort of steady scratching sound in the drive. After starting and stopping it a few times, I noticed the scratching sound would only last a couple of seconds, then go silent as the drive came up to speed. At that point, it booted normally. It seems like it doesn't like being cold or something, since it rejoined the RAID again, and I've left the RAID on ever since with no problems. I can reboot the system and it works fine... it just fails when it's cold after being shut down for a while.

    I submitted an online RMA request with Western Digital, and it was accepted for advance RMA since the drive has a 5-year warranty. It should arrive this Monday, when I'll swap the drives and send the scratchy one back.

    If your disk is from WD, they make it really simple. You just enter the serial number, and it tells you when the warranty runs out and everything. Took about 60 seconds to fill out the data fields, and I didn't have to talk to anybody or provide any receipts... no hassle at all. I just typed "drive fails to join RAID upon cold start / spin-up" and it was accepted for RMA. Just have to pay for return shipping, but that beats buying a new one.
     
  9. mattmower thread starter macrumors member

    Joined:
    Aug 12, 2010
    Location:
    Berkshire, UK
    #9
    Just for reference, it seems TechTool Deluxe (if you can find your AppleCare CD; I can't) has a surface scan option. Presumably TechTool Pro has one too, but I wasn't willing to pay $40 to find out.

    However, another option is to use the badblocks command from the e2fsprogs (ext2 filesystem utilities) package. You can find out more here.

    I ran an erase pass on the disk last night, writing zeros, and there was no failure. I'm currently running a non-destructive read using badblocks to see what it has to say.

    If the drive can't complete it or has significant bad blocks I'll RMA it. Otherwise I think I just got unlucky and a bad write messed up the file system.
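
    For anyone wanting to repeat the badblocks pass described above, here is a rough sketch. The exact options used in this thread aren't stated, so the flags are an assumption: with no mode flag badblocks does a read-only scan, `-s` shows progress and `-v` is verbose. /dev/disk2, the `run` helper and the RUN variable are likewise assumptions for this example.

```shell
#!/bin/sh
# Sketch only: DISK is an assumed device node; run() and RUN are
# hypothetical helpers for this example, not part of e2fsprogs.
DISK="${DISK:-/dev/disk2}"

# Print each command; execute it only when RUN=1 (dry run by default).
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

run diskutil unmountDisk "$DISK"  # scan the disk while nothing is mounted from it
run badblocks -sv "$DISK"         # read-only scan of every block, with progress
# Alternative: -n does a non-destructive read-write test (slower).
# run badblocks -nsv "$DISK"
```

    badblocks prints the number of each bad block it finds, so a clean run ends with none listed. Avoid `-w` unless you mean it: that is the destructive write test.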
     
  10. derbothaus macrumors 601

    Joined:
    Jul 17, 2010
    #10
    Sounds like a plan. Personally I don't trust TechTool (any version), even for .99, unless they rewrote the thing. A surface scan is nice but may also fail to show HW issues that are present. I own a copy of Prosoft Drive Genius 3.x but really never use it. Their Data Rescue product is top notch, so maybe that is a better alternative. At least the company has a good rep.
     
  11. mattmower thread starter macrumors member

    Joined:
    Aug 12, 2010
    Location:
    Berkshire, UK
    #11
    Hrmm... I got DriveGenius as part of a bundle deal a couple of years back. The one time I tried to use it, the disk got screwed up. Perhaps coincidence but I've never trusted it since.

    After zeroing the whole disk, I ran badblocks and it reported that the disk has no bad blocks. I'm keeping an eye on it, but I think the disk is okay and I just got unlucky with the corrupted file system.
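
    "Keeping an eye on it" could be made a bit more concrete by checking the sector counters in the SMART attribute table now and then. A sketch, assuming SMART has been enabled on the disk and that `smartctl -A` prints the usual ten-column table; the function name is made up for this example. Non-zero or growing counts for reallocated, pending or uncorrectable sectors are commonly treated as a warning sign.

```shell
#!/bin/sh
# Sketch: pull the raw values of the sector-health attributes out of
# `smartctl -A` output read on stdin. The function name is hypothetical.
smart_sector_counts() {
    # Table columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
    # IDs: 5 = Reallocated_Sector_Ct, 197 = Current_Pending_Sector,
    #      198 = Offline_Uncorrectable
    awk '$1 == 5 || $1 == 197 || $1 == 198 { print $2 "=" $10 }'
}

# Usage (not executed here):
#   smartctl -A /dev/disk2 | smart_sector_counts
```

    Saving the output with a date every few weeks would make any upward trend easy to spot.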

    Matt
     
  12. derbothaus macrumors 601

    Joined:
    Jul 17, 2010
    #12
    Uh oh. I never really run anything other than DiskWarrior. As I said, Prosoft is a great company, but anyone can produce a turd. I am not neurotic enough to just run drive utilities whenever, so testing this stuff takes me some time. I have to have some failures first, and then I want to use what I know works. Thanks for sharing the DG experience; I got mine in a bundle as well. FWIW, I have never had TechTool find anything wrong with a drive, even when it was in the process of screeching death. Nice blinking lights though :)
     
  13. ActionableMango macrumors 604

    Joined:
    Sep 21, 2010
    #13
    I'd replace the drive. You are assuming a drive is either good or bad, and that you can simply do a test and move on based on the results. But drives can be intermittently bad, or bad in ways that tests cannot detect.

    I've had a drive in the past with intermittent problems. Everything would seem okay for a long time then it would go to hell. Format, restore from backup, thorough testing, everything seems fine again for a couple of weeks, then BLAMMO again. Maybe your time is cheap, but it was definitely not worth my time.
     
  14. goMac macrumors 603

    Joined:
    Apr 15, 2004
    #14
    A tool can't detect a drive failure until the drive actually fails. And if the drive fails, it's too late.

    Best to heed the above advice and just replace the drive if it's suspicious. And keep backups.
     
