Is this an indication of a failing hard drive?

Discussion in 'Mac Pro' started by big_malk, Oct 2, 2009.

  1. big_malk macrumors 6502a

    Joined:
    Aug 7, 2005
    Location:
    Scotland
    #1
    I'm getting a lot of short glitches on my Mac Pro, during which iTunes might pause for a few seconds, text stops appearing as I type, beach balls appear, etc.
    I've looked in the Console as these happen, and things like this appear a lot just after the hang is over.
    My disk "RAID Set" is a two-disk striped RAID array in Bays 3 and 4.
    Disk Utility says both disks' SMART status is verified, but I've heard that doesn't always mean there aren't any problems. I started getting this problem while playing videos, but it seems to be happening a lot more often now, even during times of less demand on the disk, which is worrying :/
    Updating my backup as I type this!
     
  2. rowsdower macrumors 6502

    Joined:
    Jun 2, 2009
    #2
    It's definitely possible that it's a failing drive. If SMART reports a problem then there probably is one, but if SMART reports good it doesn't mean that there isn't a problem. Many drive manufacturers have testing utilities. For example, Hitachi's is here. You could try running that if you can find it for the manufacturer of your drive. Keep your backups up to date.
     
  3. big_malk thread starter macrumors 6502a

    Joined:
    Aug 7, 2005
    Location:
    Scotland
    #3
    I think the failing drive is a Western Digital, and as far as I can tell they just recommend using Disk Utility.

    I'm running CarbonCopyCloner now. It's taking forever and every app is fluctuating between the beachball and the regular cursor when I put the mouse over its window, so I think I'll just leave it and hope the backup completes successfully!
    I tried logging in over my network to copy some work files onto my MBP and it keeps failing to even log in, but it worked fine just a couple of hours ago.
    I guess I should just be thankful this didn't happen on a Monday morning :(
     
  4. rowsdower macrumors 6502

    Joined:
    Jun 2, 2009
    #4
    It sounds like you'll know soon in any case if the drive is failing. At least you have backups. It seems like most of these threads end with important work being on the drive and not having any way to recover it.

    As far as I know Disk Utility only checks for filesystem errors and permissions problems. Filesystem errors might be caused by a failing drive, but I don't think that Disk Utility will tell you plainly that the disk is failing. Western Digital does provide this ISO image for their drives; however, it is a DOS formatted bootable CD and I don't know if that will boot on a Mac.
     
  5. big_malk thread starter macrumors 6502a

    Joined:
    Aug 7, 2005
    Location:
    Scotland
    #5
    I'm trying to check the warranty and I'm a bit confused.
    The console error said "02/10/2009 15:10:18 kernel disk4: I/O error."
    According to system profiler there is no 'disk4', the BSD names are disk0 - disk3 for my internal disks, 'disk5' is a USB pen drive, and the external volumes I'm backing up to are 'disk7s2' and 'disk7s3'.

    Does it mean the drive in Bay 4? Or am I being a bit stupid here? :confused:
     
  6. rowsdower macrumors 6502

    Joined:
    Jun 2, 2009
    #6
    What shows up in Disk Utility? disk4 might be a virtual drive related to the RAID (i.e. the full volume of disk0+disk1+disk2+disk3). I have never looked at Disk Utility or System Profiler on a Mac Pro with a RAID so I'm not sure.
     
  7. big_malk thread starter macrumors 6502a

    Joined:
    Aug 7, 2005
    Location:
    Scotland
    #7
    Good point, you're right. In the info window of the RAID set it says 'disk4'.
    I wonder if it's worth ordering a replacement right away or reformatting in some hope of it not being a hardware fail, maybe a corrupt partition or something? :/
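
Editor's sketch: one way to see which device the kernel is complaining about is to count the I/O error lines per BSD disk name. The regular expression below is modelled only on the single log line quoted in post #5; the second and third sample lines, and the function name `count_io_errors`, are hypothetical. The code does not map `disk4` back to the RAID set; that still has to be checked in Disk Utility as described above.

```python
import re
from collections import Counter

# Matches kernel messages shaped like the one quoted in the thread:
#   "02/10/2009 15:10:18 kernel disk4: I/O error."
# An optional slice suffix (e.g. disk7s2) is folded into its parent disk.
IO_ERROR = re.compile(r"kernel\s+(disk\d+)(?:s\d+)?: I/O error")


def count_io_errors(lines):
    """Return a Counter mapping BSD disk name -> number of I/O error lines."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample: only the first line's wording comes from the thread.
sample = [
    "02/10/2009 15:10:18 kernel disk4: I/O error.",
    "02/10/2009 15:10:19 kernel disk4: I/O error.",
    "02/10/2009 15:11:02 iTunes[123] some unrelated message",
]

for disk, errors in count_io_errors(sample).items():
    print(disk, errors)
```

If every error names the same device, as here, that device (or the physical disks behind it, when it is a RAID set) is the one to investigate first.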
     
  8. rowsdower macrumors 6502

    Joined:
    Jun 2, 2009
    #8
    If that's the case, I think Disk Utility would find it, as suggested by Western Digital. It's worth a shot, but it's probably a hardware failure somewhere.
     
  9. nanofrog macrumors G4

    Joined:
    May 6, 2008
    #9
    If you don't already have one, get a spare drive to replace it with ASAP. When using RAID, you always want to keep at least one spare drive handy for such problems, as they do fail. It's just a matter of when, not if.
     
  10. rtrt, Oct 2, 2009
    Last edited: Aug 13, 2011
  11. nanofrog macrumors G4

    Joined:
    May 6, 2008
    #11
    The trick is, not everyone can really interpret SMART data, assuming it doesn't just produce a FAIL indication.

    IIRC, some of the Windows based SMART utils do better (i.e. presents it in a manner easier to understand), but that depends on whether or not the OP has a copy installed on the system, or is willing to if it's not.
     
  12. big_malk thread starter macrumors 6502a

    Joined:
    Aug 7, 2005
    Location:
    Scotland
    #12
    I've got TechTool Pro 5, so I've booted that up and run some of the tests.

    Volume Structures were fine on the RAID volume.
    SMART check has at least one indicator verging on fail on every drive I have! :eek:
    Of the two disks in the RAID, one had two indicators almost halfway to fail; they were temperature related (can't remember exactly). The other disk's indicator was ECC related (error correction) and was about halfway to fail. I don't know much about the intricacies of SMART, but that one sounds like it could be causing my problems!

    I've performed a surface scan on one of the RAID disks and it was fine. I'm running the check on the other and so far it says 17 bad blocks encountered.

    Overall, I am not impressed with the state of my hard drives :(
     
  13. nanofrog macrumors G4

    Joined:
    May 6, 2008
    #13
    What drives are you using?

    I'm not a big fan of consumer grade models for RAID, even for software implementations.
     
  14. rtrt, Oct 2, 2009
    Last edited: Aug 13, 2011
  15. big_malk thread starter macrumors 6502a

    Joined:
    Aug 7, 2005
    Location:
    Scotland
    #15
    The surface scan of my other RAID disk has so far found 264 bad blocks, having scanned 66,519,051 blocks, which seems pretty high to me.
    TechTool says 'The Surface Scan test checks a hard drive for physical bad blocks'. I've been reading around, and the advice for bad blocks is often to format the HD and zero out the data. But this wouldn't help if these bad blocks are physical, would it?
    The more this test scans, the longer it's predicted to take: 6 hours remaining and still rising. I think I'll call it quits.
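
Editor's sketch: to put the scan figures above on a comparable scale, the bad-block count can be expressed per million blocks scanned. The two numbers come from the post; the helper name `bad_blocks_per_million` is made up for illustration, and nothing here says what rate a manufacturer would consider acceptable.

```python
def bad_blocks_per_million(bad_blocks, blocks_scanned):
    """Bad blocks found per one million blocks scanned."""
    if blocks_scanned <= 0:
        raise ValueError("blocks_scanned must be positive")
    return bad_blocks / blocks_scanned * 1_000_000


# Figures reported in the post: 264 bad blocks in 66,519,051 scanned.
rate = bad_blocks_per_million(264, 66_519_051)
print(f"{rate:.2f} bad blocks per million scanned")  # prints 3.97
```

The rate on its own matters less than its trend: the earlier post reported 17 bad blocks and this one 264, on a scan that had not finished.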
     
  16. nanofrog macrumors G4

    Joined:
    May 6, 2008
    #16
    I haven't had an MP in over a year now, and had forgotten it even did that. I'm more used to seeing the data as raw numbers, and most software doesn't take the drive's age into account (to obtain any idea of bad blocks/unit time).

    So I'm used to having to interpret the data myself, particularly when it's in a RAID (though I usually use a hardware controller, not the system's SATA ports).

    That is bad to me.

    You could load up the drive maker's disk utility software (not OS X's Disk Util), and do a low level format, as that will remap the bad blocks. Follow it up with a high level format and re-create the array, and restore the data.

    It's time consuming, but may help. That said, it looks like you'd want a replacement drive, or better yet, a set.
     
  17. big_malk thread starter macrumors 6502a

    Joined:
    Aug 7, 2005
    Location:
    Scotland
    #17
    I downloaded SMART Utility, and either things have gotten worse, SMART Utility decides things are 'failing' at a lower threshold, or there's an error in the SMART data somewhere.

    disk3, a member of the RAID set, is reported as failing, which it wasn’t before. It has 757 'pending bad sectors', 0 removed or reallocated. 5737 'total errors', recent ones including 'Uncorrectable Error' and 'Unknown'.
    disk2, the other member of the RAID set, passed perfectly! Huzzah!
    However, my downloads/scratch/other disk is reported as failing, but as far as I can tell only because of 1 'reallocated bad block', which the system should avoid so the disk would keep working fine, right? Even if it is an indication of more bad blocks to come?
    Incidentally, SMART utility regularly reported 'An error occurred attempting to read SMART data.'.
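
Editor's sketch: the three readings above can be lined up with a simple rule of thumb. This is an illustrative heuristic only, not the logic SMART Utility or any drive vendor uses; the class `SmartSummary`, the function `triage`, and the "replace"/"watch"/"ok" labels are all hypothetical. The counts are the ones given in the post, with unreported values assumed to be zero.

```python
from dataclasses import dataclass


@dataclass
class SmartSummary:
    name: str
    pending_sectors: int = 0      # unreadable sectors not yet remapped
    reallocated_sectors: int = 0  # sectors already remapped to spares
    total_errors: int = 0         # entries in the drive's error log


def triage(drive):
    """Assumed rule of thumb: pending sectors are worse than remapped ones."""
    if drive.pending_sectors > 0:
        return "replace"
    if drive.reallocated_sectors > 0 or drive.total_errors > 0:
        return "watch"
    return "ok"


drives = [
    SmartSummary("disk3", pending_sectors=757, total_errors=5737),
    SmartSummary("disk2"),
    SmartSummary("scratch", reallocated_sectors=1),
]

for drive in drives:
    print(drive.name, triage(drive))
```

Under these assumptions the RAID member with pending sectors is the urgent one, while a single reallocated sector is something to monitor rather than a reason to panic.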

    I had planned to replace the two RAID drives with 250GB 10,000rpm drives, assuming the cost of them would have come down since I last looked, but they seem even less common and more expensive than before? :(

    I guess I could just keep this a cheap repair and only replace the 1 failing drive with the same as before (except not Western Digital, every one of them I've had has failed!!).

    I'd love to use RAID 5 instead of 0, but as far as I can tell this still isn't possible in a Mac Pro without a hardware controller? Which, as far as I can tell, aren't all that cheap?
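
Editor's sketch: the trade-off between RAID 0 and RAID 5 comes down to simple arithmetic. RAID 0 stripes across all drives with no redundancy, while RAID 5 gives up one drive's worth of space to parity and survives a single drive failure. The function name `raid_usable_capacity` is made up, and the 250 GB size is borrowed from the drives mentioned above purely as an example.

```python
def raid_usable_capacity(level, drives, drive_gb):
    """Return (usable_gb, drive_failures_tolerated) for RAID 0 or RAID 5."""
    if level == 0:
        if drives < 2:
            raise ValueError("RAID 0 needs at least 2 drives")
        return drives * drive_gb, 0
    if level == 5:
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * drive_gb, 1
    raise ValueError("only RAID 0 and RAID 5 are handled in this sketch")


# Two 250 GB drives striped, versus three 250 GB drives in RAID 5.
print(raid_usable_capacity(0, 2, 250))  # (500, 0)
print(raid_usable_capacity(5, 3, 250))  # (500, 1)
```

In other words, matching the usable space of the current two-disk stripe with RAID 5 would take a third drive, in exchange for surviving one failure.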

    Thanks for all your help guys! :)
     
  18. rtrt, Oct 4, 2009
    Last edited: Aug 13, 2011
  19. big_malk thread starter macrumors 6502a

    Joined:
    Aug 7, 2005
    Location:
    Scotland
    #19
    disk3 is out of warranty, so I've ordered a replacement and it should arrive on Tuesday :)
    disk2 is still under warranty, so I might start copying everything off it and RMA it, but wouldn't one bad block be considered acceptable? Like a couple of dead pixels often are?

    Thanks again for all your advice everyone :)
     
  20. nanofrog macrumors G4

    Joined:
    May 6, 2008
    #20
    Yes, one bad block would be considered acceptable.

    You'd want to get the drive diagnostics utility from the drive manufacturer, and remap the bad sectors on it. But make sure you've a backup of the data first, as it will wipe the drive, and the array will need to be re-created, then restored.

    It's a fair bit of work, but worth it IMO. Leaving bad sectors is always a bad idea in RAID.
     
