Possible causes of severe disk corruption?

Discussion in 'Mac Basics and Help' started by Makosuke, Mar 7, 2010.

  1. Makosuke macrumors 603

    Joined:
    Aug 15, 2001
    Location:
    The Cool Part of CA, USA
    #1
    I expect I know what the answer is, but it never hurts to ask.

    A good friend of mine has a 20" iMac (1st gen aluminum) that's been struck dead with a corrupt hard drive directory twice lately, and I'm trying to figure out if it's just incredibly bad luck or there's a root cause I'm not seeing.

    He was running 10.5 when his computer started acting up badly, until eventually it wouldn't even boot. I checked it out, and the hard drive was corrupted to the point that neither fsck nor even TechTool Pro 4 could fix it. I did eventually get it to mount read-only and transferred the files off, so I wiped the drive, installed Snow Leopard, and used Migration Assistant to get things back to where they were. TTP said the hardware was fine, and after I updated all his software and added a 2GB stick of RAM, everything seemed to be running fine.

    Then, a few weeks later, something went horribly wrong (I think he said it was during an attempt to update to 10.6.2). Again it wouldn't boot, and again there was severe disk corruption: "Keys out of order". This time, after a lot of coaxing, I got TTP5 to rebuild the directory, at which point Disk Utility did about a dozen passes, fixing one thing after another, before finally declaring the drive OK, minus a few damaged files and with a few things in a new lost+found folder.
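    For context on that error message: HFS+ keeps its catalog (the "directory") in a B-tree, and the records in each node are supposed to be stored in ascending key order, with catalog keys made up of a parent folder ID plus a name. A message like "Keys out of order" indicates that a checker found a node where that ordering is broken. The Python sketch below is a simplified, hypothetical illustration of that kind of check, not fsck_hfs's actual code: keys are modeled as (parent ID, name) tuples with made-up values, and `str.casefold()` stands in for HFS+'s real Unicode name-comparison rules.

```python
from typing import List, Optional, Tuple

# Hypothetical simplified catalog key: (parent folder ID, node name).
CatalogKey = Tuple[int, str]


def compare_form(key: CatalogKey) -> Tuple[int, str]:
    """Order by parent ID first, then by name, ignoring case.
    str.casefold() is only a stand-in for HFS+'s real comparison rules."""
    parent_id, name = key
    return (parent_id, name.casefold())


def first_out_of_order(keys: List[CatalogKey]) -> Optional[int]:
    """Return the index of the first key that is not strictly greater than
    the key before it (out of order or duplicate), or None if all is well."""
    for i in range(1, len(keys)):
        if compare_form(keys[i]) <= compare_form(keys[i - 1]):
            return i
    return None


node_ok = [(2, "Applications"), (2, "Library"), (2, "Users"), (16, "Shared")]
node_bad = [(2, "Applications"), (2, "Users"), (2, "Library")]
print(first_out_of_order(node_ok))   # None
print(first_out_of_order(node_bad))  # 2
```

    Because lookups in a B-tree rely on that ordering, a single misplaced key can make other records unreachable, which may help explain why repairs like the one above take many passes and leave items in lost+found.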

    I mounted it via target disk mode, cloned the drive to a disk image with CCC (the clone went fine), zeroed the drive, reinstalled 10.6, updated to 10.6.2, etc., migrated everything back over, and again it seems to be running fine.

    But given how rare corruption that severe is on a journaled drive, I'm wondering whether there's a root cause I'm missing.

    A full bad block scan with TTP came up clean, as did all other hardware tests, and letting Rember (memtest) stress the RAM all night (11 passes) also gave it a clean bill of health, so at least it doesn't seem to be hardware. The only peripheral is an ancient HP printer.

    The only system-level software he has installed is a Wacom tablet driver (and maybe an older Canon scanner driver, but I think that runs in user space, and I doubt it's a problem anyway, since I once used the same scanner without issue). His software is mainly WoW, SketchUp, Blender, and LimeWire, none of which seems suspect in terms of disk corruption, to my knowledge.

    Anybody have any guesses or anything else I should try/check before I hand it back to him?
     
  2. Joerigoesmac macrumors member

    Joined:
    Feb 28, 2009
    Location:
    Genk, Belgium
    #2
    Hello, I've been reading Google for a few hours about your problem now, and it's most likely a corrupted build/partition. I'm just going to copy and paste the important part here, since my English isn't that great, although it has improved a lot since I got a Mac (gotta love spellcheck) :D

    So the thing they recommend is to delete every partition (you could also try a low-level format) and rebuild a partition.

    Sorry if you already mentioned it; it's the only useful thing I could find :)
     
  3. Makosuke thread starter macrumors 603

    Joined:
    Aug 15, 2001
    Location:
    The Cool Part of CA, USA
    #3
    I much appreciate the effort. That was the first thing I did when I got the data off: I repartitioned with a single partition to wipe out the old (and possibly bad) partition map, and then reformatted the new partition with "zero all data" checked, to be extra certain.

    Short of booting from a manufacturer-supplied utility to do a factory wipe, there's not much more I can think of to do. (Though if anyone thinks of anything more I can do, I'd be willing to back up and wipe it again, just in case.)
     
  4. toolbox macrumors 68020

    Joined:
    Oct 6, 2007
    Location:
    Australia (WA)
    #4
    Could be a number of things: the HDD could have a defect right out of the factory, or the RAM could be having problems. Corruption is especially likely if the computer is not shut down properly.

    Regularly repairing permissions and verifying the disk can help prevent this too.
     
  5. Makosuke thread starter macrumors 603

    Joined:
    Aug 15, 2001
    Location:
    The Cool Part of CA, USA
    #5
    That's the one I'm afraid of, though every failing drive I've dealt with personally has shown at least a couple of bad blocks on a scan. There's no way to find out other than using it, I guess.

    I'd like to think that 8 hours of memtest stress would've found any RAM problems, but I suppose you never know.

    I am curious why you think repairing permissions regularly would help prevent volume corruption. Not that it would hurt, but from my limited understanding of how the OS accesses the drive, the two seem unrelated. Is this something you have experience with?
     
  6. patrick0brien macrumors 68040

    Joined:
    Oct 24, 2002
    Location:
    The West Loop
    #6
    Dumb question: Is Volume Journaling on?

    IMHE, if journaling is off, an oft-used drive (for, say, movies) can lose its effing mind, fragmentation-wise.

    And I have sometimes found, after a big software update, that a volume's journaling was off when I had set it to on.
     
  7. Makosuke thread starter macrumors 603

    Joined:
    Aug 15, 2001
    Location:
    The Cool Part of CA, USA
    #7
    Not a dumb question at all, since an un-journaled volume would be ripe for corruption after a dirty shutdown. However, I'm nigh-positive it was on before, and I triple-checked this time to make sure I didn't miss it.

    Interesting about journaling turning itself off; I've never seen that myself, and never heard of it happening, either. Are you sure it was a 10.x.x update that did it, and not something else you did at the same time? If so, that's another thing to make a mental note to check after such updates. Maybe the major system updates disable it temporarily for some reason, and can fail to re-enable it if something goes wrong? That would, admittedly, explain how an error that is never supposed to happen on a journaled volume (keys out of order) could happen after a failed update.

    As an aside, journaling is one of the reasons this struck me as so odd. Between about a dozen Macs at work plus family, friend, and freelance troubleshooting, I used to see minor volume corruption on a regular basis. Back in the OS 9-and-earlier days, running Norton Utilities regularly was vital to keeping a heavily used Mac working smoothly.

    Since HFS+ added journaling, though, I almost NEVER see volume errors, and apart from hardware failures, this computer is the only case where I've seen anything that severe.

    I thought journaling was just for protection against interrupted operations, by the way; isn't the lack of fragmentation more a result of HFS+ and the built-in on-the-fly defragmentation routines in OS X than of journaling per se? Or am I misunderstanding what journaling does?
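    On the journaling-versus-fragmentation question, the general idea of a write-ahead journal is about consistency, not layout. Changes are first written to a log, then marked committed, and only then applied in place. After a dirty shutdown, replaying the log either finishes a committed transaction or discards an uncommitted one, so the on-disk structures never end up half-updated. The toy Python model below illustrates only that general idea; `ToyVolume` and its crash flags are hypothetical and do not reflect HFS+'s actual on-disk journal format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyVolume:
    """Hypothetical toy volume: 'blocks' holds metadata, 'journal' is the log."""
    blocks: Dict[int, str] = field(default_factory=dict)
    journal: List[Tuple] = field(default_factory=list)

    def transaction(self, writes: Dict[int, str],
                    crash_before_commit: bool = False,
                    crash_before_apply: bool = False) -> None:
        # 1. Log every intended write before touching the real blocks.
        for block, data in writes.items():
            self.journal.append(("write", block, data))
        if crash_before_commit:
            return  # simulated power loss: no commit record was written
        # 2. The commit record marks the transaction as complete.
        self.journal.append(("commit",))
        if crash_before_apply:
            return  # simulated power loss after commit: replay will finish it
        # 3. Apply the writes in place, then free the journal space.
        self.blocks.update(writes)
        self.journal.clear()

    def replay(self) -> None:
        """Mount-time recovery: apply committed transactions, drop the rest."""
        pending: Dict[int, str] = {}
        for entry in self.journal:
            if entry[0] == "write":
                pending[entry[1]] = entry[2]
            else:  # "commit": everything logged since the last commit is safe
                self.blocks.update(pending)
                pending = {}
        self.journal.clear()  # uncommitted writes left in 'pending' are discarded


vol = ToyVolume(blocks={1: "old-dir", 2: "old-bitmap"})
vol.transaction({1: "new-dir", 2: "new-bitmap"}, crash_before_commit=True)
vol.replay()
print(vol.blocks)  # {1: 'old-dir', 2: 'old-bitmap'}

vol.transaction({1: "new-dir", 2: "new-bitmap"}, crash_before_apply=True)
vol.replay()
print(vol.blocks)  # {1: 'new-dir', 2: 'new-bitmap'}
```

    Either both blocks change or neither does. Without the journal, a crash between the two in-place writes could leave a new directory entry paired with an old allocation bitmap, which is the kind of mismatch a repair tool then has to untangle. Nothing in this model decides where data is placed on disk, which is consistent with the view that fragmentation avoidance comes from separate file system routines rather than from journaling itself.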
     
  8. toolbox macrumors 68020

    Joined:
    Oct 6, 2007
    Location:
    Australia (WA)
    #8
    When you're testing the RAM, are you doing it one stick at a time, or do you have both in there?

    I would be doing one stick at a time. How are the diagnostics supposed to differentiate which stick is defective?
     
  9. Makosuke thread starter macrumors 603

    Joined:
    Aug 15, 2001
    Location:
    The Cool Part of CA, USA
    #9
    Rember can't, but since there were no errors at all when testing both together and exercising all the RAM, I didn't see the point in testing either individually. Had it generated any errors, I'd have pulled one stick, then the other, to figure out which was the problem.

    I assumed that there aren't any situations in which a RAM module would work properly when paired with another but not alone. The reverse can happen, though: I once had a pair of DMS sticks that would start having problems once they got hot, but ONLY when used as a pair. Either one alone worked fine, which was pretty bizarre and took me a while to pin down.

    Interestingly, Apple's diagnostics, running at a lower level, can tell which module is problematic. The Xserve at work was quite specific about which DIMM was throwing ECC errors when we had a stick go bad.
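    The RAM tests discussed above rest on a simple idea: write known bit patterns to every location, read them back, and flag any location whose value changed. The sketch below shows that idea in Python against a simulated memory region. `FaultyMemory` is a hypothetical stand-in with one stuck-at-0 bit, and the pattern set is illustrative rather than what memtest actually uses; real testers run many more patterns and addressing tests directly against physical RAM.

```python
from typing import List, Sequence


class FaultyMemory:
    """Hypothetical simulated RAM region with one stuck-at-0 bit."""

    def __init__(self, size: int, bad_offset: int, bad_bit: int) -> None:
        self.cells = bytearray(size)
        self.bad_offset = bad_offset
        self.keep_mask = ~(1 << bad_bit) & 0xFF  # clears the stuck bit on write

    def write(self, offset: int, value: int) -> None:
        if offset == self.bad_offset:
            value &= self.keep_mask  # this bit can never be stored as 1
        self.cells[offset] = value

    def read(self, offset: int) -> int:
        return self.cells[offset]

    def __len__(self) -> int:
        return len(self.cells)


def pattern_test(mem: FaultyMemory,
                 patterns: Sequence[int] = (0x00, 0xFF, 0x55, 0xAA)) -> List[int]:
    """Write each pattern everywhere, read it all back, and return the
    sorted offsets where a read did not match what was written."""
    bad = set()
    for pattern in patterns:
        for offset in range(len(mem)):
            mem.write(offset, pattern)
        for offset in range(len(mem)):
            if mem.read(offset) != pattern:
                bad.add(offset)
    return sorted(bad)


# Walking-ones patterns set one bit per pass, so a stuck-at-0 bit fails on exactly one.
WALKING_ONES = tuple(1 << bit for bit in range(8))

mem = FaultyMemory(size=1024, bad_offset=300, bad_bit=3)
print(pattern_test(mem))                # [300]
print(pattern_test(mem, WALKING_ONES))  # [300]
```

    Note that the result is an address, not a module. Turning an address into a specific DIMM requires knowing how the memory controller maps addresses to slots, which is presumably part of why lower-level diagnostics can name the bad stick while a tester like Rember cannot.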
     
  10. Joerigoesmac macrumors member

    Joined:
    Feb 28, 2009
    Location:
    Genk, Belgium
    #10
    Hey, this is a long shot, but try wiping the disk clean and making a new partition on it with ANOTHER Mac. I've read that with some Mac OS versions people had issues with partition builds. But if it were mine, I'd just replace it after all the trouble you went through :)
     
  11. Makosuke thread starter macrumors 603

    Joined:
    Aug 15, 2001
    Location:
    The Cool Part of CA, USA
    #11
    That is a long shot, but it's a theory. If something goes bad again, I might try this (assuming he wouldn't rather just fork over for a new drive at that point).

    As an aside, I had TTP5 run three full-disk bad block scans in a row, plus three 90-minute RAM checks, and it still came up clean on all of them. That doesn't completely rule out hardware, but if something is wrong, it certainly isn't an easy-to-find glitch.
     