I have a Crucial P5 SSD that I've been using without any issues in an external enclosure for about 18 months.
Yesterday I mounted one of the archive volumes on it, tried to sync some data from it, and found the system hanging. No errors were reported, it just hung. It seemed like a system-wide I/O hang: even switching apps was blocked, although practically nothing on the system has anything to do with this external drive.
Dug a bit deeper and found the following:
- Of the 4 volumes on this drive, 1 caused hangs on any access about 80% of the time. Sometimes I could read/write, but most of the time it hung
- Swapped it into a different (more expensive) enclosure - same issue
- In recovery mode I was able to access the files on it without any hanging
- Ran disk repair in recovery mode on the top-level device, the container and every individual volume - every check said everything was fine
- Tried to delete the affected volume, and that caused Disk Utility to hang in the same way
- Tried deleting the volume in recovery mode - same hanging
- Tried to completely format the drive from recovery mode - hung
Couldn't get DriveDx to work with my external enclosures, so I couldn't verify SMART status, but with its driver installed, the limited information I could pull with smartctl suggested the drive was healthy.
At this point I had almost concluded that the drive was dead, but then I tried a low-level erase using dd to write zeros to the raw device. After a few minutes I stopped it and tried to format the drive again. That worked, and I've had no issues since. I didn't lose any data because this is a clone of another external, and I've since written about 75% of its capacity without any problems.
So I'm wondering what happened here. It seems unlikely to be a hardware failure; my best guess is that some APFS header got corrupted in a way that broke any attempt to read or modify it.
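For reference, the partial dd run amounts to "overwrite the start of the disk with zeros, then stop". Below is a rough Python equivalent, as a sketch only. It assumes you pass the whole raw disk (on macOS something like `/dev/rdiskN`, where `N` is a placeholder you must look up yourself), run it as root, and use a `total_bytes` that is a multiple of the sector size. `zero_prefix` is a hypothetical helper, not part of any tool. My unverified assumption is that zeroing the beginning of the disk wiped the partition table and the APFS container metadata stored near the start, which would explain why the next format went through.

```python
import os


def zero_prefix(path: str, total_bytes: int, chunk_size: int = 1 << 20) -> int:
    """Overwrite the first total_bytes of path with zeros and return bytes written.

    Hypothetical helper. On a raw disk this DESTROYS whatever lives in that
    region (partition table, filesystem metadata), so double-check the path.
    Raw devices generally need sector-aligned sizes, hence the 1 MiB chunks.
    """
    zeros = bytes(chunk_size)
    written = 0
    # "r+b" opens for read/write WITHOUT truncating, so bytes past
    # total_bytes are left untouched (as when dd is interrupted early).
    with open(path, "r+b", buffering=0) as f:
        while written < total_bytes:
            # Unbuffered writes may be partial, so count what was actually written.
            written += f.write(zeros[: min(chunk_size, total_bytes - written)])
        # Ask the OS to flush the zeros to the device before returning.
        os.fsync(f.fileno())
    return written
```

Trying it on an ordinary scratch file first is a safe way to see that only the requested prefix gets zeroed while the rest of the file stays as it was.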
Anyone seen this kind of thing before?