Makosuke said:
I'm answering OSX Itself because the ONLY significant stability problem I have is with unresponsive network volumes hanging the system. I just REALLY wish OSX failed more gracefully when a mounted volume stopped responding, but "stalled" disk access will actually hang the kernel, preventing even force quitting.
Yes!! It boggles my mind that the 5th major release of a so-called "modern" operating system can still go completely out to lunch due to external servers going down. Absolutely pathetic.
Example: on the MacBook, I'd mounted a network share residing on my Power Mac. I put the MacBook to sleep, forgetting completely about the network drive. Later on, I rebooted the Power Mac into single-user mode to run Memtest. In this mode, all services are off, so it's just as if the server had gone down.
So I woke up the MacBook while this was happening and got nothing but a spinning beachball. I couldn't start any new applications, couldn't get any usable response. I was able to log in remotely over ssh, so I tried to reboot it that way (sudo reboot). Nada. The reboot process hung, just like everything else. Tried to force-unmount the network drive every way I could possibly think of, but it just wouldn't listen. No amount of waiting helped either. OS X was determined to wait for that drive to come back, and refused to do anything until it did.
Finally the Power Mac was done, so I restarted it in normal multi-user mode. The network share came back up, and suddenly the MacBook was responsive again!
Absolutely ridiculous.
How hard is it to include a timeout, so that if the network server stops responding, it stops trying? Actually, I can recall seeing this sort of thing in Panther ("Server x has stopped responding, do you want to disconnect?"). Maybe it's just broken in Tiger? A machine should never lock up, and especially never because of external factors like network servers.
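To be concrete about the kind of timeout I mean, here's a rough sketch in plain user-space Python. It is not how OS X handles mounts internally, and it can't rescue a kernel that's already stuck. The names path_responds, timeout and probe are all made up for illustration. The idea is just to poke the volume from a throwaway thread and give up if no answer comes back in time:

```python
import os
import threading


def path_responds(path, timeout=5.0, probe=os.stat):
    """Return True if probe(path) completes within `timeout` seconds.

    Hypothetical helper: the probe runs in a daemon thread, so a call
    that blocks on a dead server cannot hang the caller.
    """
    outcome = {}

    def worker():
        try:
            probe(path)
            outcome["ok"] = True
        except OSError:
            # Path is missing or unreadable; at least we got an answer.
            outcome["ok"] = False

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)  # wait at most `timeout` seconds, then move on
    # If the worker never finished, `outcome` is still empty.
    return outcome.get("ok", False)
```

Something like path_responds("/Volumes/PowerMacShare", timeout=3.0) (the path is invented) would then come back False after three seconds instead of waiting forever. The stuck thread is simply abandoned, not cleaned up, so this only keeps the caller responsive. It doesn't fix the underlying hang.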
My only other area of instability is also related to external disks, but this time USB or FireWire. Any time I mount a damaged external disk, I can almost guarantee a kernel panic. I'm sorry, but the kernel should be robust enough that if the disk is damaged/inconsistent, the very worst it does is bail out and unmount the thing. It should never, ever get into a panic situation due to this. Just pop up a dialog box and say "Sorry dude, your disk is messed up and I can't read it. Better luck next time lol."
I know OS X has a very long heritage back to NeXT, BSD, Mach, etc., and there's probably some very old code lurking in there that was written on the assumption that external networks and devices wouldn't change on the fly. Back in the 1980s and most of the 90s, they pretty much never did, so programmers never worried about it happening. But nowadays, we plug and unplug devices all the time, move our laptops to different wireless networks, etc.
Apple needs to go through and do a wholesale audit/redesign of any parts of the system that don't handle these situations 100% transparently. I'd much rather see that in Leopard than any new whizbang-wowie features that they're probably wasting time on.