You seem to imply that Apple's 64-bit transition method in Tiger and Leopard, where both 32-bit and 64-bit apps are supported while the kernel is still 32-bit, is not ideal. Apple's method actually makes a lot of sense since it fosters the development of 64-bit apps without breaking drivers and system compatibility the way a pure 64-bit OS like Windows x64 does. When 64-bit apps are run the processor runs in pure 64-bit mode, so you get all the benefits of more registers, larger address space, etc. When 32-bit apps are run the processor runs in 32-bit mode, so nothing breaks. The kernel is also 32-bit, but with 36-bit PAE enabled so that it can manage up to 64GB of RAM for use by both 32-bit and 64-bit applications.
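For reference, the 64GB figure follows directly from the width of the physical address. A rough sketch of the arithmetic, where `addressable_bytes` is just an illustrative helper name and not anything from OS X:

```python
GIB = 2**30  # bytes in one gibibyte

def addressable_bytes(address_bits):
    """Bytes reachable with a byte-addressed space of the given width."""
    return 2**address_bits

# Plain 32-bit addresses vs. the 36-bit physical addresses PAE provides
print(addressable_bytes(32) // GIB)  # 4
print(addressable_bytes(36) // GIB)  # 64
```

Note that PAE only widens physical addresses; each 32-bit process still sees a 4GB virtual address space.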
The problem with that approach is that it carries a significant performance penalty. It is basically a non-scalable approach. I understand why Apple did it - for ease of porting to 64-bit and for driver compatibility - they simply did not have enough time and resources to attempt a full 64-bit port, which is what they are doing with Snow Leopard. They should have done that with Leopard.
The transition method Apple uses avoids the chicken and the egg problem, where developers are reluctant to write 64-bit programs and new 64-bit drivers because there aren't enough users of 64-bit operating systems, while there aren't enough 64-bit users because there aren't the 64-bit programs to justify the switch and there aren't enough 64-bit drivers, so many of your devices won't work.
Note that developers don't _have_ to write 64-bit programs - 32-bit programs run very well on a 64-bit kernel thanks to the excellent backwards compatibility AMD built into x86_64. Also note that there aren't nearly enough OSX drivers outside of Apple, among the OEMs, to make this driver porting problem even significant - with the right kind of design and APIs, 32-bit to 64-bit driver ports are no big deal - most cases are just a recompile. Those gazillion Windows and Linux drivers are already happily 64-bit.
Compare this to Microsoft, who faced the same issue, albeit on a much larger and arguably different scale (countless OEMs with a huge number of supported devices) - to their credit, they have resolved the driver problem with Vista x64 - I am typing this on Vista x64 with 10GB RAM and a load of weird devices that just work - wireless N adapters, Bluetooth stereo headphones, cutting edge graphics, you name it. So it is possible - Apple just chose the lame route.
With a 32-bit kernel and the ability to run 64-bit apps, 64-bit app development can get a head start, so that when a pure 64-bit OS with a 64-bit kernel arrives in Snow Leopard there are at least some 64-bit programs to encourage users to transition, and the market is there for developers to spend time writing new 64-bit drivers and more 64-bit programs. Admittedly, in practice there isn't a glut of 64-bit programs right now, I only know of Mathematica, Cinema 4D, Chess, and Xcode 3, but it's a good idea in concept.
Again, a 64-bit kernel does not force anyone to write 64-bit programs - 32-bit stuff works great. So I don't understand why a 32-bit kernel was needed to kick-start the development of 64-bit apps. Apple could have just thrown in a 64-bit kernel and the rest of the situation would have remained the same. There was no understandable justification for a 32-bit kernel to run 64-bit user space.
And in 32-bit mode, the ability to devote the complete 4GB address space to an application when it is running, or to the kernel when it is running, instead of having a persistent kernel that takes up 2GB and leaves only 2GB for applications as in 32-bit Windows, also makes a lot of sense. After all, you don't have issues on the Mac like in 32-bit Windows, where even games now are hitting the 2GB application address space limit and crashing. (http://www.anandtech.com/gadgets/showdoc.aspx?i=3034&p=1)
You have different issues on the Mac - scalability suffers due to the 32-bit kernel and the forced 4G/4G split. It's bad enough that no one chose to do it prior to Apple, and it shows.
The problem again is that it is not a sane solution - it involves mapping and unmapping the kernel address space on every switch from user to kernel mode.
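To put rough numbers on the trade-off being argued here, the sketch below compares how much of a 32-bit virtual address space a process gets under a shared split versus a 4G/4G split. The table and helper name are illustrative assumptions, not taken from any kernel source: 2G/2G is the 32-bit Windows default mentioned above, 3G/1G is the usual 32-bit Linux default, and 4G/4G is the separate-address-space arrangement, which buys the extra user space at the price of changing address spaces on each user/kernel transition.

```python
GIB = 2**30
TOTAL = 4 * GIB  # size of a 32-bit virtual address space

# split name -> (user GiB, kernel GiB, kernel shares the process address space?)
SPLITS = {
    "2G/2G": (2, 2, True),
    "3G/1G": (3, 1, True),
    "4G/4G": (4, 4, False),  # user and kernel each get their own 4GiB space
}

def needs_switch_on_syscall(split):
    """True if entering the kernel requires changing address spaces."""
    user, kernel, shared = SPLITS[split]
    # A shared layout has to fit both halves into the single 4GiB space.
    assert not shared or (user + kernel) * GIB == TOTAL
    return not shared

for name, (user, kernel, _) in SPLITS.items():
    print(name, "user GiB:", user, "switch per syscall:", needs_switch_on_syscall(name))
```

This only models the layout, not the cost; how expensive each switch actually is depends on the CPU and the kernel.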
And devoting the full 4GB address space on a 32-bit OS to an application or a kernel is not an Apple-only kludge as you imply. Red Hat Linux has a kernel called Hugemem which allows individual applications and the kernel to each exclusively consume the full 4GB space on 32-bit processors, just like in OS X. (http://blogs.oracle.com/gverma/2008/03/redhat_linux_kernels_and_proce_1.html)
You are confusing the kludges here - PAE is a performance problem, but it's an order of magnitude less horrible than the kludges I was referring to, which were: a) having to do redundant data copies between user space and the kernel at each switch, due to the 4G/4G split, and b) needing stubs that again copy data around between user space and syscall handlers, because the kernel is 32-bit while user space can be 64-bit.
That is correct - but PAE is again not the problem. The problem is having a 4G/4G split by default and running a 32-bit kernel while supporting 64-bit user space and 32-bit drivers. Windows, Linux and Solaris all avoid that for a reason. It's not that they were not 'innovative' - it's that they understood that this solution would not scale in the markets in which they operate (server space), that they would have to deal with a cleaner solution sooner or later, and they rightly chose to face it sooner, whereas Apple in hindsight did the wrong thing - they now have to stop and do the right thing in Snow Leopard. Vista/Linux/Solaris have long advanced past the problem - driver availability and scalability are no problems there.
PAE is actually supported in consumer versions of Windows too, but it's disabled because drivers need to be written to take it into account. Server drivers have long since standardized on PAE support, since before 64-bit processors were available it was the only way to get more than 4GB of memory. There was no such demand before in consumer Windows, so consumer drivers don't support PAE and it's too late to get every driver rewritten.
When Apple transitioned to Intel, they learned from this issue and implemented PAE by default from the start. This is why there are no driver issues and Apple supports the Mac Pro running 32GB of RAM even though the Tiger and Leopard kernels are both 32-bit.
Red Hat did provide a hugemem kernel with PAE enabled in the days prior to general availability of 64-bit x86 CPUs, and they also provided a 4G/4G user/kernel split *as an option*, but as a Linux contributor I know that it is not at all encouraged, especially these days when almost all CPUs are already 64-bit capable. They also do not run a 32-bit kernel to support 64-bit apps, nor do they do a 4G/4G split by default. No one does all those horrible things together except Apple.
There is a reason no one uses OSX on Server (including Macrumors.com