sjl said:
I'm not so sure. SPARC is a case in point: UltraSPARC is a 64 bit processor, but you still see a great deal of software compiled in 32 bit mode for Solaris. This isn't so much for compatibility -- a lot of this stuff is certified for later versions of Solaris that generally run on UltraSPARC-based systems -- as it is for performance, AIUI. If you don't need the 64 bit stuff -- ie, your memory needs are modest, and your integer arithmetic fits fine in 32 bits -- the odds are good that any appropriate Solaris stuff will be 32 bit code, not 64. This is basically where I was coming from. If the CPU works fine with both 32 and 64 bit code, I don't see a general need to recompile 32 bit code to 64 bit code just for the sake of it. x86-64 is a case where it wouldn't be for the sake of it -- you'd be recompiling to take advantage of the extra registers.
OK, I see your point. There are a couple of differences with the SPARC case, though.
- The 32-bit Solaris code is compatible with older 32-bit SPARC systems - so you could get by with one "thin binary" for apps that didn't need 64 bits. The "32-bit x64 with extra registers" code would not run on 32-bit x86 machines - so you'd have to build two.
- The first 64-bit SPARC machines were SLOW - memory was pre-SDRAM with 60 ns access times (that's roughly 16.7 MHz). Reducing memory traffic was important then - today's PCs have far more bandwidth available.
- Sun ships both 32-bit and 64-bit Solaris in the same kits - so many people are still running 32-bit Solaris on 64-bit capable hardware. (Just like most people run a 32-bit OS on x64 hardware, and OS X is released only as a 32-bit OS.)
On the other hand, your "32-bit x64 with extra registers" would require a recompilation, and would not be compatible with either 32-bit hardware or 64-bit operating systems (unless the OS makers included a third set of libraries and APIs).
In addition, the extra memory traffic due to 64-bit pointers is really noise for most applications. You'll need a very carefully run benchmark to separate the 64-bit burden from the noise.
Of course, there are some applications where it is noticeable - for example if your data is a huge doubly-linked list or doubly-linked tree with very small data payload per structure.
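To put a rough number on that case, here is a minimal sketch of a doubly-linked node with a 4-byte payload. The struct and function names are made up for illustration, and the native size depends on the target ABI - typically 24 bytes on a 64-bit build (two 8-byte pointers plus padding) versus 12 bytes when the links are held as 32-bit values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Doubly-linked node using native pointers and a small payload.
   On a typical 64-bit ABI this is 8 + 8 + 4 bytes, padded to 24. */
struct node_native {
    struct node_native *prev;
    struct node_native *next;
    int32_t payload;
};

/* Same node with the links stored as 32-bit values (hypothetical layout). */
struct node_compact {
    uint32_t prev;
    uint32_t next;
    int32_t payload;
};

/* Three 4-byte members with no padding: 12 bytes. */
static_assert(sizeof(struct node_compact) == 12, "compact node should be 12 bytes");

size_t native_node_size(void)  { return sizeof(struct node_native); }
size_t compact_node_size(void) { return sizeof(struct node_compact); }
```

With a payload this small the links dominate the node, which is why this kind of structure is the one place the pointer size shows up clearly. Make the payload a few hundred bytes and the difference disappears into the noise.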
For those applications, most systems have compiler/linker options to allow 32-bit pointer usage on a 64-bit system. (If you can guarantee that the data structure resides in the low 4 GiB of virtual address space, you can safely truncate the pointers when you store in memory, and extend when you fetch.) In other words, you can do what you want today (at least with Windows - haven't looked at gcc).
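The truncate-on-store, extend-on-fetch idea can be sketched in plain C as below. This is only an illustration under the stated assumption that every object involved lives in the low 4 GiB of the address space. The helper names `ptr_compress` and `ptr_expand` are hypothetical, and how you obtain that guarantee is platform specific and not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Truncate a pointer to 32 bits for storage in memory.
   ASSUMPTION: the object lives below 4 GiB; the assert catches violations
   in debug builds only. */
static inline uint32_t ptr_compress(const void *p) {
    uintptr_t bits = (uintptr_t)p;
    assert(bits <= UINT32_MAX);
    return (uint32_t)bits;
}

/* Zero-extend a stored 32-bit value back to a full-width pointer. */
static inline void *ptr_expand(uint32_t c) {
    return (void *)(uintptr_t)c;
}
```

A node would then store `ptr_compress(next)` in a `uint32_t` field and call `ptr_expand()` before dereferencing. The cost is an extra conversion on each link traversal, which is cheap next to the cache misses you save on a big list.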
So, your proposal has some merit - but IMO the disadvantages outweigh the advantages.
ps: There's a good PDF from AMD describing x64 at
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/x86-64_wp.pdf. It goes into detail about the 16/32/64 bit modes and register usage. Your new mode is not provided in the hardware - the extra registers require the full 64-bit mode.