The extra registers help all user-mode code too, as long as the code is compiled to be aware of AMD64 (even in 32-bit mode). Code that is not compiled for AMD64 still benefits somewhat because of register renaming: internally the processor has more physical registers than the architecture exposes, and it assigns each one an architectural meaning as needed. (So, for example, several physical registers may correspond to the DX register at the same time, each holding the value seen by a different in-flight instruction.) Since AMD64 has more architectural registers, AMD64 processors tend to have more physical registers as well. In pure 32-bit mode, some of these are typically unavailable. But the 20% improvement definitely comes from recompiling with AMD64 turned on, in 32-bit. (My experiments were all done with gcc.)