Childish insults are a sign of a weak argument. And yes, recompiling will not optimize for 64-bit. You know full well that to optimize, most of the apps will need new code.
I am sorry if I insulted you. It wasn't my intention. It's just rather clear to me that you have absolutely no clue about how compilers work or what the difference between a 32-bit and a 64-bit CPU is. And I find it very... well, weird that - despite your obvious lack of education on the topic - you try to argue in this thread, misleading people who may be trying to learn something from reading the posts here. Anyway, let me give you a brief explanation.
There are two differences which are vital here: the 64-bit CPU has more registers (very fast processor-internal storage), and it can do computations on 64-bit integers in one go (this is of no significance for most applications because they simply don't need it). In addition, the 64-bit CPU has some additional instructions which might make some things faster, but we will disregard them here.
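To make the second point concrete, here is a minimal C sketch (the function is my own toy example, not from any real code base). On a 32-bit CPU the addition below needs two instructions (add the low halves, then add the high halves with the carry), while a 64-bit CPU does it in one:
Code:

#include <stdint.h>

/* 64-bit addition: a single instruction on a 64-bit CPU,
 * an add/add-with-carry pair on a 32-bit CPU. */
uint64_t add64(uint64_t p, uint64_t q)
{
    return p + q;
}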
Let us, again, for the sake of the argument, assume that the 32-bit CPU has two registers (R0 and R1) while the 64-bit CPU has four registers (R0, R1, R2, R3). The actual number of registers is of course much higher, but the idea is the same. Now let us assume you have the following code:
x = b*sin a + c*cos a
y = e*sin a + d*cos a
Now the compiler must translate it into machine code. Let us also assume that trigonometric operations are expensive (which might or might not be true on current architectures). Compilers are smart enough to detect this, and the first optimisation step will be to reduce the number of such operations. That is, the compiler will cache the results, creating intermediate code like this:
f = sin a
g = cos a
x = b*f + c*g
y = e*f + d*g
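(If you prefer real code: in C, this caching step would look roughly like the sketch below. The function wrappers, parameter types and names are my additions; the compiler performs this transformation internally, you don't have to write it yourself.)
Code:

#include <math.h>

/* Before: sin(a) and cos(a) each appear twice in the source. */
void before(double a, double b, double c, double d, double e,
            double *x, double *y)
{
    *x = b * sin(a) + c * cos(a);
    *y = e * sin(a) + d * cos(a);
}

/* After the caching step: each one is computed only once. */
void after(double a, double b, double c, double d, double e,
           double *x, double *y)
{
    double f = sin(a);
    double g = cos(a);
    *x = b * f + c * g;
    *y = e * f + d * g;
}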
Up to this point the optimisation is the same for the 32-bit and the 64-bit compiler. Now let's look at which instructions the 32-bit compiler will produce. In the following I assume that each operation I write can be performed by one instruction and that each instruction must operate on at least one register.
We start with:
Code:
R1 = a
R0 = SIN R1
R1 = COS R1
Now we have a problem - we have stored the sin and cos in the two available registers, but we are out of registers for the remaining work! For example, we could do something like this:
Code:
R0 = R0*b
R1 = R1*c
R0 = R0 + R1
x = R0
but then we lose the cached results of the sin and cos (which were stored in R0 and R1) - and we must compute them anew for y. Another possible strategy would be to create two dummy variables (stored in RAM), sina and cosa, which hold sin a and cos a respectively, instead of keeping them in the registers. Either way, we need additional trigonometric instructions or additional memory accesses.
One possible machine code would then be:
Code:
R1 = a
R0 = SIN R1
R1 = COS R1
sina = R0
cosa = R1
# reuse the fact that R0 is sin and R1 is cos
R0 = R0*b
R1 = R1*c
R0 = R0 + R1
x = R0
# load sin and cos back to R0, R1
R0 = sina
R1 = cosa
R0 = R0*e
R1 = R1*d
R0 = R0 + R1
y = R0
In sum, we have 7 RAM load operations, 4 RAM store operations, 2 trigonometric operations and 6 arithmetic operations.
Now, the 64-bit instruction set has two additional registers. So we can easily bypass the problem above:
Code:
R1 = a
R0 = SIN R1
R1 = COS R1
R2 = R0*b
R3 = R1*c
R2 = R2 + R3
x = R2
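# reuse the fact that sin and cos are still cached in R0 and R1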
R2 = R0*e
R3 = R1*d
R2 = R2 + R3
y = R2
Here we have 5 RAM load operations, 2 RAM store operations, 2 trigonometric operations and 6 arithmetic operations.
As you can see, simply compiling for the 64-bit instruction set allowed us to eliminate 4 of the 11 expensive RAM access operations - about a third of them. So the application will potentially run faster when compiled for 64-bit.
Got it now?
In truth, the compiler will be even smarter, by putting the variables into SIMD registers and executing parallel instructions on them, but let's not go there now...
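By the way, you don't have to take my word for it. Assuming you have gcc with 32-bit support installed, you can save the sketch below, compile it once with -m32 and once with -m64, and compare how often the generated assembly touches the stack:
Code:

/* trig.c - compile with
 *   gcc -O2 -S -m32 trig.c -o trig32.s
 *   gcc -O2 -S -m64 trig.c -o trig64.s
 * then compare the memory operands (e.g. ...(%esp) vs ...(%rsp))
 * in the two outputs. */
#include <math.h>

void compute(double a, double b, double c, double d, double e,
             double *x, double *y)
{
    double f = sin(a);
    double g = cos(a);
    *x = b * f + c * g;
    *y = e * f + d * g;
}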