Re: Re: "64-bit optimized"
Originally posted by aasmund
We must live in different universes. 64-bit not only affects mathematical operations, it also increases the number of instructions, the register size and other things. All of these will obviously mean a new round of optimization (just look at what Intel is doing with their P4-optimized compilers). Hell, it's a whole new architecture; both recompilation and optimization will of course yield significant performance enhancements.
Just recompiling the OpenBSD kernel on my Athlon gives me a 1.5-3% performance increase (that is w/o any optimization).
So take your crap and stuff it back into your chest,
Regards.
You are quite incorrect.
PPC instructions are always 32 bits wide. That is not changing with PPC-64 (the PPC instruction set, as is often pointed out, was originally designed to use 64-bit memory addresses and then whittled down to 32-bit memory addresses and registers for the consumer PCs of the time).
As for having wider registers: that's true, but these are not vector registers, so having wider registers doesn't mean you can put twice as much data in there and use it twice as fast. To add 1+1 you will use the exact same processor resources on a 64-bit PPC as you would on a 32-bit PPC. If you have two 32-bit ints to manipulate, each will go into its own 64-bit register and be operated on; they can't both be put into one 64-bit register and manipulated in half the cycles. Now, if the 970 also doubled the width of the AltiVec registers, you would have a point (registers twice as wide there would mean twice as many pieces of data manipulated at once), but that is just not the case.
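To make the "can't pack two ints into one register" point concrete, here is a minimal C sketch. The helper names (pack2, packed_add_naive, lane_add) are made up for illustration and are not from any library. It shows that one plain 64-bit add over two packed 32-bit values lets a carry from the low half leak into the high half, so getting two independent 32-bit results takes extra work rather than less.

```c
#include <stdint.h>

/* Hypothetical helper: pack two 32-bit values into one 64-bit word. */
uint64_t pack2(uint32_t hi, uint32_t lo) {
    return ((uint64_t)hi << 32) | lo;
}

/* One ordinary 64-bit add over two packed 32-bit lanes.
 * A carry out of the low lane spills into the high lane. */
uint64_t packed_add_naive(uint64_t a, uint64_t b) {
    return a + b;
}

/* What two independent 32-bit adds would give: each lane wraps
 * on its own, which costs unpacking, two adds and repacking. */
uint64_t lane_add(uint64_t a, uint64_t b) {
    uint32_t lo = (uint32_t)a + (uint32_t)b;
    uint32_t hi = (uint32_t)(a >> 32) + (uint32_t)(b >> 32);
    return pack2(hi, lo);
}
```

With a = pack2(1, 0xFFFFFFFF) and b = pack2(1, 1), the naive add yields a high half of 3 instead of 2, because the low half overflowed into it. A real vector unit keeps the lanes separate in hardware, which is exactly what a general-purpose 64-bit register does not do.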
In short, 64-bit PPC instructions at the core buy you two things: 1) a much larger addressable memory space (which allows for machines with more than the 4GB that 32-bit addresses can reach, although this doesn't mean Apple will provide such anytime soon ... it also allows for addressing of flat files larger than 2GB in a more straightforward manner than is currently employed), and 2) direct manipulation of much larger integer values (previously you'd have to manipulate large ints as two 32-bit ints with special cleanup code, or just treat them as floating point numbers).
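For point 2, here is a rough C sketch of what "two 32-bit ints with special cleanup code" looks like for a simple addition. The u64_pair type and the function names are invented for this example; a real compiler would typically emit an add-with-carry instruction pair rather than the explicit comparison shown here.

```c
#include <stdint.h>

/* Hypothetical type: a 64-bit unsigned value held as two 32-bit halves. */
typedef struct {
    uint32_t hi;
    uint32_t lo;
} u64_pair;

u64_pair make_pair(uint32_t hi, uint32_t lo) {
    u64_pair p = { hi, lo };
    return p;
}

/* 64-bit add done with 32-bit pieces: add the low halves, detect the
 * wraparound, then feed the carry into the high halves. */
u64_pair add64_on_32(u64_pair a, u64_pair b) {
    u64_pair r;
    r.lo = a.lo + b.lo;
    uint32_t carry = (r.lo < a.lo) ? 1u : 0u; /* low half wrapped around */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* The same operation when the register is natively 64 bits wide. */
uint64_t add64_native(uint64_t a, uint64_t b) {
    return a + b;
}

/* Reassemble the two halves so the two versions can be compared. */
uint64_t pair_to_u64(u64_pair p) {
    return ((uint64_t)p.hi << 32) | p.lo;
}
```

Adding 1 to 0x00000001FFFFFFFF gives 0x0000000200000000 either way; the difference is only in how many steps it takes to get there. Multiplication and division on split values need considerably more cleanup than this.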
Now, the potential applications of those two advances are fairly broad and, yes, will only really be known once consumer software developers realize that they can now play with 64 bits and still have someone buy their software. Today, these features are incredibly useful for databases and other specific server-ish applications, so you'll hear a lot of people saying that this will bring high-end databases to the consumer (which is ... well, not really a valid conclusion; it removes one roadblock, but the main roadblocks of cost and demand remain).
Finally, regarding per-processor optimizations: those are more due to the specific characteristics of the processor (pipeline characteristics, memory access, etc.) than to specific instructions. The PPC instruction set will not change much at all between the G4 and the 970; only the 64-bit-specific instructions will be added. Between your P3 and P4, or "generic Pentium-class" and Athlon, there are both more efficient instructions added (this is a characteristic of a CISC system) and radical processor data-flow changes that allow gcc or whatever compiler you use to make the code run much more efficiently when it knows the specific processor.
Now, yes, as apps are recompiled optimized for the 970 we will see better performance because the apps will begin to actually take advantage of the characteristics of the 970. I view this the other way around: we Mac developers have had to adapt to an unnaturally-constrained system bus for the G3 and G4 lines, and will be able to return to more "universal" design patterns with the 970.
For instance, on a G4 it is often more efficient to calculate a bunch of loop constants inside the loop instead of calculating them once and pulling them in from memory as needed, because calculating them may take a dozen cycles but pulling them in would take 50-100. You won't find that in any design patterns book (in fact, you'll find just the opposite), but as a Mac programmer you just have to do it or you'll have crappy performance.
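A rough C sketch of the two styles, under my own assumptions: the function names and the "base + step * i" coefficient are invented for illustration, and nothing here is measured. Both versions compute the same result; which one is faster depends on how expensive a trip to memory is compared with the arithmetic, which is the whole point above.

```c
#include <stddef.h>
#include <stdint.h>

/* Textbook style, step 1: compute the coefficients once, up front,
 * and store them in a table in memory. */
void fill_table(uint32_t *coeff, uint32_t base, uint32_t step, size_t n) {
    for (size_t i = 0; i < n; i++) {
        coeff[i] = base + step * (uint32_t)i;
    }
}

/* Textbook style, step 2: the loop loads each coefficient from memory. */
void scale_with_table(uint32_t *out, const uint32_t *in,
                      const uint32_t *coeff, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] * coeff[i]; /* extra memory read per iteration */
    }
}

/* Bus-starved style: recompute the coefficient in registers on every
 * iteration and never touch a table. */
void scale_recompute(uint32_t *out, const uint32_t *in,
                     uint32_t base, uint32_t step, size_t n) {
    for (size_t i = 0; i < n; i++) {
        uint32_t c = base + step * (uint32_t)i; /* a few ALU cycles */
        out[i] = in[i] * c;
    }
}
```

On a machine with a fast path to memory the table version is the natural choice; on one where a miss costs far more than a multiply and an add, the recompute version can come out ahead.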
So, yes, we will see a dramatic increase in performance right up front with G4-designed apps running on the 970, and we will see more improvement when those apps start "optimizing" for the 970 environment. But, that has nothing to do with 64-bitedness; it has everything to do with the other characteristics of the PPC970.
The specific effects of 64-bitedness will likely be small right up front, and escalate much more slowly. Taking advantage of 64 bits will take much more design, much more thought and many more innovative eureka moments, not just a recompile on an optimized compiler.