Originally posted by gopher
If G4s are only single instruction precision, I ask you, why are the RC5 numbers so much higher on the G4?
RC5 results are so much higher on the G4 because the RC5 client is heavily optimized for the vectorized calculations that are AltiVec's bread and butter. RC5 works on AltiVec because its inner loop is 32-bit integer adds, XORs, and rotates, which fit AltiVec's 32-bit vector elements. It is not an accurate measure of real-world performance any more than SETI@home or a random Photoshop filter is. (Those are among the only things the G4 performs well at these days.) It is, however, a reasonable measure of Photoshop, FCP, and SETI@home performance, if you want to look at it that way.
Floating-point precision, single or double, refers to the word length of a floating-point operand. A single-precision fp word (on most processors, including x86 and PPC) is 4 bytes (32 bits). Double-precision fp words are 8 bytes (64 bits). (Bits = binary digits.)
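To make the 32-bit versus 64-bit difference concrete, here is a minimal Python sketch. It assumes nothing about the G4 itself: it only uses the standard `struct` module to round a value to a 32-bit single and compare it with the 64-bit double Python uses by default. The helper name `to_single` is made up for this example.

```python
import struct

# Sizes of the standard IEEE 754 formats, in bytes.
SINGLE_BYTES = struct.calcsize("<f")  # single precision
DOUBLE_BYTES = struct.calcsize("<d")  # double precision


def to_single(x: float) -> float:
    """Round a Python float (a 64-bit double) to the nearest 32-bit single.

    Hypothetical helper for illustration: packing with "<f" stores the value
    in 4 bytes, and unpacking reads the rounded value back.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]


if __name__ == "__main__":
    value = 0.1
    print("single:", SINGLE_BYTES * 8, "bits; double:", DOUBLE_BYTES * 8, "bits")
    print("as double:", repr(value))
    print("as single:", repr(to_single(value)))
    print("rounding error:", abs(to_single(value) - value))
```

The single-precision copy of 0.1 already differs from the double after about 7 or 8 significant decimal digits, which is the whole practical difference between the two formats.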
Again I find it hard to believe they are only single instruction precision when this quote from Apple's developer page says:
"Motorola's AltiVec Technology, embodied in the G4 processor, expands the current PowerPC architecture through addition of a 128-bit vector execution unit that operates concurrently with existing integer and floating-point units. This provides for highly parallel operations, allowing for simultaneous execution of up to 16 operations in a single clock cycle. This new approach expands the processor's capabilities to concurrently address high-bandwidth data processing (such as streaming video) and the algorithmic intensive computations which today are handled off-chip by other devices, such as graphics, audio, and modem functions. The AltiVec instruction set allows operation on multiple bits within the 128-bit wide registers. This combination of new instructions, operation in parallel on multiple bits, and wider registers, provide speed enhancements of up to 30x on operations that are common in media processing."
That's 16 operations in a single clock cycle. Where is that single precision? And with dual processors it makes it that much more formidable.
Yes - up to 16 operations in a single clock cycle, none of them wider than single precision. The paragraph you quoted mentions nothing about floating-point precision. As I said, the maximum word length AltiVec can handle is 32 bits.
128 / 32 = 4 (the maximum number of single-precision [32-bit] fp ops / cycle)
128 / 16 = 8 (the maximum number of 16-bit ops / cycle; these are integer only, since AltiVec has no 16-bit fp type)
128 / 8 = 16 (the maximum number of 8-bit ops / cycle, again integer only, and the source of the marketing stat that you just fell for)
I would suggest YOU read it carefully. Stop swallowing the marketing hype and think for yourself.
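If you would rather check the lane arithmetic than take my word for it, here is a small Python sketch. The function name `lanes` and the constant `VECTOR_BITS` are made up for illustration; the only fact taken from the quote is the 128-bit register width.

```python
# Register width taken from the quoted Apple text; everything else here
# is a hypothetical illustration, not anything from Motorola's documentation.
VECTOR_BITS = 128


def lanes(element_bits: int, vector_bits: int = VECTOR_BITS) -> int:
    """Return how many elements of the given width fit in one vector register."""
    if element_bits <= 0 or vector_bits % element_bits != 0:
        raise ValueError("element width must evenly divide the register width")
    return vector_bits // element_bits


if __name__ == "__main__":
    for bits in (32, 16, 8):
        print(f"{VECTOR_BITS} / {bits} = {lanes(bits)} elements per instruction")
```

The "16 operations" figure only appears when the elements are 8 bits wide. At 32 bits per element, the same register holds four.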
Vector calculations make sense. What good is high speed if the operations are so complex it takes 5 times longer to learn them?
Vector calculations do make sense, for some tasks. I suggest you get to work convincing developers why they should devote precious time and energy to rewriting the necessary portions of their code to support a platform that comprises a minuscule portion of all markets outside the graphic design and audio production realms. Outside these niches, it would help Apple so much more if the PPC were simply able to run standard, platform-independent code even close to as fast as x86 (or any other CPU family) can.
How is parallelism single precision? You aren't making sense. Because that's what you are saying the G4 is doing.
No I'm not. You are confusing parallelism with precision. Parallelism refers to processing chunks of data concurrently, which AltiVec does do. AltiVec is, however, incapable of processing these chunks of data concurrently if they are double-precision (>32 bits long). Double-precision fp work (and non-vectorized single-precision fp work) is passed off to the G4's FPU, which sucks massively.
You can get as much precision as you like when the processor is running things in parallel.
Nope. It doesn't matter whether the vector processor is 128-bit or 1024-bit: if it only supports 32-bit words, it will only give you 32-bit precision. A 128-bit VPU just gives you four chunks of 32-bit data per cycle. As you know, adding 1,000 (4 digits) 4 times will not give you a number that is 16 digits long. I HOPE you knew that, anyway.
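Here is a rough Python sketch of that point. It is a software simulation, not AltiVec code: the helper names `to_single` and `vector_add_single` are invented for the example, and the four-lane width is just 128 / 32. Each lane is rounded to 32-bit single precision on its own, so running four of them side by side changes the throughput, not the precision.

```python
import struct


def to_single(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit single (hypothetical helper)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def vector_add_single(a, b):
    """Simulate a 4-lane vector add where every lane is a 32-bit single.

    This is an illustrative model only: each lane is computed independently
    and rounded to single precision, the way a 4 x 32-bit vector unit would.
    """
    if len(a) != 4 or len(b) != 4:
        raise ValueError("expected two 4-element vectors (128 bits / 32 bits)")
    return [to_single(to_single(x) + to_single(y)) for x, y in zip(a, b)]


if __name__ == "__main__":
    big = 16777216.0  # 2**24, the point where singles stop resolving +1
    print("four single-precision lanes:", vector_add_single([big] * 4, [1.0] * 4))
    print("one double-precision add:   ", big + 1.0)
```

Every lane loses the +1 because a single cannot represent 16777217, while the one double-precision add keeps it. Four lanes means four times the results per instruction, each exactly as imprecise as before.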
Alex