Re: More 64 vs 32...
Originally posted by Fender2112
The G4 can process 5 instructions per cycle. That's 4 in the pipeline and 1 result. The 970 handles between 8 and 12 depending on what's in the pipeline with one of them being a result. I took this to imply that the 970 can handle about twice as many instructions per cycle. With the 970 starting at 1.8 GHz and the G4 at 1.42 GHz, we get something like this:
970: 8x1.8= 14.4 Gig instructions per second
G4: 5x1.42= 7.1 Gig instructions per second.
This implies to me that the 970 will be about twice as fast as a G4 and this speed has very little to do with 64 or 32 bit numbers.
Please toss me a life jacket if I missed the boat on this.
There's some mixup in what you wrote.
The G4e (7450+ series) can dispatch up to 3 instructions plus a branch per cycle and retire 3; the 970 can dispatch up to 4 instructions plus a branch.
If you look closer, the instructions are not the same: the G4e takes raw PowerPC instructions, whereas the 970 breaks some PowerPC instructions into what IBM calls IOPs (for instance, the lwzu instruction, Load Word and Zero with Update, requires two IOPs).
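To make the IOP point concrete, here is a rough sketch in Python that counts how many IOPs a short instruction sequence would occupy. The only cracking fact taken from above is that lwzu needs two IOPs; the table, the function name and the sample sequence are made up for illustration, and every other instruction is simply assumed to map to one IOP.

```python
# Hypothetical cracking table: only "lwzu -> 2 IOPs" comes from the post above.
# Every instruction not listed here is ASSUMED to map to a single IOP.
IOPS_PER_INSTRUCTION = {"lwzu": 2}


def count_iops(instructions):
    """Return the number of IOPs a list of mnemonic strings would occupy."""
    return sum(IOPS_PER_INSTRUCTION.get(name, 1) for name in instructions)


# Made-up sample sequence, not taken from any real program.
sample = ["lwzu", "add", "lwzu", "cmpw"]
print(len(sample), "PowerPC instructions ->", count_iops(sample), "IOPs")
```

So a dispatch slot on the 970 is not always worth a full PowerPC instruction, which is why comparing raw per-cycle counts between the two chips is a bit misleading.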
But you can still consider that the 970 will dispatch more instructions. Another reason is that the 970 has more execution units, which will fill up more slowly (2 Floating Point Units vs 1, 2 Load/Store Units vs 1). The exceptions are integer computations (2 Complex Units vs 1 Complex Unit + 3 Regular Units) and, to some extent, AltiVec, where 3 of the 4 units sit behind the same issue queue, as in the older 7400/7410.
The pipeline length has increased (16/25 stages, min/max, vs 7/12), which brings some benefits: the 970 has more opportunities to take advantage of out-of-order execution, and the core frequency could soar (2.5 GHz by Q1 2004, 3 GHz by Q3 2004?). On the other hand, special care must be taken to avoid pipeline bubbles and mispredicted branches (the branch prediction logic is far more complex in the 970), and data dependencies could also stall the pipelines more often.
Instruction throughput should rise, but a twofold increase is probably optimistic: instructions process data, and that data must be available, whether it results from a previous computation or comes from the memory subsystem. Instructions do not just fly straight through the pipelines.
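As a back-of-the-envelope check, the peak numbers can be redone with the dispatch widths given above (3 for the G4e, 4 for the 970, branches left out) and the clock speeds from the quoted post. The sketch below is only an illustration: the function names are mine, and the utilization figure is an invented placeholder, not a measurement.

```python
def peak_gips(dispatch_width, clock_ghz):
    """Upper bound, in billions of dispatched instructions per second."""
    return dispatch_width * clock_ghz


def sustained_gips(dispatch_width, clock_ghz, utilization):
    """Peak scaled by the fraction of dispatch slots actually filled (0..1)."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return peak_gips(dispatch_width, clock_ghz) * utilization


g4e_peak = peak_gips(3, 1.42)     # 3 non-branch instructions per cycle
ppc970_peak = peak_gips(4, 1.8)   # 4 non-branch dispatch slots per cycle
print(f"peak ratio 970/G4e: {ppc970_peak / g4e_peak:.2f}")

# ASSUMED utilization (50%), purely to show how stalls eat into the peak.
print(f"G4e sustained at 50% slot usage: {sustained_gips(3, 1.42, 0.5):.2f}")
```

With these inputs the peak ratio comes out at roughly 1.7 rather than 2, and that is before counting stalls or the fact that some 970 slots are taken by IOPs rather than whole PowerPC instructions.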
The caches have evolved. The L1 instruction cache is now twice as big, 64 KiB (32 KiB previously), but it is direct mapped; this could be a drawback if your app uses a lot of small pieces of code and jumps from one place to another.
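The direct-mapped drawback can be shown with a toy simulation. The sketch below models a generic LRU set-associative cache (ways=1 means direct mapped) and feeds it two hypothetical code addresses that sit exactly one cache size apart, called alternately. The 64 KiB size is from above; the 128-byte line size, the addresses and the function name are assumptions for the sake of the example, and this is not a model of the real 970 fetch logic.

```python
def count_misses(addresses, cache_bytes, line_bytes, ways):
    """Simulate an LRU set-associative cache and return the miss count.

    ways=1 models a direct-mapped cache.
    """
    num_sets = cache_bytes // (line_bytes * ways)
    sets = [[] for _ in range(num_sets)]  # each set holds tags, most recent last
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        index = line % num_sets
        tag = line // num_sets
        entries = sets[index]
        if tag in entries:
            entries.remove(tag)       # hit: refresh its LRU position
        else:
            misses += 1
            if len(entries) == ways:
                entries.pop(0)        # evict the least recently used tag
        entries.append(tag)
    return misses


CACHE = 64 * 1024   # 64 KiB, as stated in the post
LINE = 128          # ASSUMED line size
# Two hypothetical routines exactly one cache size apart, called alternately.
trace = [0x10000, 0x10000 + CACHE] * 100

print("direct mapped misses:", count_misses(trace, CACHE, LINE, 1))
print("2-way misses:", count_misses(trace, CACHE, LINE, 2))
```

In the direct-mapped case the two routines keep evicting each other, so every access misses; with just two ways they coexist after the first two misses. Real code is rarely this unlucky, but it shows why jumpy code can hurt.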
The L1 data caches have the same size, 32 KiB, but not the same layout: 2-way set associative vs 8-way on the G3/G4 (8-way is supposed to be more efficient). The size of the cache lines has probably changed too: they could be 128 bytes wide, which is huge compared to 32 bytes in the other PowerPCs.
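For what it's worth, the layout difference is easy to put into numbers. This small sketch derives the line and set counts from size, line width and associativity; the helper name is made up, and the 128-byte line for the 970 is the guess from above, not a confirmed figure.

```python
def cache_geometry(size_bytes, line_bytes, ways):
    """Return (total_lines, sets) for a set-associative cache."""
    lines = size_bytes // line_bytes
    return lines, lines // ways


g4_lines, g4_sets = cache_geometry(32 * 1024, 32, 8)           # G3/G4 L1 data cache
ppc970_lines, ppc970_sets = cache_geometry(32 * 1024, 128, 2)  # 970, ASSUMING 128-byte lines

print("G4 :", g4_lines, "lines in", g4_sets, "sets")
print("970:", ppc970_lines, "lines in", ppc970_sets, "sets")
```

If the 128-byte guess holds, the 970 ends up with a quarter as many lines as the G4, each four times wider, which favors streaming through memory over touching lots of scattered small items.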
The L2 cache is twice as big, 512 KiB (the 7457 also has this capacity).
The 970 is supposed to handle more data streams, outstanding reads and cache misses, which will increase its perceived bandwidth even more.
The PowerPC 970 has a proven core since it is in use in the POWER4+.
The 2 Load/Store Units coupled with the front-side buses will benefit many apps; the 2 FPUs and the 64-bit registers pave the way for new workstation-class scientific software.
AltiVec will probably be on par with the 7400, except that memory-bound apps will fly on the 970.
The integer part of the core could be slightly slower (at the same clock speed), but there is a gain from the two LSUs, the superior branch prediction, the OOOE capabilities... it's hard to tell.
The 970 will be faster, especially on double-precision floating-point math (research, labs...), memory-related tasks (most of the multimedia stuff), and of course 64-bit computations (when you have to deal with those).