Re: Re: Re: Re: Altivec isn't optimization
Originally posted by gopher
I see your point, but it is a faulty one, in that the core of the Pentium IV has the problem of more stages, meaning code branches that have errors are more likely to stall on the processor than on the G4, which has nearly three times fewer stages. A G4 will be finished with many tasks and bottlenecking much less on errors than a Pentium IV. Not to mention the G4 is a true RISC instruction set, meaning processes will be optimized even further even at the core, without the help of specialized instructions. MHz doesn't give you the entire picture of even the core processor functions. Once you start considering stages, the G4 does more, faster, with less overhead to obstruct it.
Apparently it is not a faulty one: benchmark after benchmark used to measure core performance seems to show that the Pentium 4 is indeed faster than the G4 at the core level. This is easily reflected in the difference in performance you see in unoptimized CPU-intensive programs (games, scientific apps, engineering apps, HPC).
More stages means a greater penalty for a mispredict, but that's assuming the processor will mispredict. The Pentium 4 has 4096 entries in its Branch History Table, twice the number of entries the G4 has. Combine that with the trace cache concept used by the Pentium 4 and you'll find that the Pentium 4 is far less likely to mispredict a branch than the G4 in the first place. It's ironic really: the current G4+ itself has far more stages than the original G4, and yet by all accounts it definitely outperforms the original. Makes you wonder if pipeline length is everything Apple makes it out to be.
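To put rough numbers on that trade-off, here is a back-of-the-envelope sketch in Python. It uses the usual simplification that the cycles lost per instruction equal (fraction of instructions that are branches) x (mispredict rate) x (penalty in cycles). Every number in it is made up for illustration: the 20% branch mix, the 3% and 10% mispredict rates, and the 20- and 7-cycle penalties are assumptions, not measurements of a Pentium 4 or a G4, and the function name is my own.

```python
def mispredict_cost(branch_fraction, mispredict_rate, penalty_cycles):
    """Average cycles lost to branch mispredicts per instruction executed."""
    return branch_fraction * mispredict_rate * penalty_cycles


# Hypothetical deep pipeline: big penalty, but a better predictor.
deep_pipeline = mispredict_cost(branch_fraction=0.20,
                                mispredict_rate=0.03,
                                penalty_cycles=20)

# Hypothetical shallow pipeline: small penalty, but a weaker predictor.
shallow_pipeline = mispredict_cost(branch_fraction=0.20,
                                   mispredict_rate=0.10,
                                   penalty_cycles=7)

print(f"deep:    {deep_pipeline:.3f} cycles lost per instruction")
print(f"shallow: {shallow_pipeline:.3f} cycles lost per instruction")
```

With these assumed inputs the deep pipeline actually loses fewer cycles per instruction, which is the whole point: the penalty only matters multiplied by how often you pay it.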
Really, from a logical standpoint there are only two aspects to processor performance:
1) Instructions Per Cycle or IPC, commonly referred to as "efficiency". It's basically the number of instructions a CPU can execute and retire in one clock cycle. It's determined by a variety of factors (pipeline, branch prediction, FSB, latency, cache size, etc.).
2) Cycles Per Second or Hz, the number of cycles a processor can go through in one second, nowadays measured in millions of cycles per second (MHz) or billions of cycles per second (GHz).
The Pentium 4 can issue and retire up to 2 instructions per cycle (it retires up to 3 micro-ops per cycle and a typical x86 instruction is about 1.5 micro-ops, so 3/1.5 = 2) and can go through up to 3066 million cycles every second (3066 MHz).
A G4 can issue and retire up to 3 instructions per cycle and can go through 1450 million cycles every second (1450 MHz).
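Multiplying those two aspects together gives a theoretical ceiling. Here is a minimal Python sketch of that arithmetic, using only the peak IPC and clock figures quoted above; the function and variable names are mine, and the results are upper bounds on paper, not sustained performance.

```python
def peak_instructions_per_second(ipc, clock_mhz):
    """Theoretical ceiling: instructions per cycle times cycles per second."""
    return ipc * clock_mhz * 1_000_000


p4_peak = peak_instructions_per_second(ipc=2, clock_mhz=3066)
g4_peak = peak_instructions_per_second(ipc=3, clock_mhz=1450)

print(f"Pentium 4 peak: {p4_peak / 1e9:.2f} billion instructions/s")
print(f"G4 peak:        {g4_peak / 1e9:.2f} billion instructions/s")
print(f"Ratio (P4/G4):  {p4_peak / g4_peak:.2f}")
```

So even with the lower peak IPC, the higher clock puts the Pentium 4's ceiling above the G4's on paper.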
Of course, not all instructions are equal, and sustained values vary depending on the program, so this is where benchmarks come in.
On the contrary, the G4 doesn't have a "true" RISC instruction set; it actually has a relatively large instruction set compared to "true" RISC chips like the Alpha 21164.
Another contrary point: RISC chips actually put a greater burden on the software, since the CPU is capable of executing only a very limited set of instructions.
Of course, current x86 chips are to some degree RISC. Both the Pentium 4 and Athlon are CISC only at the external layer: they decode x86 instructions into simpler RISC-like micro-ops internally.