Re: Learn what you are talking about
Originally posted by sergeantmudd
I am so sick of everyone and their damn Power4 rumors. POWER and PowerPC are NOT the same architecture. The AIM alliance took the POWER architecture, took out all the old crud and added some stuff. The result is the PowerPC architecture. The architectures are very similar, but they are not the same thing. Now as for the Power4, it understands both POWER and PowerPC instructions. It does this the same way the Pentiums understand the x86 instructions, it breaks the instructions down and runs a more efficient internal RISC machine. The Power4 is NOT a native PowerPC chip.
According to IBM, the "POWER4 is a 64 bit PPC ISA chip," no matter what it does internally, and so was the POWER3. That means it uses the PowerPC Instruction Set Architecture. If you want to program for it you need to use the PPC ISA, because that is how it talks to the world. The P4 and Athlon are both IA-32 ISA chips for the same reason. IBM abandoned the POWER architecture when they designed the POWER3, though some of the old ISA may be left in for compatibility. In other words, if you wrote and compiled a 32 bit program to run on AIX on a 604, it would run on a POWER4 machine without any software compatibility layer. You do have to recompile AIX 4 64 bit binaries to run on AIX 5, though. Therefore, if Apple created a motherboard and made some minor changes to the root level of OS X, you could run Photoshop 7 on a POWER4 based machine.
As for the Sahara G3, actually the PPC750Fx, IBM lists it as available at up to 1 GHz, not 2 GHz. I don't know what it would take to get it to 2 GHz. As for it outperforming the G4, it all depends on how it is used. In scalar integer performance, I believe the two chips are pretty nearly equivalent. The mult/div latency on the MPC745X is slightly longer, but the throughput is the same. The latency on add/sub is still one cycle. The scalar FPU is also similar when using single precision numbers: longer latency on the G4 but the same throughput (latency:throughput of 5:1 for the 745X vs. 3:1 for the 750Fx). Remember, if you can keep the functional units fed, throughput has more bearing on performance than latency on heavy, independent calculations. For double precision the G4 outclasses the G3 significantly, since its latency:throughput does not change. On the G3, the fused multiply-add (FMA) is used for all DP adds, multiplies, and multiply-adds; if the multiplier is not 1, the latency:throughput goes from 3:1 to 4:2, in other words the FPU slows down by a factor of two. Incidentally, the POWER4 FPU is a 6:1 unit. Then there is the AltiVec VPU, which works on fixed point vectors and single precision FP vectors. Both chips have similar size L1 caches and an on-die L2 cache: the PPC750Fx has 512K and the MPC7455 has 256K. Additionally, the 7455 allows for up to 2MB of DDR L3 cache running at up to 533 MHz effective. Neither chip supports DDR system bus access. The PPC750Fx officially supports an FSB of ~200MHz vs. ~133MHz for the 7455 (though most will work on a faster bus). The PPC750Fx does not have nearly as complete SMP support as the MPC7455.
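To put rough numbers on the latency vs. throughput point, here is a back-of-the-envelope sketch. It assumes "L:T" means a result takes L cycles and a new operation can issue every T cycles, and it models a single pipelined unit with nothing else in the way (no cache misses, no issue limits). The function name and the 1000-operation workload are made up for illustration; only the L:T figures come from the numbers above.

```python
def cycles(n_ops, latency, interval, independent=True):
    """Rough cycle count for n_ops on one pipelined functional unit.

    latency  -- cycles from issue to result for a single operation
    interval -- cycles between successive issues (1 = fully pipelined)
    """
    if n_ops <= 0:
        return 0
    if independent:
        # A new operation enters every `interval` cycles; the last one
        # still needs `latency` cycles to produce its result.
        return latency + (n_ops - 1) * interval
    # Dependent chain: each operation waits for the previous result.
    return n_ops * latency


# latency:throughput figures quoted in the post
units = {
    "745X FPU (5:1)": (5, 1),
    "750Fx FPU, single precision (3:1)": (3, 1),
    "750Fx FPU, double precision FMA (4:2)": (4, 2),
    "POWER4 FPU (6:1)": (6, 1),
}

for name, (lat, iv) in units.items():
    indep = cycles(1000, lat, iv)
    chained = cycles(1000, lat, iv, independent=False)
    print(f"{name}: {indep} cycles independent, {chained} cycles chained")
```

With 1000 independent operations the 5:1 and 3:1 units come out within two cycles of each other, while the 4:2 case takes about twice as long. Only when every operation depends on the previous one does the longer latency dominate.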
Where does this leave us? Well, if you write code that uses data larger than 256K and smaller than 512K, the PPC750Fx will most likely be faster than the 7455 at the same clock speed, even with AltiVec. This is because the L3 cache and main memory latencies are so high (>38 cycles). However, if the data is less than 256K the 7455 will hold its own, and exceed the 750 for double precision and vector computations. If the data is greater than 512K but less than 1 or 2 MB, the 7455 will clean the 750's clock. Still larger data sizes will tend to favor the 7455 because of its deeper cache structure. The POWER4 of course will win outright, because it has twice as many fully functional fixed and floating point units per core as the PPC750, two cores per die, and 1.4MB of on-die L2 cache, which looks to the cores to be ~467K in size, for lower latency. The L2 throughput is about 100GB/sec. It also has a huge DDR L3 cache.
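The data size ranges above boil down to asking which cache level a working set fits in on each chip. A minimal sketch of that lookup follows, using only the L2 and L3 sizes quoted in this post. L1 is left out because no figure is given for it, the 7455 is assumed to have the full 2MB L3 fitted, and the table and function names are hypothetical. It is a crude "fits or doesn't fit" model, not a simulation.

```python
KB = 1
MB = 1024 * KB

# Cache levels per chip, smallest first, sizes in KB as quoted in the post.
# Assumption: the 7455 is populated with the maximum 2MB L3.
CHIPS = {
    "PPC750Fx": [("L2", 512 * KB)],
    "MPC7455": [("L2", 256 * KB), ("L3", 2 * MB)],
}


def serving_level(chip, working_set_kb):
    """Return the first cache level large enough to hold the working set."""
    for level, size_kb in CHIPS[chip]:
        if working_set_kb <= size_kb:
            return level
    return "main memory"


for size_kb in (200, 384, 1024, 4096):
    row = ", ".join(f"{chip}: {serving_level(chip, size_kb)}" for chip in CHIPS)
    print(f"{size_kb}K working set -> {row}")
```

A 384K working set stays in L2 on the 750Fx but spills to L3 on the 7455, which is the one window where the 750Fx has the edge. At 1MB the situation flips: the 7455 still has L3 while the 750Fx is going out to main memory.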
In conclusion, the 750Fx will not smoke the 7455. In fact, since its memory bus isn't really any higher bandwidth and its caches are shallower, it will be slower MHz for MHz than the 7455 for most operations that don't fit in the 256K < X < 512K data size range. Also, multiprocessing is more efficient on the 7455, meaning that a much greater boost in performance can be achieved by MPing a 7455 machine. Therefore, if Apple really wants significantly improved performance from an IBM chip, they need one that has a DDR FSB, a high throughput FPU, good SMP, 512K of L2, and the capability to use an L3 (no matter what you do, a 2MB L3 will be lower latency than 512MB-1GB of main memory, even at the same effective clock speed). Can Apple get IBM to do this? Most likely, if they are willing to pay enough.