Originally posted by barkmonster
...
Picture this if the L3 cache remains:
CPU : 1.4GHz
L1 : 32K + 32K @ 1.4GHz
L2 : 256K @ 1.4GHz
L3 : 2MB @ 700MHz (DDR 350MHz x 2)
FSB : 166MHz
RAM : 333MHz (DDR 166MHz x 2)
The L3 would still offer over twice the throughput of the main system RAM and would more than likely add a significant performance boost. Also, I'm sure I've read that the 7470 uses a 512K L2 like the chip in the new iBook. Barefeats have done a shootout between the eMac, iMac, TiBook and iBook. In some of the tests the iBook won by 15 - 28%.
I think if we get a 1.4GHz G4 with a faster FSB and DDR, we'll be looking at some pretty powerful systems from Apple very soon.
Quite right. I wrote a little Java code that performs some matrix math on two random square matrices. Now, I know Java is not the best language to do math in, and it is unfriendly to PowerPC chips (no FMA, among other things), but I know it best, and the code is simple enough that I don't need a compiler on each machine I want to test. Anyhow, long story short: for a certain matrix size (I don't remember exactly what size), the code runs about 20% faster on the 700MHz iBook than on the 800MHz TiBook. Once you exceed the 512K L2, though, the G4 wins easily. Also remember that double FP multiplication on the 750 has a base latency of 4 and a throughput of 2; on the MPC7455 it is 5:1. So if you can keep the CPU fed, you should be able to do a lot more matrix multiplication in a given amount of time on an MPC7455 than on the PPC750FX. I haven't looked into cache latency for the 750FX, but on the MPC7455, the latency for data retrieval from the caches is as follows (add one cycle for FP data):
L1 3 cycles,
L2 9 cycles, including L1 miss and L2 find,
L3 ~38 cycles, including L1 miss. The L2 and L3 lookups are performed simultaneously.
Now, I can't imagine even DDR memory being anywhere near this; for one, looking something up in 2MB of memory is a lot simpler than finding it in 2GB of memory. Also, since the L2 lookup is about 4 times faster than the L3, if you can keep the data in the L2 you can get a serious speedup in program execution. And if you can do a non-dependent multiply & add, try ordering your program to take advantage of the FMA instructions in the PPC ISA.
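For anyone who wants to try the same kind of test, here is a minimal sketch of a matrix benchmark along those lines. It is not the original code: the class and method names are made up for illustration, and the sizing helper assumes three n x n matrices of doubles (two inputs plus the result) have to sit in the cache at the same time, which is only a rough model of the real working set.

```java
import java.util.Random;

// Hypothetical sketch, not the original test program.
class MatrixBench {

    // Largest n such that `matrices` square n x n matrices of doubles
    // (8 bytes each) fit in cacheBytes. Ignores everything else in the cache.
    static int maxDimensionFor(long cacheBytes, int matrices) {
        long elements = cacheBytes / (8L * matrices);
        int n = (int) Math.sqrt((double) elements);
        while ((long) n * n > elements) n--;                  // guard against rounding up
        while ((long) (n + 1) * (n + 1) <= elements) n++;     // and rounding down
        return n;
    }

    // n x n matrix filled with pseudo-random doubles in [0, 1).
    static double[][] randomMatrix(int n, Random rng) {
        double[][] m = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                m[i][j] = rng.nextDouble();
        return m;
    }

    // Plain triple-loop multiply. The i-k-j order walks b and c row by row,
    // which is friendlier to the caches than the textbook i-j-k order.
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < n; k++) {
                double aik = a[i][k];
                for (int j = 0; j < n; j++) {
                    c[i][j] += aik * b[k][j];
                }
            }
        }
        return c;
    }

    // Best wall-clock time (in nanoseconds) over `repeats` multiplications.
    static long bestTimeNanos(int n, int repeats, long seed) {
        Random rng = new Random(seed);
        double[][] a = randomMatrix(n, rng);
        double[][] b = randomMatrix(n, rng);
        long best = Long.MAX_VALUE;
        double checksum = 0.0; // keeps the result live so the work is not optimized away
        for (int r = 0; r < repeats; r++) {
            long start = System.nanoTime();
            double[][] c = multiply(a, b);
            long elapsed = System.nanoTime() - start;
            checksum += c[0][0];
            best = Math.min(best, elapsed);
        }
        if (Double.isNaN(checksum)) throw new IllegalStateException("unexpected NaN");
        return best;
    }

    public static void main(String[] args) {
        // Cache sizes mentioned in this thread: 256K L2, 512K L2, 2MB L3.
        long[] cacheSizes = {256L * 1024, 512L * 1024, 2L * 1024 * 1024};
        for (long size : cacheSizes) {
            System.out.println(size / 1024 + "K cache: n up to about "
                    + maxDimensionFor(size, 3));
        }
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 128;
        System.out.println("n = " + n + ", best time: "
                + bestTimeNanos(n, 5, 42L) / 1e6 + " ms");
    }
}
```

Run it with a few sizes on either side of the estimated limit for each machine (for example `java MatrixBench 140` and `java MatrixBench 300`) and compare the timings. JIT warm-up and whatever else is running will skew a single run, so treat the numbers as rough.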
As for the XServe chipset: since I have never seen any of the tech specs on it, I don't know if it is strictly limited to an SDR FSB. My guess is that Apple made it DDR capable. Since the MPC7455 won't do DDR, you can't tell from the XServe implementation whether or not the chipset is capable of a full DDR implementation. If you know the definitive answer, please post it along with the reference. Hell, given the fact that Apple, at least for a while, showed interest in using the POWER4, the XServe chipset may even be able to handle a single-chip POWER4 module if one were ever available.