Hector:
I spent too much time looking this stuff up.
tests have shown that it shows very little benefit
Two points: first I don't believe that, and second, that was the G4 they tested, not the G5.
Point one: Powerlogix did some SDR vs DDR L3 testing for their upgrades and that PDF is still available. It shows one test (Cinebench, for what it's worth) that runs 7% faster going from 1MB SDR L3 to 2MB DDR L3, and under the same conditions Photoshop was boosted 4%. If there were no L3 at all, the performance impact would be larger.
http://www.powerlogix.com/downloads/SDRDDR.pdf
Point two: Evidence that L3 matters can be found by examining P4 vs P4EE tests. Clearly that 2MB L3 is making a difference in some benchmarks: sometimes a lot of difference, and sometimes none at all. In Civ3 the L3 apparently provides a 16% kick in the pants, and on a 3D Studio test it can add 12%.
http://www.aceshardware.com/read.jsp?id=60000253
http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=1965
There are countless other tests of the P4EE out there which you can find if you'd like.
i was pointing out the general lack of need for such a cache because the ram is fast enough and so is the system bus.
What do you mean "fast enough"? Perhaps you didn't pay any attention to what I said about latency, the silent enemy. I think you should mosey over to Apple's official PR page and ponder the performance scaling going from 2.0ghz to 2.5ghz:
http://www.apple.com/powermac/performance/
Take the "Bibble 3.1a" test for example: the 2.5 is 150% faster than a P4, the 2.0 is 119% faster, so (100+150)/(100+119) = 14.2% speedup. That's the best test Apple presented as far as I know. The Photoshop test of all things showed a (100+98)/(100+82) = 8.8% speedup, and "Audio Plug-ins" showed a (100+180)/(100+159) = 8.1% speedup. Remember, those are all on a 25% clock speed boost. A hypothetical 3ghz 970fx would be less than 18% faster at Photoshop than a 2ghz G5, on Apple's own test.
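For anyone who wants to check the arithmetic, here is a minimal Python sketch of the calculation above. It assumes Apple's figures mean "percent faster than the P4", which is how the math treats them; the function name and the dictionary are just mine for illustration.

```python
def relative_speedup(fast_pct, slow_pct):
    """Percent speedup of one machine over another, where both inputs
    are 'percent faster than the same baseline' (here, the P4)."""
    return ((100 + fast_pct) / (100 + slow_pct) - 1) * 100

# (2.5ghz G5, 2.0ghz G5) figures quoted above from Apple's page
apple_tests = {
    "Bibble 3.1a": (150, 119),
    "Photoshop": (98, 82),
    "Audio Plug-ins": (180, 159),
}

for name, (g5_25, g5_20) in apple_tests.items():
    gain = relative_speedup(g5_25, g5_20)
    print(f"{name}: {gain:.1f}% speedup on a 25% clock boost")
```

If you assume a second 0.5ghz step gives the same gain as the first, the Photoshop number doubles to roughly 17.6%, which is one way to arrive at the "less than 18%" figure.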
(Aside: note that Apple's G5 prices are in line with observable performance differences rather than clockspeed.)
Intel provides us a point of comparison: their 2.0ghz P4 vs the 2.53ghz P4, both Northwood cores with 512k L2. The 2.0ghz chip has a 400mhz FSB and the 2.53ghz chip has a 533mhz FSB, similar FSB scaling to what Apple has done (33% vs 25%, so Intel has a bit of an edge). Like the G5, the RAM on the P4s does not change ("note that we ran all of our CPU tests with PC800 RDRAM", on pg 6).
http://www.anandtech.com/showdoc.aspx?i=1615&p=1
Speedup of the 2.53ghz P4 over the 2.0ghz P4 in that review:
18.3% on "Content Creation" which includes Photoshop.
27.3% on MP3 encoding with Lame (the accuracy of measurement is suspect).
19.7% on MPEG-4 encoding with Xmpeg.
20.8% on Adobe After Effects 5.5
22.0% on 3D Studio Max
19.6% on Maya
Feel free to do more math if you'd like.
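Here is a rough Python sketch of that extra math, for the curious. It divides each speedup by the clock boost that produced it, to show how much of the extra clock actually turns into performance. The "scaling efficiency" ratio is my own shorthand, not something Apple or Anandtech reports, and the numbers are just the ones quoted in this post.

```python
def scaling_efficiency(speedup_pct, clock_boost_pct):
    """Fraction of a clock speed increase that shows up as real speedup."""
    return speedup_pct / clock_boost_pct

p4_clock_boost = (2.53 / 2.0 - 1) * 100  # 2.0ghz -> 2.53ghz, about 26.5%
g5_clock_boost = (2.5 / 2.0 - 1) * 100   # 2.0ghz -> 2.5ghz, 25%

# Speedups quoted above (percent)
p4_speedups = {
    "Content Creation": 18.3,
    "Lame MP3": 27.3,
    "Xmpeg MPEG-4": 19.7,
    "After Effects 5.5": 20.8,
    "3D Studio Max": 22.0,
    "Maya": 19.6,
}
g5_speedups = {
    "Bibble 3.1a": 14.2,
    "Photoshop": 8.8,
    "Audio Plug-ins": 8.1,
}

for name, s in p4_speedups.items():
    print(f"P4 {name}: {scaling_efficiency(s, p4_clock_boost):.2f}")
for name, s in g5_speedups.items():
    print(f"G5 {name}: {scaling_efficiency(s, g5_clock_boost):.2f}")
```

The tests are not the same on both sides, so this is only a ballpark comparison, but on these numbers every P4 result keeps a larger share of its clock boost than the best G5 result does.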
So based on the combination of Apple's own evidence and tests performed by Anandtech, I think most people will agree that the 1.25ghz FSB and dual DDR 400 aren't doing the trick. With only 512k of L2 the G5 gets a lot of cache misses (depending heavily on the application of course) and each of those misses will require the G5 to wait on the main RAM. That'll put an end to performance scaling real fast if the RAM is slow to respond, which is what we are seeing.
I claim that despite the massive bandwidth, the G5 is bottlenecked on RAM worse than the G4 ever was. Quite a claim, eh?
So repeat after me,
pleaseIBMgiveusondiememorycontrollers!

Larger caches, be they L2 or L3, would help as well.