It makes sense to me that the gaming results are no different. Are there many games that even utilize multiple cores?
P-Worm
Check this link out.
Two FSB at 1333 MHz, and two dual-channel 1333MHz memory busses.
There's always the choice NOT to buy the 8-core Mac Pro. If you want to game, you're better suited with any of the cheaper Mac Pros, which would offer greater relative performance per dollar. Of course, your game selection per dollar will also be far better with a Windows PC.
It shouldn't, but Apple in its infinite wisdom doesn't really give you many choices.
That's for 8 cores, 2 CPUs with 4 cores each, each CPU having its own bus for its 4 cores, which are on two dies. My point is entirely accurate.
Speaking of which, Intel is demoing SS motherboards at this week's IDF in China, with at least some of them running 3.2 GHz quad-core Penryns.
Perhaps. But one of the biggest missing elements for optimum 8-core performance among multi-threaded workflows is a 2007 Stoakley-Seaburg (SS) motherboard.
All this power in the 8-core Mac Pro and not that much software to take advantage of it...
You're right - I read your post too quickly - I deleted my response.
How do you quantify AMD's problems with a NUMA architecture, though?
Because the AMD memory is connected to each socket, CPUs in that socket access the local memory through the memory controller. If the memory happens to be connected to the other socket, the serial HT bus is used to access the remote memory controller - adding latency.
On a 4 socket system, the sockets are connected in a ring. If the memory happens to be on the "far" socket, the memory access has to go across two HT links in series, adding even more latency.
AMD's architecture has nice numbers when you add them all up, but when you actually benchmark the 4 socket machines you see a pretty significant variability in runtimes due to the extra latencies.
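To put rough numbers on the hop argument above, here is a minimal Python sketch. It assumes the 4 sockets sit in a simple ring with memory spread evenly across them, as described in the post; the function names and the latency figures (`local_ns`, `hop_ns`) are hypothetical placeholders, not measured values for any real Opteron system.

```python
def ring_hops(src, dst, sockets):
    """HT links crossed when socket `src` reads memory attached to socket `dst`."""
    d = abs(src - dst) % sockets
    # A ring can be walked in either direction, so take the shorter way round.
    return min(d, sockets - d)


def mean_hops(sockets):
    """Average link count, assuming accesses are spread evenly over all sockets."""
    pairs = [(s, d) for s in range(sockets) for d in range(sockets)]
    return sum(ring_hops(s, d, sockets) for s, d in pairs) / len(pairs)


def access_latency(src, dst, sockets, local_ns, hop_ns):
    """Local memory latency plus a fixed cost per HT hop (both are assumptions)."""
    return local_ns + ring_hops(src, dst, sockets) * hop_ns


if __name__ == "__main__":
    SOCKETS = 4
    LOCAL_NS = 100  # hypothetical placeholder, not a benchmark result
    HOP_NS = 50     # hypothetical placeholder, not a benchmark result
    for dst in range(SOCKETS):
        hops = ring_hops(0, dst, SOCKETS)
        ns = access_latency(0, dst, SOCKETS, LOCAL_NS, HOP_NS)
        print(f"socket 0 -> memory on socket {dst}: {hops} hop(s), {ns} ns")
    print("mean hops:", mean_hops(SOCKETS))
```

Whatever the real per-hop cost is, the spread between zero, one and two hops is what would show up as run-to-run variability when the OS doesn't keep a thread's memory local.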
The 8-core is not as fast as expected/hoped because of three things combined:
1. Tiger is suboptimal when it comes to parallel execution.
2. The memory bus is a slowing factor.
3. The software isn't parallelized enough to benefit from 8 cores.
Agreed?
By "Windows PC" I mean a Core 2 Duo/Core 2 Extreme desktop system--I should have specified. You don't see many gamers building Xeon rigs, for a very good reason. You can get superior gaming performance for a cheaper price by skipping the FB-DIMMs and the workstation motherboards.
I assume that you're just ignoring Boot Camp as an option. I run Boot Camp marvellously on a second hard drive, with a separate GPU. I could run it with the standard 7300, but I found it fairly easy to install an 8800 instead.
Barefeats speculates that the 8-core Mac Pro may be bottlenecked by the memory bus and also considers the possibility that Mac OS X Tiger may not be well optimized for the 8-core Mac Pros.
It's always a chicken/egg problem. You can't really expect software companies to release versions optimized for eight cores before the hardware is released.
Get ready for the "I'm waiting for the REAL 8-core MacPro" posts...
I don't think there is actually a major problem with 1333MHz FSBs for quad cores. Besides, it's too simplistic to do some division and say there is only 333MHz of equivalent bandwidth per core. For one thing, the unified L2 cache means it's more like 667MHz per L2 cache. In terms of cache coherency, the snoop filter does help with that. And you can't just say that Netburst started with a 400MHz FSB. Merom is a fundamentally different architecture and seems much more FSB bandwidth agnostic.
It seems like this is another rushed release from Apple. It had been a while since the Mac Pro saw an update, the 8 core was sort of out and they released it to be ahead of the competition.
Something tells me we'll see a lot of these types of releases by Apple in the near future. The main focus right now is on the electronic gadgets side of things. Unfortunately, this is where there's a lot of money to be made: mass consumer electronics, like Apple TV and the iPhone. The Leopard delay is one example. They need to create some sense of expectation and keep the favorable rumors going.
We'll see product updates in minor ways, and the excuse software-wise will be the pending release of Leopard. The coming months will be critical for Apple, and by the looks of things they are focusing where there's money to be made, and that's not computers.
As for where Apple is focusing their effort, unless you sit in their strategic planning sessions, your credibility is no better than your attitude.
AMD use Coherent Hypertransport (marketing name: Direct Connect) between the CPUs in their systems, and an on-CPU memory controller (dual-channel DDR2). Direct Connect is 8GBps per link currently, going up to 20GBps later this year. Registered DDR2 is available at 667MHz, maybe 800MHz now, so that's 10-12GB/s per CPU (theoretically).
That means in a 4 CPU (8 core) system you have 8 DDR2 memory controllers, which provides a boat-load of bandwidth. With Barcelona in a couple of months you'll have a 2 CPU (8 core) system with 4 DDR2 memory controllers, so not as good, but more compact (and better than accessing memory over the FSB).
Assuming the latter, and ignoring coherency traffic for an 8 core system:
AMD Barcelona: 21 - 25 GBps memory = 2.6 - 3.2 GBps per core.
Intel: 2x1333MHz FSB = 21 GBps memory = 2.6 GBps per core.
(note however that coherency traffic will be *very* significant on an 8-core system)
Intel make up for it with good prefetchers in their cores, but these are less effective with more cores, and a high amount of cache. Barcelona has better prefetchers however, and larger cache, so it's probably a wash in the end. So AMD's greater memory bandwidth per core is what seals the deal.
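For anyone who wants to check the per-core figures above, here is a small Python sketch of the arithmetic. It assumes theoretical peak rates only: each FSB and each DDR2 channel is treated as a 64-bit (8-byte) bus, coherency traffic is ignored, and the function names are made up for illustration.

```python
BUS_WIDTH_BYTES = 8  # a 64-bit FSB or DDR2 channel moves 8 bytes per transfer


def peak_gb_per_s(mega_transfers, buses):
    """Theoretical peak in GB/s for `buses` buses, each at `mega_transfers` MT/s."""
    return mega_transfers * 1e6 * BUS_WIDTH_BYTES * buses / 1e9


def per_core(total_gb_per_s, cores):
    """Even split of the total across cores; ignores coherency traffic."""
    return total_gb_per_s / cores


if __name__ == "__main__":
    CORES = 8
    configs = {
        # 2 sockets x dual-channel DDR2 = 4 channels in total
        "Barcelona, DDR2-667": peak_gb_per_s(667, 4),
        "Barcelona, DDR2-800": peak_gb_per_s(800, 4),
        # one 1333 MT/s FSB per quad-core CPU, two CPUs
        "Intel, 2x1333 FSB": peak_gb_per_s(1333, 2),
    }
    for name, total in configs.items():
        print(f"{name}: {total:.1f} GB/s total, {per_core(total, CORES):.2f} GB/s per core")
```

On paper the two come out level with DDR2-667 and only pull apart with DDR2-800, so the real difference depends on how much of that peak survives coherency traffic.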