8-Core Mac Pro Benchmarks
MacRumors
Apr 16, 2007, 10:42 AM
Barefeats (http://barefeats.com/) has published performance benchmarks for Apple's latest 8-Core Mac Pro.
The initial report (http://barefeats.com/octopro1.html) compares the 8-Core to the Quad-Core Mac Pro in Cinebench, GeekBench, Photoshop CS3, Aperture 1.5 and Quicktime 7.1.5 Exports.
A second report (http://barefeats.com/octopro2.html) compares gaming frame rates between the two machines. They tested Doom 3, Quake 4, Halo, UT2004, World of Warcraft and Prey.
The 8-Core Mac Pro came out up to 40-55% faster on some tasks, such as Cinebench 9.5, GeekBench, and Quicktime Export speeds, but provided little advantage in the limited Photoshop CS3 and Aperture testing. The 8-Core also proved to be no faster across the board in the Gaming tests.
Barefeats speculates that the 8-Core Mac Pro may be bottlenecked by the memory bus and also considers the possibility that Mac OS X Tiger may not be well optimized for the 8-Core Mac Pros.
awesomebase
Apr 16, 2007, 10:47 AM
How about performance per $? That would be more useful since the 8-core Macs are way more expensive than the 4-cores...
johnee
Apr 16, 2007, 10:47 AM
Obviously they unveiled the 8-core for NAB; they wanted to highlight "all 8 cores running at 100%", which gave them a 3x speedup.
P-Worm
Apr 16, 2007, 10:48 AM
It makes sense to me that the gaming results are no different. Are there many games that even utilize multiple cores? :confused:
P-Worm
johnee
Apr 16, 2007, 10:50 AM
It makes sense to me that the gaming results are no different. Are there many games that even utilize multiple cores? :confused:
P-Worm
I guess a pro machine shouldn't be used for games ;)
andiwm2003
Apr 16, 2007, 10:51 AM
has anybody tested the new fc pro? one would expect that that uses the newest machine to its fullest.
brad.c
Apr 16, 2007, 10:53 AM
While the gaming benchmarks are interesting in themselves, I gotta wonder who would consider this as a serious part of the purchase decision.
Then again, if you ARE buying this for gaming, you'd be a girly-man for getting less than two 30" monitors.
Darkroom
Apr 16, 2007, 10:54 AM
i'd be curious to see benchmarks for Adobe CS3 when it's released... Photoshop CS3 Beta is a lot better on my Intel iMac than the crashtastic CS2 programs, but it's still a bit sketchy, and therefore not something worth benchmarking (IMHO).
twoodcc
Apr 16, 2007, 10:55 AM
yeah, i bet the new fcs will take advantage of the 8-cores.....or at least i sure hope so
P-Worm
Apr 16, 2007, 10:56 AM
I'm guessing that one of the best benchmarks for something like the 8 core would be a render test using some 3D app like Blender or Maya. Rendering CG is supposed to be purely CPU-bound, and the graphics card and memory have little to do with it. Am I right on this?
P-Worm
tuartboy
Apr 16, 2007, 11:00 AM
It's worth noting that gains in Quicktime exporting were nonexistent for single exports. The recorded gains were when performing 6 simultaneous exports. Unless you are forced to export 6 things at once, I would not chalk this up as a win for the ocho.
840quadra
Apr 16, 2007, 11:13 AM
The memory bandwidth per core is really not too good (really poor actually). I would have expected a much bigger performance jump out of an 8 core system, but it seems to be choked by poor memory bandwidth!
We did run the "test-compute-speed" with digLloydTools (DLT) on the 8 core. Its aggregate rate was 1204MB/sec (versus the 4 core's 601MB/s). That says that if the task is pure CPU, the 8 core is twice as fast as the 4 core. But if your task has to do a lot of interaction with memory, the advantage drops almost to nothing -- as we saw with Photoshop CS3 and Aperture.
I wouldn't expect anything big from FCP due to this fact, as that is also memory and disk intensive.
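To put rough numbers on the argument in the post above, here is a minimal sketch. The ratio uses the two digLloydTools rates quoted in the post. The second function is an Amdahl-style estimate that I am adding for illustration; it is not Barefeats' method, and the memory-bound fractions passed to it are hypothetical, not measured.

```python
def observed_ratio(rate_8core_mb_s, rate_4core_mb_s):
    """Ratio of the two aggregate compute rates quoted in the post."""
    return rate_8core_mb_s / rate_4core_mb_s


def modelled_speedup(memory_bound_fraction, cpu_speedup=2.0):
    """Amdahl-style estimate (an assumption, not a measurement).

    Only the CPU-bound share of the runtime is assumed to get faster
    when the core count doubles; the memory-bound share is assumed
    not to speed up at all.
    """
    if not 0.0 <= memory_bound_fraction <= 1.0:
        raise ValueError("memory_bound_fraction must be between 0 and 1")
    cpu_fraction = 1.0 - memory_bound_fraction
    return 1.0 / (memory_bound_fraction + cpu_fraction / cpu_speedup)


if __name__ == "__main__":
    print(f"pure-CPU test: {observed_ratio(1204, 601):.2f}x")
    # Hypothetical memory-bound shares, chosen only to show the trend.
    for fraction in (0.0, 0.5, 0.9):
        print(f"{fraction:.0%} memory-bound: {modelled_speedup(fraction):.2f}x")
```

Under this model the advantage shrinks toward 1x as the memory-bound share grows, which is the same direction as the Photoshop CS3 and Aperture results described above.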
ddubbo
Apr 16, 2007, 11:27 AM
I suppose it may contribute a significant part to Leopard delay - optimizing it to Quad - core processors
commander.data
Apr 16, 2007, 11:30 AM
There is a problem with the chipset. A 1333MHz FSB should provide plenty of bandwidth for a quad core chip even if it is dual die, but FB-DIMMs seem incredibly inefficient and the first generation memory controller probably isn't well optimized either. I thought Anandtech found that a dual channel DDR2 800 setup with Conroe and 1066MHz FSBs actually gets more raw bandwidth than a quad channel Woodcrest setup with 1333MHz FSBs. Intel's upcoming Stoakley platform should correct some of this with a 2nd generation memory controller and a snoop filter optimized for quad cores rather than dual cores. I believe Intel was claiming 5% improvement with the same chips at CeBit using the new chipsets.
dantehicks42
Apr 16, 2007, 11:31 AM
It seems like this is another rushed release from Apple. It had been a while since the Mac Pro saw an update, the 8 core was sort of out and they released it to be ahead of the competition.
Something tells me we'll see a lot of these types of releases by Apple in the near future. The main focus right now is on the electronic gadgets side of things. Unfortunately, this is where there's a lot of money to be made: mass consumer electronics. Apple TV, iPhone. The Leopard delay is one example. They need to create some sense of expectation. Keep the favorable rumors going.
We'll see product updates in minor ways and the excuse software wise will be the pending release of Leopard. The coming months will be critical for Apple and with the looks of things they are focusing where there's money to be made and that's not computers.
kalisphoenix
Apr 16, 2007, 11:33 AM
*yawn* What I want to see is a comprehensive benchmark suite that would show me precisely how fast I could convert 16,000 FLACs to ALAC.
scottlinux
Apr 16, 2007, 11:44 AM
Folding @ Home? Calculate Pi to a ka-zillion digits? LAME compression? Blender rendering? Come on, I want some real benchmarks...
guzhogi
Apr 16, 2007, 11:45 AM
I wonder how much of the bottleneck is b/c of hardware and how much b/c of software.
Something I'd be interested in seeing (but probably will never see) is someone building a whole new OS from the ground up. No use of any pre-existing code library, nothing. I'm sure a lot of the libraries today haven't changed much since the 1980's. Back then, we didn't have quad-core procs, 64-bit procs, graphics cards in SLI mode, nothing like that. I know, I know this will take years and lots of money to do, but would be interesting to see nonetheless. And please, don't say things like "There'd be a lot of regressions" or $h!t like that b/c I know.
kalisphoenix
Apr 16, 2007, 11:53 AM
I wonder how much of the bottleneck is b/c of hardware and how much b/c of software.
Something I'd be interested in seeing (but probably will never see) is someone building a whole new OS from the ground up. No use of any pre-existing code library, nothing. I'm sure a lot of the libraries today haven't changed much since the 1980's. Back then, we didn't have quad-core procs, 64-bit procs, graphics cards in SLI mode, nothing like that. I know, I know this will take years and lots of money to do, but would be interesting to see nonetheless. And please, don't say things like "There'd be a lot of regressions" or $h!t like that b/c I know.
The problem that I see (even though I agree with you completely) is that technology (and business) are developing at so fast a pace that the man-hours spent creating a completely new OS would only create a vaporware spiral.
A tortoise and a hare are having a race. The tortoise gets a 100-yard headstart, and then the hare takes off. However, in the time that the hare takes to cross those 100 yards, the turtle has gone another 25 yards. Then the hare has 25 yards more to run before he can catch up with the turtle -- and when he gets to the 125-yard mark, the turtle has gone another 6 yards or so... and so on.
It'd be like that, except the hare would get the head start and the turtle couldn't even dream of catching up :)
We got a lot of the fundamentals of software done while technology was advancing fairly slowly. At this point, it's just basically impossible.
The next major "clean" OS will be written by computers.
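For anyone who wants the numbers behind the classic version of that race, here is a small sketch. It assumes, as the 100 / 25 / 6-ish yard figures in the post imply, that the tortoise moves at a quarter of the hare's speed. The function name and defaults are made up for illustration.

```python
def hare_positions(head_start=100.0, ratio=0.25, steps=6):
    """Hare's position after each 'catch-up' leg of the race.

    head_start: tortoise's initial lead in yards (from the post).
    ratio: tortoise speed divided by hare speed (assumed 1/4, matching
           the 100 -> 25 -> ~6 yard gaps in the post).
    """
    positions = []
    position = 0.0
    gap = head_start
    for _ in range(steps):
        position += gap      # hare covers the current gap
        positions.append(position)
        gap *= ratio         # tortoise has moved on by a smaller gap
    return positions


# Sum of the geometric series 100 + 25 + 6.25 + ...
catch_up_point = 100.0 / (1 - 0.25)

if __name__ == "__main__":
    print(hare_positions())
    print(f"gaps shrink toward {catch_up_point:.2f} yards")
```

In the textbook version the gaps shrink geometrically, so the hare does draw level at a finite distance. The post's point is that a from-scratch OS is in the opposite situation, where the gap would be growing rather than shrinking.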
Redneck1089
Apr 16, 2007, 12:05 PM
It makes sense to me that the gaming results are no different. Are there many games that even utilize multiple cores? :confused:
P-Worm
By the end of the year, yes, there will be games that utilize multiple cores. For example, Crysis will...
Redneck1089
Apr 16, 2007, 12:10 PM
When I try to click on the links to take me to the benchmarks it tells me it's not available. Is the site down right now, or something? :confused: :confused:
iSee
Apr 16, 2007, 12:10 PM
How about performance per $? That would be more useful since the 8-core Macs are way more expensive than the 4-cores...
Actually, the price per core of an 8-core Mac Pro as compared to a 4-core is pretty decent. The 8-core Mac Pro 3.0GHz costs only $699 more than the 4-core 3.0GHz version. That's only about $175 per additional 3GHz core!
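The per-core figure above is simple division. A quick sketch, using only the $699 price difference and the 4 extra cores mentioned in the post (the function name is just for illustration):

```python
def cost_per_extra_core(price_delta_usd, extra_cores):
    """Price difference divided evenly over the additional cores."""
    return price_delta_usd / extra_cores


if __name__ == "__main__":
    # $699 more for going from 4 to 8 cores at 3.0GHz
    print(f"${cost_per_extra_core(699, 8 - 4):.2f} per additional core")
```

Note this says nothing about performance per dollar, which depends on how much of those extra cores your software can actually use.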
Folding @ Home? Calculate Pi to a ka-zillion digits? LAME compression? Blender rendering? Come on, I want some real benchmarks...
Ha, that's the problem with an 8-core system--not a lot of general use software will really take advantage of it. Still, don't most of us run more software at the same time than we used to?
clintob
Apr 16, 2007, 12:13 PM
It's been said on here 100 times (and I too have been guilty of the hoopla) but we need to stop obsessing over processors. The truth of the matter is that 99% of us never even come close to 100% usage. I use "pro" apps all day long and it just never happens.
You'll be MUCH better served to load up on extra RAM than to spend the money on extra processors. My Dual G5 is equal to my Intel Quad in virtually every test I've done (except for video encoding, which does use multi-core processors effectively). Moving a Quad Xeon from 2GB of RAM to 4GB will give you a much more tangible everyday speed increase than going from Quad Xeon to 8-core.
The sole exception to this is major encoding / transcoding of video, so unless you're doing that save yourself the money and resist the urge to pony up for more processor power that you'll never touch. RAM and bus speed are WAY more important.
Sorry, lecture over.
milo
Apr 16, 2007, 12:37 PM
Any Logic pro benchmarks anywhere?
KindredMAC
Apr 16, 2007, 12:40 PM
I can't believe the people that are shocked and upset by the lackluster performance of the Octo-Core Mac Pro....
Look at the numbers of the Mac Pro versus the Dual Core G5 PM's... The Dual Core G5 PM's are still little power houses that can easily go head to head with a Quad Core Mac Pro and come in a close second most of the time.
As for gaming.... I play games on my Power Mac G5. Processing power isn't really the key. I've noticed every time I put more RAM or a newer Graphics Card in my 2.0 Dual Core PM that my games responded the most. What I want to see, instead of games maximized for 8 Core processing, is multi-display support. If I am playing a game like Call of Duty 2 on my home system, which has 3 monitors set up in extended desktop, I want to be able to use my side monitors as peripheral vision, so I don't need to keep sweeping my mouse back and forth to see to my 9 and 3 o'clocks.
Just wait for applications to open up more to these mega multi-core machines.... however, aren't we still waiting for 64-bit support from these same app makers???? Hmmmmmm....... ;)
englishman
Apr 16, 2007, 12:41 PM
It's been said on here 100 times (and I too have been guilty of the hoopla) but we need to stop obsessing over processors. The truth of the matter is that 99% of us never even come close to 100% usage.
True - pity the punters get seduced by these silly server processors - give me a single 2.66 core 2 duo in the mac pro - and it would be really affordable and get nearer to mass market
liv4Mac
Apr 16, 2007, 12:46 PM
It's been said on here 100 times (and I too have been guilty of the hoopla) but we need to stop obsessing over processors. The truth of the matter is that 99% of us never even come close to 100% usage. I use "pro" apps all day long and it just never happens.
You'll be MUCH better served to load up on extra RAM than to spend the money on extra processors. My Dual G5 is equal to my Intel Quad in virtually every test I've done (except for video encoding, which does use multi-core processors effectively). Moving a Quad Xeon from 2GB of RAM to 4GB will give you a much more tangible everyday speed increase than going from Quad Xeon to 8-core.
The sole exception to this is major encoding / transcoding of video, so unless you're doing that save yourself the money and resist the urge to pony up for more processor power that you'll never touch. RAM and bus speed are WAY more important.
Sorry, lecture over.
Speak for yourself, buddy, and I mean only for yourself. Sorry, but what about rendering using Maya, Strata etc. and also using pro music software like Logic or Cubase? Some plugins (VST, AU, RTAS) use a lot of processing power.
So don't assume that 8 cores are not needed just because your apps will probably not use them. There are many apps that will take advantage of 8 cores.
EagerDragon
Apr 16, 2007, 12:46 PM
Sounds like all new computers will need 2 inch thick Lead enclosures to gain sufficient bus and memory speed.
deputy_doofy
Apr 16, 2007, 12:54 PM
It's been said on here 100 times (and I too have been guilty of the hoopla) but we need to stop obsessing over processors. The truth of the matter is that 99% of us never even come close to 100% usage. I use "pro" apps all day long and it just never happens.
It seems that Intel, as a company, has OCD. First, it's "let's get the MHz or GHz as high as humanly possible" and now it's "let's create a 100-core chip - no, 100 billion-core chip."
How about a more efficient everything and slow down on the multiplication of cores...
brooker
Apr 16, 2007, 12:58 PM
The 8-Core Mac Pro came out up to 40-55% faster on some tasks, such as Cinebench 9.5, GeekBench, and Quicktime Export speeds, but provided little advantage in the limited Photoshop CS3 and Aperture testing. The 8-Core also proved to be no faster across the board in the Gaming tests.
the links seem to be down, but...
1st off, aren't most games GPU dependent? Would those games run just as fast if the machine was doing some video transcoding in the background?
Where are the benchmarks for parallel workflow? Could the Ocho provide the same Aperture performance while also matching Quicktime export numbers? or would core-affinity issues prevent this?
more real-world tests needed. Doesn't anyone else with benchmarking apps have an 8xMP yet?
BenRoethig
Apr 16, 2007, 12:58 PM
I guess a pro machine shouldn't be used for games ;)
It shouldn't, but Apple in its infinite wisdom doesn't really give you many choices.
PinkyMacGodess
Apr 16, 2007, 01:01 PM
I visited someone at Arecibo last year and noted that they had a mac server there and asked what they were using it for. I was surprised to see it in a rack of Sun and custom machines...
They were trying to get the server to crunch numbers but were having problems with the speed through the server. He said that they were surprised at how slow the crunching was when compared to unix based boxes with slower processors... They were working with 'someone at Apple' to find out what the issues were but firmly believed that it was in the software, not the hardware...
It makes me wonder now if Apple will pay a price for delaying Leopard in favor of the 'toy' iphone... I for one was ready to buy a new MBP with Leopard on it but now will wait... It makes no sense to buy a MBP and turn around and have to buy Leopard in a few months...:mad:
PinkyMacGodess
Apr 16, 2007, 01:04 PM
It shouldn't, but Apple in its infinite wisdom doesn't really give you many choices.
It makes me wonder how serious the issue is and what, if anything, Apple has come up with to address the slowness issue in my previous post...
I see the delay in Leopard (which may have been discussed elsewhere) as a serious blow to Apple's credibility, especially if the speed fix is wrapped into Leopard.
mklos
Apr 16, 2007, 01:05 PM
It makes me wonder now if Apple will pay a price for delaying Leopard in favor of the 'toy' iphone... I for one was ready to buy a new MBP with Leopard on it but now will wait... It makes no sense to buy a MBP and turn around and have to buy Leopard in a few months...:mad:
And you absolutely need Leopard because? Why does everyone think they absolutely need the latest and greatest all the time? You don't even know what's going to be included in Leopard, yet you still want it. I myself would rather buy my own retail copy. Then it's my copy and it's not attached to the computer it came with, but I guess that's just a smart man's thinking! :sighs:
What would really hurt Apple's credibility is if they delayed the iPhone, which is going to get 100x more hype than Leopard will.
localoid
Apr 16, 2007, 01:14 PM
Any Logic pro benchmarks anywhere?
Good question. Interestingly enough, the music and audio section of Apple's Mac Pro performance (http://www.apple.com/macpro/performance.html) page doesn't show results (yet) for the 8-core (while the other tests do).
But, it's really not a complex or in-depth audio "benchmark" -- it just shows the "number of concurrently playing Platinum Verb reverb plug-ins." Still, having this comparison data (for the 8-core) would be better than none at all.
dontmatter
Apr 16, 2007, 01:21 PM
Disappointing to see that the long awaited Photoshop CS3 isn't able to really take advantage of more cores. It'll be a long time till the next version of Photoshop.
milo
Apr 16, 2007, 01:32 PM
And you absolutely need Leopard because???? Why does everyone think they absolutely need the latest and greatest all the time????
I need leopard for the 64 bit support.
I run logic with huge orchestral sample libraries. Current OSX/Logic only allows about 3 gigs of samples loaded up, with 64 bit support the limit is moved much higher. The AU plugin has been ported to 64 bit support, so at this point the weak link is OSX and Logic.
So yes, I am currently being held back by the limitations of 10.4, and the workaround is to run multiple computers (or get a Vista machine).
Does that answer your question?
TheFuzz
Apr 16, 2007, 01:44 PM
Disappointing to see that the long awaited Photoshop CS3 isn't able to really take advantage of more cores. It'll be a long time till the next version of Photoshop.
~18 months, like usual.
Sunrunner
Apr 16, 2007, 01:57 PM
It makes sense to me that the gaming results are no different. Are there many games that even utilize multiple cores? :confused:
P-Worm
It will be interesting to compare these benchmarks with some on a machine running 10.5, considering how much differently the new OS is supposed to handle multiple "cores".
gugy
Apr 16, 2007, 01:57 PM
Ok, since when was I talking to you? Was your name in the quoted text above?
hahahaha,
That's funny!
:D
BenRoethig
Apr 16, 2007, 02:02 PM
It will be interesting to compare these benchmarks with some on a machine running 10.5, considering how much differently the new OS is supposed to handle multiple "cores".
The games won't. Pro Apps should see increases.
SPUY767
Apr 16, 2007, 02:08 PM
By the end of the year, yes, there will be games that utilize multiple cores. For example, Crysis will...
World of Warcraft, notably the only game with a mildly appreciable gain in frame rate, is already using a multi-core version of OpenGL.
Hattig
Apr 16, 2007, 02:11 PM
The poor scaling probably stems from a combination of factors.
1) Intel's x86 line scales poorly past 4 cores in a system compared to AMD's Opteron processors because Intel still rely on the ancient FSB architecture that is quite restricting(*). 4 cores on a 1333MHz bus gives each core 333MHz of bus bandwidth, aside from cache coherency traffic. That sucks for a 3GHz core - the original P4 had 400MHz and that was limiting past 2GHz even with no coherency traffic.
Intel's CPUs are a better design (currently) than AMD's simply because they're quite a lot newer, so are a good choice for 1 and 2 core machines, and 4 core machines are still good. I don't need to mention that AMD will bring the situation back on par with their next core due out soon, and that Intel will remove the FSB in some future products.
2) FB-DIMMs are also an issue. Yet Another Intel Memory Messup.
3) Mac OS X might also have issues scaling nicely to 8 cores. It's not so hot with massive threading either (as per AnandTech's tests) but it shouldn't be an issue at this level.
4) Of course the application support for 8 cores is the major issue, but even once that is done it can't fix the issues above.
Apple are only a factor in one item of the list above, given that they're using Intel for the processors exclusively. There's probably not a lot they can do here though.
(*) you can use specialist chipsets instead of Intel's standard chipsets to improve the situation a lot however. Of course these would increase the cost vastly, and probably not be suited for a workstation.
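The bus-share argument in point 1 can be sketched in a few lines. This only reproduces the back-of-the-envelope division from the post (an even split of the bus, no coherency traffic) and compares it with the P4 example; the function names are made up for illustration.

```python
def bus_share_mhz(fsb_mhz, cores_sharing_bus):
    """Even split of a shared front-side bus, ignoring coherency traffic."""
    return fsb_mhz / cores_sharing_bus


def bus_to_core_ratio(bus_mhz, core_mhz):
    """How much bus clock is available per unit of core clock."""
    return bus_mhz / core_mhz


# Figures taken from the post: 4 cores on a 1333MHz bus, 3GHz cores,
# versus an original P4 with a 400MHz bus at 2GHz.
mac_pro_share = bus_share_mhz(1333, 4)
mac_pro_ratio = bus_to_core_ratio(mac_pro_share, 3000)
p4_ratio = bus_to_core_ratio(400, 2000)

if __name__ == "__main__":
    print(f"per-core share: {mac_pro_share:.0f} MHz")
    print(f"bus/core ratio: Mac Pro {mac_pro_ratio:.2f} vs P4 {p4_ratio:.2f}")
```

By this crude measure each 3GHz core gets roughly half the bus-to-core ratio of the P4 example, which is the point being made. It is only a proportional comparison and leaves out caches and prefetching entirely.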
smirkingboy
Apr 16, 2007, 02:12 PM
It's been said on here 100 times (and I too have been guilty of the hoopla) but we need to stop obsessing over processors. The truth of the matter is that 99% of us never even come close to 100% usage. I use "pro" apps all day long and it just never happens.
You'll be MUCH better served to load up on extra RAM than to spend the money on extra processors. My Dual G5 is equal to my Intel Quad in virtually every test I've done (except for video encoding, which does use multi-core processors effectively). Moving a Quad Xeon from 2GB of RAM to 4GB will give you a much more tangible everyday speed increase than going from Quad Xeon to 8-core.
The sole exception to this is major encoding / transcoding of video, so unless you're doing that save yourself the money and resist the urge to pony up for more processor power that you'll never touch. RAM and bus speed are WAY more important.
Sorry, lecture over.
Obviously you never run 3d programs... run a Global Illumination and 4x Anti-Aliasing render in cinema 4d and you'll change your tune.
Also, perhaps you never work with horrid deadlines? Try compressing mpeg2, rendering in after effects, and manipulating massive photoshop files, all at once - yes, you need lots of RAM for this, but twice the processors would/should help massively as well, as long as the software, the rest of the hardware, and the OS can utilize them.
tibbon
Apr 16, 2007, 02:34 PM
People are saying that no one uses all of their cores/processors, well I do! Maybe it won't happen today, but soon it will happen. Invariably EVERY computer that I've ever touched that was new I thought, "I'll never be able to max it out", and then shortly after... I do.
I remember lucidly having a conversation with a friend over the Alpha x86 500MHz processors years ago. This was when most people were using 486's on average, and a Pentium 90 was considered top of the line. We both had a long history in computers, and he is a very smart guy. I thought, "Wow, that's so cool that they were getting that fast!", and his thoughts were, "WTF would you use that for? Windows and *nix flies on a 90MHz processor! The average person will NEVER need that".
Well, each of us now carry around Treos which I think have more processing power than our desktops back then, and we have long passed maxing out 500mhz processors.
What I do like about these systems is that (with limitations) they come closer to "longer lasting" computers that are more future proof. Nothing is future proof mind you, but my G4 MDD on my desk (which almost ALWAYS has its processors pegged when using Logic Pro or Aperture) is over 4 years old now. I'm guessing that the 8-core systems will last about 5 years perhaps in some environments. Not bad IMHO for a computer.
I think that in the future we will be able to take more and more advantage of multiple cores, and the systems will seem faster as we use the cores more efficiently. Kinda like how they were able to make prettier/more complex games for the PS2 as the years went on. Same hardware, better programming.
I'm going to purchase a 4-core machine soon, and just load it up with memory however, and I think the economics at the moment do show it to be a better idea for what I do. If I were doing research, 3d rendering, etc... I would go for the 8-core.
People need to stop assuming that the technically fastest product on the market will be the best for their needs, just because they consider their needs "pro" or "heavy use". A Ford Mustang with a 1500hp blower on it isn't the best car for every use. A 400hp BMW might smoke it in many/some uses.
Get the product you NEED, not the product that you want to need.
guzhogi
Apr 16, 2007, 02:37 PM
The poor scaling probably stems from a combination of factors.
1) Intel's x86 line scales poorly past 4 cores in a system compared to AMD's Opteron processors because Intel still rely on the ancient FSB architecture that is quite restricting(*). 4 cores on a 1333MHz bus gives each core 333MHz of bus bandwidth, aside from cache coherency traffic. That sucks for a 3GHz core - the original P4 had 400MHz and that was limiting past 2GHz even with no coherency traffic.
Intel's CPUs are a better design (currently) than AMD's simply because they're quite a lot newer, so are a good choice for 1 and 2 core machines, and 4 core machines are still good. I don't need to mention that AMD will bring the situation back on par with their next core due out soon, and that Intel will remove the FSB in some future products.
(*) you can use specialist chipsets instead of Intel's standard chipsets to improve the situation a lot however. Of course these would increase the cost vastly, and probably not be suited for a workstation.
What does AMD use instead of a FSB? I don't follow AMD that much.
The problem that I see (even though I agree with you completely) is that technology (and business) are developing at so fast a pace that the man-hours spent creating a completely new OS would only create a vaporware spiral.
A tortoise and a hare are having a race. The tortoise gets a 100-yard headstart, and then the hare takes off. However, in the time that the hare takes to cross those 100 yards, the turtle has gone another 25 yards. Then the hare has 25 yards more to run before he can catch up with the turtle -- and when he gets to the 125-yard mark, the turtle has gone another 6 yards or so... and so on.
It'd be like that, except the hare would get the head start and the turtle couldn't even dream of catching up :)
We got a lot of the fundamentals of software done while technology was advancing fairly slowly. At this point, it's just basically impossible.
I know. Just curious to see how much a bottleneck the OS and other software is, you know?
Something else I'd like to see is how fast OS 7 would be if they made it run on a Mac Pro. I kinda miss the days of OS 7. The games & simplicity of the OS itself. I know, not many features, but I liked how the only thing you needed to boot up the computer was the Finder and System suitcase in the same folder at the root of your hard drive. W/ Mac OS X, you need 1000s of files all over the place. Oh well.
SMM
Apr 16, 2007, 03:23 PM
It makes me wonder how serious the issue is and what, if anything, Apple has come up with to address the slowness issue in my previous post...
I see the delay in Leopard (which may have been discussed elsewhere) as a serious blow to Apple's credibility, especially if the speed fix is wrapped into Leopard.
I think your post lacks credibility.
Multimedia
Apr 16, 2007, 03:24 PM
I suppose it may contribute a significant part to Leopard delay - optimizing it to Quad - core processors
Perhaps. But one of the biggest missing elements for optimum 8 core performance among multi-threaded workflows is a 2007 Stoakley-Seaburg (SS) (http://techreport.com/etc/2006q4/clovertown/index.x?pg=1) motherboard. Without those memory and core management chips, a lot of efficiencies are being wasted. The new Compressor 3 in FCS2 does use all 8 cores very efficiently which is great. But I still can't see buying the 8 core MP until it has the SS motherboard so at least when Leopard finally ships in SIX (6) more months :eek: I'll know it can do the best possible job that 8 cores can do for a group of applications running together - not just one. At this point I think I may even decide to wait for Penryn on that SS motherboard. So I've got a lot of patience.
I know some here like to accuse me of always telling people to wait for the next big thing - not really true. I think if I was a big money making FCS shop that can use all the improvements in FCS2 right now, buying at least one 8 core MP makes sense. The issue of waiting or not for the Penryn SS 8 core is really individualistic according to your circumstances. One fellow here was in a situation where he was being offered one by his employer for all but free. I'm in a similar situation but I don't want to squander the opportunity without getting the model I've had in my mind's eye since almost a year ago or at least since November when I first read that SS article (http://techreport.com/etc/2006q4/clovertown/index.x?pg=1).
milo
Apr 16, 2007, 03:34 PM
Perhaps. But one of the biggest missing elements for optimum 8 core performance among multi-threaded workflows is a 2007 Stoakley-Seaburg (SS) (http://techreport.com/etc/2006q4/clovertown/index.x?pg=1) motherboard. Without those memory and core management chips, a lot of efficiencies are being wasted. The new Compressor 3 in FCS2 does use all 8 cores very efficiently which is great. But I still can't see buying the 8 core MP until it has the SS motherboard so at least when Leopard finally ships in SIX (6) more months :eek: I'll know it can do the best possible job that 8 cores can do for a group of applications running together - not just one. At this point I think I may even decide to wait for Penryn on that SS motherboard. So I've got a lot of patience.
I know some here like to accuse me of always telling people to wait for the next big thing - not really true. I think if I was a big money making FCS shop that can use all the improvements in FCS2 right now, buying at least one 8 core MP makes sense. The issue of waiting or not for the Penryn SS 8 core is really individualistic according to your circumstances. One fellow here was in a situation where he was being offered one by his employer for all but free. I'm in a similar situation but I don't want to squander the opportunity without getting the model I've had in my mind's eye since almost a year ago or at least since November when I first read that SS article (http://techreport.com/etc/2006q4/clovertown/index.x?pg=1).
Normally I wouldn't wait for hardware, but I think the Leopard delay is a good excuse to wait on buying a new mac (unless you absolutely need one right now).
Hattig
Apr 16, 2007, 03:47 PM
What does AMD use instead of a FSB? I don't follow AMD that much.
AMD use Coherent Hypertransport (marketing name: Direct Connect) between the CPUs in their systems, and an on-CPU memory controller (dual-channel DDR2). Direct Connect is 8GBps per link currently, going up to 20GBps later this year. Registered DDR2 is available at 667MHz, maybe 800MHz now, so that's 10-12GB/s per CPU (theoretically).
That means in a 4 CPU (8 core) system you have 8 DDR2 memory controllers, which provides a boat-load of bandwidth. With Barcelona in a couple of months you'll have a 2 CPU (8 core) system with 4 DDR2 memory controllers, so not as good, but more compact (and better than accessing memory over the FSB).
Assuming the latter, and ignoring coherency traffic for an 8 core system:
AMD Barcelona: 21 - 25 GBps memory = 2.6 - 3.2 GBps per core.
Intel: 2x1333MHz FSB = 20 GBps memory = 2.6 GBps per core.
(note however that coherency traffic will be *very* significant on an 8-core system)
Intel make up for it with good prefetchers in their cores and a large amount of cache, but the prefetchers are less effective with more cores. Barcelona has better prefetchers, however, and a larger cache, so it's probably a wash in the end. So AMD's greater memory bandwidth per core is what seals the deal.
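The per-core figures above can be roughly reproduced with a few lines of arithmetic. This is only a sketch of peak theoretical numbers: it assumes 8 bytes per transfer on both a DDR2 channel and the FSB (an assumption, not something stated in the post), and it ignores coherency traffic just as the post does. The function and variable names are made up for illustration.

```python
# Peak theoretical bandwidth only; coherency traffic is ignored.
# Assumption: 64-bit (8 byte) wide DDR2 channels and front-side buses.
BYTES_PER_TRANSFER = 8
CORES = 8


def peak_gbps(mega_transfers_per_s, channels=1):
    """Peak bandwidth in GB/s for a bus running at the given MT/s."""
    return mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER * channels / 1e9


# 2-socket Barcelona: each socket has a dual-channel DDR2 controller.
amd_low = 2 * peak_gbps(667, channels=2)    # DDR2-667
amd_high = 2 * peak_gbps(800, channels=2)   # DDR2-800

# 2-socket Clovertown: one 1333 MT/s FSB per socket.
intel = 2 * peak_gbps(1333)

for name, total in [("AMD (DDR2-667)", amd_low),
                    ("AMD (DDR2-800)", amd_high),
                    ("Intel (2x FSB)", intel)]:
    print(f"{name}: {total:.1f} GB/s total, {total / CORES:.2f} GB/s per core")
```

Under these assumptions the totals come out at roughly 21 to 26 GB/s for AMD and about 21 GB/s for Intel, which is in line with the rounded numbers in the post.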
jhtrico1850
Apr 16, 2007, 03:48 PM
It makes sense to me that the gaming results are no different. Are there many games that even utilize multiple cores? :confused:
P-Worm
Check this link (http://enthusiast.hardocp.com/article.html?art=MTMwNiwsLGhlbnRodXNpYXN0) out.
Hattig
Apr 16, 2007, 03:48 PM
Two FSB at 1333 MHz, and two dual-channel 1333MHz memory busses.
That's for 8 cores, 2 CPUs with 4 cores each, each CPU having its own bus for its 4 cores, which are on two dies. My point is entirely accurate.
centauratlas
Apr 16, 2007, 04:25 PM
I'd like to see some MySQL benchmarks. In my experience MySQL uses threads pretty well, so I'm hoping it is a big improvement.
matticus008
Apr 16, 2007, 04:25 PM
It shouldn't, but Apple in its infinite wisdom doesn't really give you many choices.
There's always the choice NOT to buy the 8-core Mac Pro. If you want to game, you're better suited with any of the cheaper Mac Pros, which would offer greater relative performance per dollar. Of course, your game selection per dollar will also be far better with a Windows PC.
Bottom line is that you can't have it all, and Apple is just not that terribly interested in tackling the hardcore gamers or the corporate market. They've done quite a bit to help bring more and better titles to Macintosh gamers, but they've never tried to take on DirectX game developers.
AidenShaw
Apr 16, 2007, 04:40 PM
That's for 8 cores, 2 CPUs with 4 cores each, each CPU having its own bus for its 4 cores, which are on two dies. My point is entirely accurate.
You're right - I read your post too quickly - I deleted my response.
How do you quantify AMD's problems with a NUMA architecture, though?
Because the AMD memory is connected to each socket, CPUs in that socket access the local memory through the memory controller. If the memory happens to be connected to the other socket, the serial HT bus is used to access the remote memory controller - adding latency.
On a 4 socket system, the sockets are connected in a ring. If the memory happens to be on the "far" socket, the memory access has to go across two HT links in series, adding even more latency.
AMD's architecture has nice numbers when you add them all up, but when you actually benchmark the 4 socket machines you see a pretty significant variability in runtimes due to the extra latencies.
Amdahl
Apr 16, 2007, 04:45 PM
(note however that coherency traffic will be *very* significant on an 8-core system)
Intel make up for it with good prefetchers in their cores and a large amount of cache, but the prefetchers are less effective with more cores. Barcelona has better prefetchers, however, and a larger cache, so it's probably a wash in the end. So AMD's greater memory bandwidth per core is what seals the deal.
I think the coherency traffic is what is killing the 8-core. Since 10.4 doesn't prevent core swapping, you've got a tremendous waste of scarce memory bandwidth on that problem. My theory is that running only 7 threads will eliminate most of that core swapping, or raising priority on your work threads will also cut the number of core swaps.
There is also the possibility that many programmers tested their code on machines that don't have as big a penalty for False Sharing. (False Sharing causing repeated cache coherency traffic between two threads accessing different, but nearby, data.) Some of the slowdown on 8-core could be eliminated by optimizations in the programs.
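To make the False Sharing point concrete, here is a small layout sketch. It does not measure anything; it only works out which cache line each per-thread counter would land on. The 64-byte line size, the 8-byte counter size, and all names are assumptions for illustration.

```python
CACHE_LINE = 64  # bytes; assumed line size, check your own CPU


def cache_line_of(base, index, stride):
    """Cache line number holding element `index` of an array at `base`."""
    return (base + index * stride) // CACHE_LINE


def shares_a_line(lines):
    """True if at least two elements fall on the same cache line."""
    return len(set(lines)) < len(lines)


# Two 8-byte counters, one per thread, packed next to each other.
packed = [cache_line_of(0, i, 8) for i in range(2)]

# Same counters, each padded out to a full cache line.
padded = [cache_line_of(0, i, CACHE_LINE) for i in range(2)]

print("packed:", packed, "false sharing possible:", shares_a_line(packed))
print("padded:", padded, "false sharing possible:", shares_a_line(padded))
```

In the packed layout both counters sit on one line, so every write by one thread would invalidate the other core's copy. Padding each counter to its own line is the usual kind of optimization referred to above, at the cost of some wasted memory.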
SPUY767
Apr 16, 2007, 04:49 PM
your game selection per dollar will also be far better with a Windows PC.
I assume that you're just ignoring Boot Camp as an option. I run Boot Camp marvellously on a second hard drive, with a separate GPU. I could run it with the standard 7300, but I found it fairly easy to install an 8800 instead.
Amdahl
Apr 16, 2007, 04:51 PM
How do you quantify AMD's problems with a NUMA architecture, though?
It is an extension of affinity. Once the OS knows about affinity (including memory affinity), the problem is not so bad. The OS knows a thread is happiest on a certain core, and it knows memory allocations from that core should come out of the memory range directly connected to that core. Problem solved.
In the cases where the thread then runs on a different core... tough beans.
XP has this in a glitchy way, Vista has it fully implemented.
NukeLaLoosh
Apr 16, 2007, 05:16 PM
Perhaps. But one of the biggest missing elements for optimum 8 core performance among multi-threaded workflows is a 2007 Stoakley-Seaburg (SS) (http://techreport.com/etc/2006q4/clovertown/index.x?pg=1) motherboard.
Speaking of which, Intel is demoing SS motherboards at this week's IDF (http://www.theinquirer.net/default.aspx?article=38961) in China, with at least some of them running 3.2 GHz quad-core Penryns.
iMikeT
Apr 16, 2007, 05:16 PM
All this power in the 8-core Mac Pro and not that much software to take advantage of it...:rolleyes:
milo
Apr 16, 2007, 05:20 PM
All this power in the 8-core Mac Pro and not that much software to take advantage of it...:rolleyes:
It's always a chicken/egg problem. You can't really expect software companies to release versions optimized for eight cores before the hardware is released.
At least Apple is going to have some apps that take advantage when FCS2 ships next month. Hopefully Adobe will update soon (even if they don't have 8-core support in time for release; it's not final yet, is it?).
Hattig
Apr 16, 2007, 05:23 PM
You're right - I read your post too quickly - I deleted my response.
No problems, I do the same too often :)
How do you quantify AMD's problems with a NUMA architecture, though?
Because the AMD memory is connected to each socket, CPUs in that socket access the local memory through the memory controller. If the memory happens to be connected to the other socket, the serial HT bus is used to access the remote memory controller - adding latency.
On a 4 socket system, the sockets are connected in a ring. If the memory happens to be on the "far" socket, the memory access has to go across two HT links in series, adding even more latency.
AMD's architecture has nice numbers when you add them all up, but when you actually benchmark the 4 socket machines you see a pretty significant variability in runtimes due to the extra latencies.
This is very true, although the OS can help significantly when it knows it is running on a NUMA architecture machine. Even without that you have an average hop count of about 1, which is the same as the CPU <-> chipset hop on the Intel side of the equation.
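The "average hop count of about 1" can be checked with a quick calculation. This sketch assumes the 4 sockets form a simple ring (as described in the quoted post) and that memory accesses are spread evenly over all sockets, including the local one. Both are simplifying assumptions and the function names are made up.

```python
def ring_hops(src, dst, sockets):
    """HT links crossed going from src to dst around a ring of sockets."""
    d = abs(src - dst) % sockets
    return min(d, sockets - d)


def average_hops(sockets):
    """Mean hop count from socket 0, with memory spread evenly over all sockets."""
    return sum(ring_hops(0, dst, sockets) for dst in range(sockets)) / sockets


print("2 sockets:", average_hops(2))
print("4 sockets:", average_hops(4))
```

For 4 sockets the distances from any one socket are 0, 1, 1 and 2 hops, which averages to exactly 1. An OS that keeps a thread's memory local pushes the real average below that.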
jpsalvesen
Apr 16, 2007, 05:44 PM
The 8-core is not as fast as expected/hoped because of three things combined:
1. Tiger is suboptimal when it comes to parallel execution.
2. The memory bus is a slowing factor.
3. The software isn't parallelized enough to benefit from 8 cores.
Agreed?
Amdahl
Apr 16, 2007, 05:53 PM
The 8-core is not as fast as expected/hoped because of three things combined:
1. Tiger is suboptimal when it comes to parallel execution.
2. The memory bus is a slowing factor.
3. The software isn't parallelized enough to benefit from 8 cores.
Agreed?
#3 is only true in some cases, like games. Even when the software is parallelized for 8-core, it is still running slow because of #1 & #2. It depends on the data access & computation patterns.
matticus008
Apr 16, 2007, 05:56 PM
I assume that you're just ignoring Boot Camp as an option. I run Boot Camp marvellously on a second hard drive, with a separate GPU. I could run it with the standard 7300, but I found it fairly easy to install an 8800 instead.
By "Windows PC" I mean a Core 2 Duo/Core 2 Extreme desktop system--I should have specified. You don't see many gamers building Xeon rigs, for a very good reason. You can get superior gaming performance for a cheaper price by skipping the FB-DIMMs and the workstation motherboards.
Certainly, boot camp would be an option if you wanted to game with a Mac Pro, but you still wouldn't want to spend the premium for an 8-core version because you'd get almost no return on that extra investment (for gaming). The much cheaper quad core would offer almost identical performance, and leave room in the budget for a killer graphics card.
AidenShaw
Apr 16, 2007, 07:16 PM
http://pda.tweakers.net/?reviews/661
A wide range of benchmarks comparing Woodcrest and Clovertown - SPEC, MySQL, Postgres,...
The relatively poor showing of the octo on SPECfp_rate does point to memory saturation (SPECfp is as much a memory bandwidth test as a computation test).
SeaFox
Apr 16, 2007, 07:33 PM
Barefeats speculates that the 8-Core Mac Pro maybe bottlenecked by the memory bus and also considers the possibility that Mac OS X Tiger may not be well optimized for the 8-Core Mac Pros.
Get ready for the "I'm waiting for the REAL 8-core MacPro" posts...
pilotError
Apr 16, 2007, 07:39 PM
Has anyone tried the Leopard betas to see if there's been any improvement?
SeaFox
Apr 16, 2007, 07:46 PM
It's always a chicken/egg problem. You can't really expect software companies to release versions optimized for eight cores before the hardware is released.
Huh? There's nothing to say the hardware has to be in existence before the software is optimized for it. There's plenty of mathematics/genetics software made to scale up to 64 or 128 processors.
Also, remember that the original Clovertown upgrade was performed months ago. So it's not like it would have been impossible to build an 8-core Mac Pro before, so you'd have something to test on.
AidenShaw
Apr 16, 2007, 07:47 PM
Get ready for the "I'm waiting for the REAL 8-core MacPro" posts...
2007 Stoakley-Seaburg Motherboard With New Graphics Cards 8-Core Mac Pro (http://forums.macrumors.com/showthread.php?t=293910)
Already has 146 replies...
840quadra
Apr 16, 2007, 07:48 PM
Get ready for the "I'm waiting for the REAL 8-core MacPro" posts...
In a way, they have started.
1. Some people want a single chip with 8 cores (nobody has posted that here yet)
2. Some people want a system with a total of 8 cores (two chips with 4 cores) that doesn't have such bottlenecks.
This 8 core system reminds me of the early Macintosh G4 Duals. Yes, you had dual processors, but the system board wasn't suited (bandwidth of everything) to getting data in and out really fast. Hence those early G4s were nowhere near twice as fast as a single processor system at the same clock.
2007 Stoakley-Seaburg Motherboard With New Graphics Cards 8-Core Mac Pro (http://forums.macrumors.com/showthread.php?t=293910)
Already has 146 replies...
Reminds me of the pre-Merom days here in Macland. I wonder how many of the people who are complaining about this 8 core system and want something better are even able to buy / afford the product in the first place.
I kept a running tally of how many Merom "chronic" complainers there were, and how many actually got systems at or near launch time. Let's just say that the percentage of those who actually got one (or pretended to) was well below 60%!
commander.data
Apr 16, 2007, 08:13 PM
That's for 8 cores, 2 CPUs with 4 cores each, each CPU having its own bus for its 4 cores, which are on two dies. My point is entirely accurate.
I don't think there is actually a major problem with 1333MHz FSBs for quad cores. Besides, it's too simplistic to do some division and say there is only 333MHz of equivalent bandwidth per core. For one thing, the unified L2 cache means it's more like 667MHz per L2 cache. In terms of cache coherency, the snoop filter does help with that. And you can't just say that Netburst started with a 400MHz FSB. Merom is a fundamentally different architecture and seems much more FSB bandwidth agnostic.
You can check benchmarks between the 1.8GHz E4300 with an 800MHz FSB and the 1.86GHz E6300 with a 1067MHz FSB, and the results are nearly identical, with variations more likely due to the slight clock speed advantage of the E6300 than to the FSB difference. What's more, these parts only have 2MB of L2 cache, so larger caches would buffer the difference even more. A comparison between mobile Core 2 Duos and desktop Core 2 Duos actually shows them performing similarly at similar clock speeds and cache sizes despite Merom only having a 667MHz FSB and Conroe having a 1067MHz one.
Yes, in the most bandwidth intensive applications a 1333MHz FSB will likely not be enough, but the bottleneck in the Bensley platform is not the FSB but the memory controller. FB-DIMMs in general are very inefficient and the first-gen MCs are even worse. Tests have shown that a quad channel DDR2 533 FB-DIMM setup (4-4-4 timings) only has as much bandwidth as a dual channel DDR2 667 setup (5-5-5 timing) and has much higher latency. There is nothing wrong with the 1333MHz FSBs, because currently they just don't have enough memory bandwidth to fill them so increasing the FSB won't solve anything.
http://www.theinquirer.net/default.aspx?article=38961
Still, Intel is moving to dual 1600MHz FSBs for Xeon Penryn variants, but the major improvement in terms of memory bandwidth is not the faster FSBs, but the redesigned 2nd-gen MC in Stoakley. This should bring better bandwidth utilization and hopefully better scaling from dual to quad channels, and it looks like Intel is specifically focusing on reducing latency, which is a good thing. The Merom architecture seems to be better than Netburst in that it isn't as sensitive to raw bandwidth, but in exchange it is closer to K8 in that it is more sensitive to latency, so Stoakley should help in all regards. Stoakley also has an optimized snoop filter for quad cores, which should help reduce bandwidth requirements for cache coherency.
SMM
Apr 16, 2007, 08:23 PM
It seems like this is another rushed release from Apple. It had been a while since the Mac Pro saw an update, the 8 core was sort of out and they released it to be ahead of the competition.
Something tells me we'll see a lot of these types of releases by Apple in the near future. The main focus right now is on the electronic gadgets side of things. Unfortunately, this is where there's a lot of money to be made. Mass consumer electronics. Apple TV, iPhone. The delay for Leopard is one example. They need to create some sense of expectation. Keep the favorable rumors going.
We'll see product updates in minor ways and the excuse software wise will be the pending release of Leopard. The coming months will be critical for Apple and with the looks of things they are focusing where there's money to be made and that's not computers.
And where do you come by this profound wisdom, and ability to see the future? If Apple does not release it, people like you will be screaming rape for them not being responsive to the 'latest and greatest'. When they do release it, they get hammered because it does not live up to someone's unfounded fantasy of what it should be.
The Clovertowns were widely discussed on this site, long before they were ready for production. It was generally agreed the current architecture would not be sufficient to take advantage of the doubling of the processors. Or, more precisely, the advantages would be negligible. So, no one should be surprised when that is what happens.
As for where Apple is focusing their effort, unless you sit in their strategic planning sessions, your credibility is no better than your attitude.
ChrisA
Apr 16, 2007, 09:19 PM
As for where Apple is focusing their effort, unless you sit in their strategic planning sessions, your credibility is no better than your attitude.
We don't need to sit in their strategic planning sessions. We already know:
They removed "computers" from their corporate name
The keynote speech at Macworld did not even talk about Macs
They took key people off the Leopard project so they could work on a phone
Taken together we can see what's going on.
guzhogi
Apr 16, 2007, 09:31 PM
AMD use Coherent Hypertransport (marketing name: Direct Connect) between the CPUs in their systems, and an on-CPU memory controller (dual-channel DDR2). Direct Connect is 8GBps per link currently, going up to 20GBps later this year. Registered DDR2 is available at 667MHz, maybe 800MHz now, so that's 10-12GB/s per CPU (theoretically).
That means in a 4 CPU (8 core) system you have 8 DDR2 memory controllers, which provides a boat-load of bandwidth. With Barcelona in a couple of months you'll have a 2 CPU (8 core) system with 4 DDR2 memory controllers, so not as good, but more compact (and better than accessing memory over the FSB).
Assuming the latter, and ignoring coherency traffic for an 8 core system:
AMD Barcelona: 21 - 25 GBps memory = 2.6 - 3.2 GBps per core.
Intel: 2x1333MHz FSB = 20 GBps memory = 2.6 GBps per core.
(note however that coherency traffic will be *very* significant on an 8-core system)
Intel make up for it with good prefetchers in their cores and a large amount of cache, but the prefetchers are less effective with more cores. Barcelona has better prefetchers, however, and a larger cache, so it's probably a wash in the end. So AMD's greater memory bandwidth per core is what seals the deal.
Thanks for the info. With computer components changing so quickly, it's hard to keep up. Only question is since each CPU has its own memory controller, how do they keep from trying to access the same address in memory? Does each CPU get its own bank of RAM? If that's the case, then sharing info between the CPUs would be hard, but not impossible, I guess, if you had a good OS.
commander.data
Apr 16, 2007, 10:05 PM
Thanks for the info. With computer components changing so quickly, it's hard to keep up. Only question is since each CPU has its own memory controller, how do they keep from trying to access the same address in memory? Does each CPU get its own bank of RAM? If that's the case, then sharing info between the CPUs would be hard, but not impossible, I guess, if you had a good OS.
Yes, in AMD's server architecture each chip has its own IMC and its own bank of RAM. Cache coherency is done through dedicated HT links between the processors. With 3 HT links per chip it scales well to dual chips and pretty well to 4 chips. Barcelona will add a 4th HT link to allow "ideal" scaling to 4 chips, and there will also be a link-splitting ability so each chip can have 8 half links to allow for 8-chip setups. So far performance is great if each processor is working off its own RAM bank, but of course things get difficult when 1 chip needs to access data from another chip's banks, despite the HT links. I think XP doesn't have any NUMA support at all and Vista was supposed to add it, but so far the implementation seems poor since there isn't much performance increase.
AidenShaw
Apr 16, 2007, 10:15 PM
I think XP doesn't have any NUMA support at all and Vista was supposed to add it
XP x64, Server 2003 and Vista have NUMA support.
It's a hard problem to solve, however. Windows takes the tactic that when a thread allocates memory (at the system level, not at the "new" or "malloc" call in the application code), the memory will be allocated from the NUMA node where the thread is running if there is available memory on that node.
That's great if the thread continues to be scheduled on a CPU on that node, but if all the CPUs on that node are busy - what do you do? Typically, you'll decide that it's better to run on a "far" node with slower memory access than to not run at all.
Windows APIs have the functions to allow a program to control where memory is allocated, and to say that a thread would "prefer" to run where the memory is.
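The local-first policy described above can be modelled in a few lines. This is a toy model, not how Windows actually implements it: the fallback rule (pick the node with the most free pages) is an assumption, since the description only covers the case where the local node has memory available. All names are hypothetical.

```python
def pick_node(thread_node, free_pages):
    """Choose the NUMA node to allocate from for a thread on `thread_node`.

    `free_pages` maps node number -> free page count.
    """
    # Preferred case: memory is available on the node the thread runs on.
    if free_pages.get(thread_node, 0) > 0:
        return thread_node

    # Assumed fallback: any other node, here the one with the most free pages.
    candidates = [node for node, free in free_pages.items() if free > 0]
    if not candidates:
        raise MemoryError("no free pages on any node")
    return max(candidates, key=lambda node: free_pages[node])


print(pick_node(0, {0: 128, 1: 512}))  # local node has room
print(pick_node(0, {0: 0, 1: 512}))    # local node is full, falls back
```

The second call shows the awkward case: the thread still runs on node 0, but every access to that allocation now goes to "far" memory.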
but so far the implementation seems poor since there isn't much performance increase.
Do you have some links to support that conjecture?
AidenShaw
Apr 16, 2007, 10:24 PM
Only question is since each CPU has its own memory controller, how do they keep from trying to access the same address in memory? Does each CPU get its own bank of RAM?
Each NUMA memory node has a unique base physical address for the RAM attached to that node.
For example, if you had a two socket system with 2 GiB of RAM on each memory controller
o Socket 1: RAM addresses 0x00000000 to 0x7fffffff (0 to 2GiB-1)
o Socket 2: RAM addresses 0x80000000 to 0xffffffff (2GiB to 4GiB-1)
It's not difficult at all. In essence, it's no different from having two DIMMs in a system - they don't have the "same" address, the system assigns each DIMM a unique offset.
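As a small illustration of the two-socket example above, this sketch looks up which socket owns a given physical address. The address ranges are the ones from the example; the table and function names are made up, and a real system would get its ranges from firmware rather than a hard-coded list.

```python
# (first address, last address) for each socket, from the 2 x 2 GiB example.
NODE_RANGES = [
    (0x00000000, 0x7FFFFFFF),  # Socket 1: 0 to 2GiB-1
    (0x80000000, 0xFFFFFFFF),  # Socket 2: 2GiB to 4GiB-1
]


def node_for_address(addr):
    """Return the socket number (1-based) whose RAM contains `addr`."""
    for socket, (first, last) in enumerate(NODE_RANGES, start=1):
        if first <= addr <= last:
            return socket
    raise ValueError(f"address {addr:#x} is not backed by RAM")


for addr in (0x00001000, 0x7FFFFFFF, 0x80000000):
    print(f"{addr:#010x} -> socket {node_for_address(addr)}")
```

Every address maps to exactly one socket, so two memory controllers never serve the "same" address.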
SMM
Apr 16, 2007, 10:28 PM
We don't need to sit in their strategic planning sessions. We already know:
They removed "computers" from their corporate name
The keynote speech at Macworld did not even talk about Macs
They took key people off the Leopard project so they could work on a phone
Taken together we can see what's going on.
You are drawing conclusions from insufficient data. You are also weighting it questionably. You are also taking a small snapshot of time and attempting to use that as forecast point of reference. There is nothing even remotely conclusive in this.
Removing 'computer' from their name means nothing. It is marketing. Think about it. They now have iPhone, iPod and iTV. Those are not normally associated with computers. Just because they are diversifying their products, does not mean they are dropping other ones. Why go through the cost and headaches of moving to the Intel processor, if they were going to move out of computers?
I expect Apple will be doing a major overhaul of their Minis and iMacs soon. There is another thread going where Apple is getting hammered for just doing an incremental upgrade to the Mac Pros. If they were to do a minor upgrade to the other machines, they would get hammered for that. To do something exceptional, they have to wait for some advances from other vendors. The bottom line is, far too many people are acting like spoiled kids. No matter what they do, someone whines.
Now, there are professional whiners, of the MS disinformation/propaganda ilk. This board is just crawling with them. Those can be discounted. They have the collective IQ of an amoeba and no sense of honor. However, many people here set themselves up for disappointment. They create fantasies of what they want, then cry when the reality falls short.
commander.data
Apr 16, 2007, 11:27 PM
Do you have some links to support that conjecture?
http://www.techreport.com/reviews/2007q1/quad-core/index.x?pg=13
It's kind of an observation. When Quad FX was released it performed poorly against Kentsfield despite speculation that the dual die approach was starved for bandwidth and would be crippled by cache coherency. It was generally thought that Quad FX was hobbled by the NUMA implementation in XP and would gain on Kentsfield once Vista was released. Recent testing has shown that Vista doesn't seem to allow Quad FX to gain ground on Kentsfield so it's implied that either Quad FX is already going all out or Vista's NUMA implementation still doesn't expose all of Quad FX's potential.
Amdahl
Apr 16, 2007, 11:54 PM
Windows APIs have the functions to allow a program to control where memory is allocated, and to say that a thread would "prefer" to run where the memory is.
I think the trick is to set your thread/core affinity before you start allocating memory. The challenge is to set the affinity right when you don't know what other apps are doing. Or can you? It would be easier if the OS did it.
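Here is a rough sketch of the "set affinity first, then allocate" idea. It assumes Linux, where Python exposes os.sched_setaffinity; on other systems the sketch simply runs the work unpinned. Whether the memory then ends up on the local node depends on the kernel's NUMA policy, which this code does not control. The helper name run_pinned is made up.

```python
import os


def run_pinned(cpu, work):
    """Pin this process to one CPU, run work(), then restore the old mask."""
    if not hasattr(os, "sched_setaffinity"):
        return work()  # affinity calls unavailable here: run unpinned

    old_mask = os.sched_getaffinity(0)
    os.sched_setaffinity(0, {cpu})
    try:
        # Anything allocated inside work() is allocated after pinning.
        return work()
    finally:
        os.sched_setaffinity(0, old_mask)


def allocate_buffer():
    return bytearray(16 * 1024 * 1024)  # 16 MiB, touched on creation


if hasattr(os, "sched_getaffinity"):
    first_cpu = min(os.sched_getaffinity(0))  # a CPU we are allowed to use
else:
    first_cpu = 0
buf = run_pinned(first_cpu, allocate_buffer)
print(len(buf))
```

Note that this only covers your own process. It says nothing about what other apps have pinned, which is the harder part of the question.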
AidenShaw
Apr 17, 2007, 12:09 AM
http://www.techreport.com/reviews/2007q1/quad-core/index.x?pg=13
Unfortunately, this story criticizes XP/Vista, without offering an example of an operating system that *can* do well with NUMA.
NUMA is very hard - "Non Uniform Memory Architecture" - some RAM is fast, some RAM is much slower. "Fast" and "Slow" are relative to the CPU on which a thread is scheduled.
An optimum memory layout goes to hell if you're scheduled on a different processor.
One can write an application that hardwires itself to the NUMA topology - and get the best possible performance. This is great for a scientific cluster app where you can give one app complete control of the machine - but for a general purpose multiprocessing desktop it's not so simple. In fact, "cluster" is very appropriate - the program needs to treat the NUMA system like a collection of boxes, and control how the boxes are scheduled and allocated.
One can look at this data and come up with either the conclusion that "Windows sucks" or "NUMA sucks". Without giving an example of a system that performs much better than Windows, concluding that "Windows sucks" shows a bit of bias.
It's all academic to MacRumors, though, since Apple won't let OSX run on any AMD system.
AidenShaw
Apr 17, 2007, 12:21 AM
I think the trick is to set your thread/core affinity before you start allocating memory. The challenge is to set the affinity right when you don't know what other apps are doing. Or can you? It would be easier if the OS did it.
If *you* don't know what the other apps are doing, how will the OS determine it?
How will the OS determine that you'll be launching Photoshop 10 minutes after starting a video render job? Any attempt to set affinity is likely to hurt performance if the load changes.
If you want a shocking look at the complexity involved in this, take a peek at the VMware ESX documentation. VMware's NUMA support includes the automatic migration of memory to the NUMA node where the VM is running - it's better to copy memory between NUMA memory controllers than to access "far" memory!
(see http://www.vmware.com/pdf/esx2_NUMA.pdf)
commander.data
Apr 17, 2007, 01:32 AM
Unfortunately, this story criticizes XP/Vista, without offering an example of an operating system that *can* do well with NUMA.
I guess it would be hard to do a legitimate comparison of NUMA between OSs since there would be so many other differences. Anyway, just looking at SPECfp_rate2000 results, which are supposed to be bandwidth intensive I believe, systems with SuSE Linux tend to do better than those on Windows Server 2003 Enterprise even though they are identically configured.
http://www.spec.org/osg/cpu2000/results/res2006q4/cpu2000-20060918-07372.html
http://www.spec.org/osg/cpu2000/results/res2006q4/cpu2000-20060918-07373.html
And I believe Solaris achieves the highest scores for Opterons in SPECfp_rate2000. I guess the question is how applicable is SPECfp_rate2000 in testing NUMA across various OSs.
MrTed
Apr 17, 2007, 02:10 AM
Any Logic pro benchmarks anywhere?
Logic developers said there won't be any Logic 7 update using 8 cores.
I didn't see benchmarks but they also said the quad Mac Pro does a little better than the octo Mac Pro on Logic ...:(
rolandf
Apr 17, 2007, 11:22 AM
Admittedly, the new Apple laptops are convenient to work with, although most of the time I'm still using my 5y old PB G4 that still fits me for many things.
However, since the switch to Intel, design hasn't improved, and the ProBooks just got uglier; all in all Apple got content to adopt these PC'ish improvements that Intel delivers.
The times are over when a company like IBM just pulled out a real performance leap for Apple.
Look at the Playstation3. It took some time to get it running, but people are now gradually taking advantage of the multi-core architecture, and whoever saw it in action must admit this was a jump forward.
The adoption of the Cell would have forced Apple to build up software know-how in efficiently programming multi-core platforms, and also to put the appropriate hardware in place.
If you want to stay on top, you have to go with the best, with those who so far, year by year, are pushing forward the limits in many fields of Computer Science.
The Intel move is just convenient in the short run, which makes you lazy and content. That's it.
AidenShaw
Apr 17, 2007, 11:58 AM
I guess it would be hard to do a legitimate comparison of NUMA between OSs since there would be so many other differences. Anyway, just looking at SPECfp_rate2000 results, which are supposed to be bandwidth intensive I believe, systems with SuSE Linux tend to do better than those on Windows Server 2003 Enterprise even though they are identically configured.
http://www.spec.org/osg/cpu2000/results/res2006q4/cpu2000-20060918-07372.html
http://www.spec.org/osg/cpu2000/results/res2006q4/cpu2000-20060918-07373.html
And I believe Solaris achieves the highest scores for Opterons in SPECfp_rate2000. I guess the question is how applicable is SPECfp_rate2000 in testing NUMA across various OSs.
Not quite identical - it's 32-bit Windows 2003 and 64-bit Suse !! :eek: (But since the compiler uses 128-bit arithmetic most of the time, this is not as major as it would sound.)
Also note the following from the reports:
Win2k3:
Other Configuration Notes
The start /b /wait /affinity command is used to bind CPU(s) to processes.
Suse:
Other Configuration Notes
Taskset utility used to bind process to CPU(s)
So, Linux had the advantage of running in 64-bit mode, and both systems were running with hard affinity to eliminate HyperTransport memory references.
It should give AMD fans pause when they realize that the people running these benchmarks did everything possible to *prevent* any memory traffic from going over the HyperTransport links.... To get the best performance, they had to convert the NUMA SMP system into the equivalent of 8 single CPU systems, bypassing HT and the NUMA topology.
Note that SPECrate is not a multi-threaded application - multiple independent copies of the application are run. In this case, the 8 copies were locked to the individual CPUs of the system, not being scheduled by the OS.
Amdahl
Apr 17, 2007, 11:58 AM
If *you* don't know what the other apps are doing, how will the OS determine it?
I didn't mean it in that sense; I meant can you get other processes' affinity settings from the API? If so, you might steer clear of processors that have been adopted by another app. If you have the luxury.
The OS would do it by being given some relationship info about the threads by each app. The scheduler would then factor that data in with the realtime info that it has in deciding what threads to run and where.
AidenShaw
Apr 17, 2007, 12:22 PM
I didn't mean it in that sense; I meant can you get other processes' affinity settings from the API? If so, you might steer clear of processors that have been adopted by another app. If you have the luxury.
There's a second issue - what do you do when the memory that you need is on a CPU that's been "adopted" by another app?
And note that it's "thread affinity" that's important for multi-threaded applications - all the threads are within one process. (That's true for Windows and Linux 2.6, can anyone confirm for OSX?)
And thread affinity means a shared memory space, so if you schedule threads of a process you might find some threads are close to the memory, and some threads are "far".
NUMA is a very difficult issue to get right in the general case. It's easy if you can lock independent processes to cores, but very hard for a dynamic mix of multi-threaded applications.
milo
Apr 17, 2007, 12:24 PM
Huh? There's nothing to say the hardware has to be in existence before the software is optimized for it. There's plenty of mathematics/genetics software made to scale up to 64 or 128 processors.
Also, remember that the original Clovertown upgrade was performed months ago. So it's not like it would have been impossible to build an 8-core Mac Pro before, so you'd have something to test on.
True. But programmers need to be able to test their code, and it's unrealistic to expect them to hack their system to do so. Certainly a programmer could make an app that should be able to use a non-shipping future number of cores, but they may also be hesitant to ship code without being able to test and optimize it on the hardware in question.
Logic developers said there won't be any Logic 7 update using 8 cores.
I didn't see benchmarks, but they also said the quad Mac Pro does a little better than the octo Mac Pro on Logic ... :(
Where did they say that? Musikmesse? These days it's rare to hear anything from the Logic developers. Where did this info come from, or is it just a rumor?
james777
Apr 17, 2007, 01:14 PM
I find this conversation confusing and full of claims that are not substantiated.
An 8-core Mac is approximately 195% faster than a 4-core on some applications.
Luxology did a benchmark on their modo 3D program, which uses a true multithreaded, multicore rendering engine, and it showed a 195% speed increase over their 4-core.
Also, the reason you want multiple cores is to run multiple programs at decent speeds.
Games and other software may not see as much of a speed increase because they were designed to work on COMPUTERS WITH FEWER CPUs.
Memory will always be a bottleneck no matter how far into the future we go. You need 4 GB of memory to take advantage of so many CPUs, so if you use newer memory the price would be outrageous.
james777
Apr 17, 2007, 01:50 PM
Huh? There's nothing to say the hardware has to be in existence before the software is optimized for it. There's plenty of mathematics/genetics software made to scale up to 64 or 128 processors.
Also, remember that the original Clovertown upgrade was performed months ago. So it's not like it would be impossible to have built an 8-core Mac Pro before so you have something to test on.
No idea what goes into software development. Threads in C++ and Java have to be expressed explicitly. So if I set up a multithreaded application, I must specify exactly which portions of a class are allowed to use threads. If I write software with too many threads in an application that is going to run on one processor, I am bottlenecking my own program.
I would like to know which software uses 64 or 128 processors. Is it meant to run on a mainframe?
Most programs switch between single-threaded and multithreaded processing. A program that uses one thread would not benefit from multiple processors, although the extra processors do let you run multiple programs. Because most programs switch between one thread and multiple threads, you get only about 45% to 60% more speed as you add processors. When you write multiple threads you also have to keep track of multiple processes, which could introduce bugs.
BeOS actually used multiple threads in all of its program libraries, which made it quite a unique operating system.
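The point about programs alternating between single-threaded and multithreaded phases is what Amdahl's law describes. A rough sketch, assuming a hypothetical program whose runtime is 90% perfectly parallel (that figure is an illustration, not a measurement of any real application):

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Ideal speedup over one core when only part of the work runs in parallel."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cores)

# Hypothetical program: 90% of its single-core runtime parallelizes perfectly.
p = 0.90
quad = amdahl_speedup(p, 4)   # speedup on 4 cores vs. 1 core
octo = amdahl_speedup(p, 8)   # speedup on 8 cores vs. 1 core
gain = octo / quad            # how much faster 8 cores are than 4

print(f"4 cores: {quad:.2f}x, 8 cores: {octo:.2f}x, 8 vs 4: {gain:.2f}x")
```

With these assumed numbers, doubling the cores from 4 to 8 gives about 1.53x, so roughly 53% faster rather than 100%. A purely single-threaded program (parallel fraction 0) gets no speedup at all.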
Cult Follower
Apr 18, 2007, 07:40 PM
Man, I really want one of these buggers. But they are so expensive, maybe later.
meshuga
Apr 18, 2007, 08:12 PM
While the gaming benchmarks are interesting in themselves, I gotta wonder who would consider this as a serious part of the purchase decision.
Then again, if you ARE buying this for gaming, you'd be a girly-man for getting less than two 30" monitors.
Funny you write this... today I just put an order in for the 8-core MP and two 30" Apple displays... woohooo..... but for music and film production.... Are there games that utilize both screens? I may have to ditch my Xbox plans for that lol
LazySusan
Apr 19, 2007, 10:49 AM
Probably 'nothing' to do with the debates here, but I've just had my order for one of these beasties delayed slightly because I'm told they've changed/updated a part.
Redneck1089
Apr 19, 2007, 12:17 PM
Probably 'nothing' to do with the debates here, but I've just had my order for one of these beasties delayed slightly because I'm told they've changed/updated a part.
Did they happen to mention which part?
LazySusan
Apr 19, 2007, 03:09 PM
No, I didn't get chance to ask, but they did say I would be getting a better machine.
kromekat
Apr 19, 2007, 04:05 PM
That's interesting - I was wondering why mine had slipped back a few days - be nice to know what is being changed though!?
Adam
Multimedia
Apr 19, 2007, 04:58 PM
Probably 'nothing' to do with the debates here, but I've just had my order for one of these beasties delayed slightly because I'm told they've changed/updated a part.
Did they happen to mention which part?
No, I didn't get chance to ask, but they did say I would be getting a better machine.
That's interesting - I was wondering why mine had slipped back a few days - be nice to know what is being changed though!?
Adam
Prolly just the motherboard. ;) :eek: :confused:
Now you've unleashed a new mystery Susan & Adam. These kinds of unannounced updates are one of the reasons some of us are still waiting a little while longer. Anyway can you try and find out what's changing Susan? Inquiring minds need to know. :)
Redneck1089
Apr 19, 2007, 05:12 PM
Prolly just the motherboard. ;) :eek: :confused:
Now you've unleashed a new mystery Susan & Adam. These kinds of unannounced updates are one of the reasons some of us are still waiting a little while longer. Anyway you can try and find out what's changing Susan? Inquiring minds need to know. :)
Yes indeed. I just ordered myself a quad 3 GHz yesterday... I would like to know as soon as possible whether whatever is being changed will be implemented on the quads too, so I know whether or not to cancel my order.
jemdigital
Apr 19, 2007, 06:16 PM
Prolly just the motherboard. ;) :eek: :confused:
Now you've unleashed a new mystery Susan & Adam. These kinds of unannounced updates are one of the reasons some of us are still waiting a little while longer. Anyway can you try and find out what's changing Susan? Inquiring minds need to know. :)
Hello all... long time lurker... first time poster. I decided to upgrade my 1st Gen Dual 2 GHz G5 and placed my order on April 17th with Apple's business unit for the 8-core Mac Pro... after a hold being placed on it by my credit card company... for the unusual activity... lol... I got it straightened out on Wednesday April 18th.
As an FYI... don't know about delays in shipping... but my info says the following:
MAC PRO,CTO
Z0D8
Custom configuration
Ships by: Apr 20 - Apr 24
Delivers by: Apr 24 - Apr 30
Curious to know when Susan and Adam placed their orders and when they were notified by Apple of the delay.
kromekat
Apr 19, 2007, 08:49 PM
Curious to know when Susan and Adam placed their orders and when they were notified by Apple of the delay.
Well I actually placed my order on the 5th of April, but through toing and froing, and various delays over the business lease agreements, I finally received a go ahead a couple of days back. On that day (17th), my 'ships by' date was the 24th. The following day, I checked the order, and it had been cancelled, and then re-entered with a ship date of the 25th! :confused:
I also can't understand why it will take another 2 weeks to get to me after shipping! :]
Ships by: 25 Apr, 2007
Delivers by: 04 May, 2007 - 07 May, 2007
MAC PRO,CTO
Custom configured
Cancelled
Product Qty.
MAC PRO,CTO
Custom configured
Any improvements they fit in before despatch is ok by me! :)
Adam
jemdigital
Apr 19, 2007, 10:58 PM
Well I actually placed my order on the 5th of April, but through toing and froing, and various delays over the business lease agreements, I finally received a go ahead a couple of days back. On that day (17th), my 'ships by' date was the 24th. The following day, I checked the order, and it had been cancelled, and then re-entered with a ship date of the 25th! :confused:
I also can't understand why it will take another 2 weeks to get to me after shipping! :]
Ships by: 25 Apr, 2007
Delivers by: 04 May, 2007 - 07 May, 2007
MAC PRO,CTO
Custom configured
Cancelled
Product Qty.
MAC PRO,CTO
Custom configured
Any improvements they fit in before despatch is ok by me! :)
Adam
Hmm...interesting... Adam are you in the U.S.? or U.K.? Reason I'm asking is the way the dates are posted... just trying to do a little sleuthing here.
LazySusan
Apr 20, 2007, 10:37 AM
I've been talking to sales support again, different person this time. She hadn't heard of any changes but checked and called me back; she says there has been a change of manufacturing location, hence the delay.
Does that explain why the order for the machine had been cancelled (on the tracking page) and then reinstated?
So it looks like the original reason I was given was inaccurate; I suspect they make it up as they go along.
kromekat
Apr 20, 2007, 11:48 AM
Hmm...interesting... Adam are you in the U.S.? or U.K.? Reason I'm asking is the way the dates are posted... just trying to do a little sleuthing here.
UK
kromekat
Apr 20, 2007, 11:49 AM
I've been talking to sales support again, different person this time. She hadn't heard of any changes but checked and called me back; she says there has been a change of manufacturing location, hence the delay.
Does that explain why the order for the machine had been cancelled (on the tracking page) and then reinstated?
So it looks like the original reason I was given was inaccurate; I suspect they make it up as they go along.
Ah.. :]
Watermonkey
Apr 23, 2007, 04:37 PM
So where ARE these machines manufactured anyway?
jemdigital
Apr 23, 2007, 06:24 PM
So where ARE these machines manufactured anyway?
Looks like mine was manufactured in China, made a stop in Hong Kong... is presently in Anchorage, Alaska.. and is due to make it to NJ, USA on Wednesday April 25th.
Hooorahhhh!!!!
kromekat
Apr 23, 2007, 07:09 PM
The ones in the UK and Europe are built in Ireland.
Adam
clintob
Apr 23, 2007, 07:18 PM
Talk for yourself buddy and I mean only for yourself. Sorry, but what about rendering using Maya, Strata etc. and also using pro music software like Logic or Cubase? Some plugins (VST, AU, RTAS) use a lot of processing power.
So don't assume that 8 cores are not needed just because your apps will probably not use them. There are many apps that will take advantage of 8 cores.
I didn't assume anything... I said (rather clearly actually) that MOST users will never see any difference from 8 cores. And apparently you also overlooked the fact that I specifically listed high-end video work as the exception here. If you are using Maya et al you are (a) using high-end video products, and (b) are in the vast minority I mentioned.
Remember... it goes read, process, then respond. ;)
Multimedia
Apr 26, 2007, 07:09 AM
I don't think anyone posted the third Barefeats Benchmark report from April 19 (http://www.barefeats.com/octopro3.html). Looks like the multi-threaded workflow is paying off with an 8-Core model. :)
mwswami
Apr 26, 2007, 10:32 AM
I don't think anyone posted the third Barefeats Benchmark report from April 19 (http://www.barefeats.com/octopro3.html). Looks like the multi-threaded workflow is paying off with an 8-Core model. :)
Yes, looks nice! But I just ordered a 2.66 4-core 4GB 1900XT Mac Pro. And I will probably get a couple of Hitachi 7K1000 1TB drives when they are available. It should keep me happy till the end of 2008, when I plan to get a Nehalem-based system.
kromekat
May 2, 2007, 06:14 PM
Well, here are mine from Cinebench:
CINEBENCH 9.5
****************************************************
Processor : 8-Core Mac Pro
MHz : 3Ghz
Number of CPUs : 8
Operating System : 10.4.9
Graphics Card : 7300GT x 2
Resolution : 1900x1200
Color Depth : 32bit
****************************************************
Rendering (Single CPU): 487 CB-CPU
Rendering (Multiple CPU): 2304 CB-CPU
Multiprocessor Speedup: 4.73
Shading (CINEMA 4D) : 574 CB-GFX
Shading (OpenGL Software Lighting) : 2189 CB-GFX
Shading (OpenGL Hardware Lighting) : 4431 CB-GFX
OpenGL Speedup: 7.71
****************************************************
4 x as fast as my dual 2.5 G5! ;)
Adam
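As a quick check on the Cinebench figures above, the reported speedup can be turned into a scaling efficiency and, if you assume Amdahl's law is a fair model for this render (an assumption, not something Cinebench reports), an implied parallel fraction:

```python
# Scores copied from the Cinebench 9.5 output above.
single_cpu = 487      # Rendering (Single CPU), CB-CPU
multi_cpu = 2304      # Rendering (Multiple CPU), CB-CPU
n_cores = 8

speedup = multi_cpu / single_cpu      # agrees with the reported 4.73
efficiency = speedup / n_cores        # share of ideal 8x scaling achieved

# Amdahl's law, S = 1 / ((1 - p) + p / n), solved for the parallel fraction p.
parallel_fraction = (1 - 1 / speedup) / (1 - 1 / n_cores)

print(f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}, "
      f"implied parallel fraction {parallel_fraction:.0%}")
```

That works out to about 59% of ideal scaling on 8 cores, and an implied parallel fraction of roughly 90%. The model ignores memory bandwidth limits, so treat the 90% as a ballpark only.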
Evangelion
May 29, 2007, 03:55 AM
This is very true, although the OS can help significantly when it knows it is running on a NUMA architecture machine. Even without that you have an average hop number of around 1, which is the same as the CPU <-> chipset hop in the Intel equation.
I know I'm a bit late, but still :). In reality, the average hop number is lower than with Intel systems. With Intel, the RAM is connected to the Northbridge, and the Northbridge is connected to the CPU. So whenever the CPU needs to access the RAM, it goes through the Northbridge. With AMD, the RAM is directly connected to the CPU, giving AMD much lower memory-access latency. True, latency goes up if you access RAM that is connected to another CPU in the system. But at the same time, the bandwidth goes up. And is the latency still worse than it is with Intel? I would say that accessing RAM on another CPU is slightly slower than on Intel systems that access RAM through the Northbridge. But on the AMD system, the bandwidth is higher.
So that's how it is on AMD systems: as the number of processors increases, the bandwidth goes up, while latency creeps up. How is it on Intel systems? As the number of CPUs increases, the per-CPU bandwidth goes down, but the latency remains the same.
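To put that trade-off in numbers, here is a toy model. Everything in it is an assumption for illustration: the bandwidth and latency figures are arbitrary units and not measurements of any real Intel or AMD system, memory pages are assumed to be spread evenly over all nodes, every remote access is assumed to cost exactly one hop, and interconnect bandwidth limits are ignored.

```python
def shared_bus(n_cpus, bus_bandwidth, latency):
    """All CPUs reach RAM through one northbridge: fixed latency, shared bandwidth."""
    return {
        "total_bandwidth": bus_bandwidth,
        "per_cpu_bandwidth": bus_bandwidth / n_cpus,
        "avg_latency": latency,
    }

def numa(n_cpus, node_bandwidth, local_latency, hop_penalty):
    """Each CPU has its own RAM; accesses to another CPU's RAM pay one extra hop."""
    remote_fraction = (n_cpus - 1) / n_cpus   # share of accesses that leave the node
    return {
        "total_bandwidth": node_bandwidth * n_cpus,
        "per_cpu_bandwidth": node_bandwidth,
        "avg_latency": local_latency + remote_fraction * hop_penalty,
    }

# Hypothetical figures, arbitrary units.
for n in (1, 2, 4):
    print(n, shared_bus(n, 10.0, 100.0), numa(n, 6.0, 60.0, 50.0))
```

In this sketch the shared-bus latency stays flat while per-CPU bandwidth shrinks with each added CPU, and the NUMA total bandwidth grows while the average latency creeps up, which is the pattern described above.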