View Full Version : (octo-harpertown Vs quad-nehalem) + 10.6 = ???
Hello,
Strange and concise title, but it says it all. Right now, the quad 2.26 nehalem is slightly more powerful than the octo 2.8 harpertown (on most tests, based on what I've read).
But if 10.6 comes in and allows the machine to make much better use of the 8 cores, couldn't that balance shift?
I'm planning on buying a Mac Pro soon, and I've been a fan of refurbs for a long time now. There's a very nice octo-2.8 harpertown Mac Pro in there right now, cheaper than the quad-2.26 nehalem.
Can anyone help me figure out grand central? Is there even enough info on GC yet?
Thanks
Spanky Deluxe
May 9, 2009, 02:57 PM
10.6 won't magically make things run faster. Code has to be written from the ground up to use multiple cores. Snow Leopard comes with OpenCL built in, so writing parallel code should be easier, and in the years down the line programs should start using extra cores. This will take time though. People seem to think that Snow Leopard is a magic wand that will make single threaded code magically run faster by using multiple cores. If that were true then Apple would destroy the market, since it's basically the dream target of many programmers / compiler writers.
A lot of single threaded software simply can't be sped up with multithreading anyway. Generally speaking, stuff that currently doesn't use more than one core will never be able to successfully use more than one core. If you're using stuff that can already make noticeable use of two or more cores then that might get some speed boosts.
Look at what you're doing at the moment. What tools do you use, what software do you use? Go through your standard every day tasks and have the activity monitor open to see how much processor time you're using. For most users - even "Pro" users - even 4 cores is currently overkill. Now, if you do specific tasks that *can* use more cores then 8 cores is great - if you have to run parallelised N-body simulations then it's great. If you have to run multiple instances of single threaded tasks then it's great. If you're just doing stuff like using Photoshop filters and general web browsing etc then you're not going to use those 8 cores be it 10.5 or 10.6.
Honestly, you'll know if you'll be able to use 8 cores. If you think maybe you might then chances are, you don't and you'd be better off with fewer cores and faster speed.
netkas
May 9, 2009, 03:40 PM
Dunno what tests you read, maybe they depend heavily on memory bandwidth.
8 virtual cores can't beat 8 real cores in terms of CPU performance (for example, Skulltrail systems beat a single Nehalem in Cinebench).
2 virtual cores in 1 physical core share execution units, so if one virtual core is already using (for example) the FPU, the second will have to wait.
DR_K13
May 9, 2009, 03:41 PM
I might get the 8 core for running high end Aerodynamic Engineering software.
MadisonTate
May 10, 2009, 03:17 AM
.
300D
May 10, 2009, 03:42 AM
Hello,
Strange and concise title, but it says it all. Right now, the quad 2.26 nehalem is slightly more powerful than the octo 2.8 harpertown (on most tests, based on what I've read).
There is no quad 2.26 and no quad anything nehalem will outperform last year's octo-2.8.
Tesselator
May 10, 2009, 03:46 AM
Hello,
Strange and concise title, but it says it all. Right now, the quad 2.26 nehalem is slightly more powerful than the octo 2.8 harpertown (on most tests, based on what I've read).
Dunno what tests you read, maybe they depend heavily on memory bandwidth.
8 virtual cores can't beat 8 real cores in terms of CPU performance (for example, Skulltrail systems beat a single Nehalem in Cinebench).
2 virtual cores in 1 physical core share execution units, so if one virtual core is already using (for example) the FPU, the second will have to wait.
He's probably talking about this graph that I assembled:
http://forums.macrumors.com/showpost.php?p=7270035&postcount=179
And if so, he's talking about 8 physical cores vs. 16 virtual cores (or 8 physical). So it's 8 vs. 8.
But even with all that I don't believe for a second that the 2.26 octad is faster than the 2.8 octad. I think the one person who submitted the benchmark for the 2.8 probably got the lowest score possible in that test under his particular environment (maybe he had only 2GB RAM and BG tasks running, etc.) and the 2.26 user submission was the best possible making it appear as if the 2.26 beat the 2.8.
What needs to be considered here is that these benchmarks were submitted by users here with wildly different environments, not at all well suited for proper comparisons. The graph was meant to give a VERY general impression of how the various machines performed for the singular task of rendering in a 32-bit rendering engine. This is 32-bit math with a little memory copying flying as fast as the processor will allow.
What it really shows more than anything else is how the individual processors scale between single thread execution and multi-thread execution. Perhaps the most important mark here is the Multicore Speedup percentage.
Of course the 2.8 octad is going to feel faster and be faster at almost everything we do with our machines - over the 2.26 octad I mean. This is indicated somewhat by the green bar within that above linked graph, where the 2.8 is shown to be very much faster than the 2.26.
About Grand Central - it's proposed to speed up all or most aspects of OS X. It's unclear to me how much if any impact it will have on applications that weren't compiled to take advantage of these functions.
Concorde Rules
May 10, 2009, 05:17 AM
There is no quad 2.26 and no quad anything nehalem will outperform last year's octo-2.8.
The 2.66 quad beats the 2.8 octo in single thread apps and only just loses by about 10-20%, depending on what you're doing, in multi thread apps ;)
300D
May 10, 2009, 05:21 AM
Measuring single thread apps is pointless. It's a dying standard, much like supporting PowerPC code is now and 68K code was 10 years ago.
Gonk42
May 10, 2009, 08:56 AM
Measuring single thread apps is pointless. It's a dying standard, much like supporting PowerPC code is now and 68K code was 10 years ago.
I don't think that is a good analogy. Though there will be more multi-threaded applications, some processes are inherently serial in nature. As the saying goes, though one woman can have a baby in nine months, nine women can't produce a baby in one month. A good book to read is "The Mythical Man-Month" by Brooks. (Though this is about the human side of things rather than computer processors.)
Even given the advances predicted in Snow Leopard, it will be a major investment for say Adobe to rewrite Photoshop and they will only do so as and when it makes economic sense. I think single threaded application speed will be important for quite a few years yet.
It is a very different situation to the underlying code changing (as in PowerPC to Intel). Such changes have to happen (the developer has little choice) and are also relatively straightforward (as much of the code can simply be recompiled). Single threaded applications won't suddenly stop running on multi core architectures.
Loa
May 10, 2009, 09:42 AM
Hello,
Thx for all the replies... Sorry for mixing up the numbers for the quad 2.66 nehalem with the octo 2.26 nehalem...
I'm still surprised that nobody sees a 2.8 octo harpertown catching up with the 2.66 quad nehalem once 10.6 hits.
Maybe I'd been putting too much hope into grand central...
@madisontate: can you give me more info on how nehalem is crippled pre-10.6? And are the harpertowns so crippled as well?
Thanks
zmttoxics
May 10, 2009, 09:56 AM
Measuring single thread apps is pointless. It's a dying standard, much like supporting PowerPC code is now and 68K code was 10 years ago.
PPC is far from dead. OS X on PPC may be dead, but IBM is still making mainframe and workstation systems with PPC based chips (POWER5/POWER6 series cpus). Not to mention the Xbox 360, PS3, and the Wii are all running PPC variants (of course, also built by IBM). Terrasoft also produces the PowerStation which uses a 970FX based cpu (same cpu as the Apple G5s).
Also, people still use the Motorola 68000 arch in many different platforms such as phones / PDAs. Not 2 years ago I learned ASM on the 68000 in college.
Tesselator
May 10, 2009, 10:00 AM
Measuring single thread apps is pointless. It's a dying standard, much like supporting PowerPC code is now and 68K code was 10 years ago.
No. Most applications will never be multithreaded. Ever. They physically and logically cannot be with present processor architecture. Until processors radically change neither will this.
As just one very simple example of this consider how a computer calculates Pi. It calculates one value and uses that to calculate the next. This result is used to calculate the next and so on. In this very simple example one of the cores would have to time-travel in order for multi-threading to be possible. There are many such procedural algorithms in very many applications making it impossible for them to multi-thread on current processor architecture.
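A rough illustration of the distinction being argued here, as a minimal Python sketch. The function names and the two examples (a Newton iteration for sqrt(2), the Leibniz series for pi) are assumptions picked for illustration, not anything taken from the thread or from a real application. The first function has the "each result feeds the next" shape described above; the second shows that some formulations, such as a sum of independent terms, can be cut into chunks that do not need each other's results.

```python
import math

def newton_sqrt2(steps):
    """Loop-carried dependency: step n needs the result of step n-1."""
    x = 1.0
    for _ in range(steps):
        # This line cannot start until the previous value of x exists,
        # so extra cores have nothing to work on.
        x = 0.5 * (x + 2.0 / x)
    return x

def leibniz_chunk(start, stop):
    """Independent terms: a chunk of the series can be summed on its own."""
    return sum((-1.0) ** k / (2 * k + 1) for k in range(start, stop))

def leibniz_pi(terms, chunks):
    """Split the series into chunks and add the partial sums.

    The chunks are summed one after another here; since none of them
    depends on another, each could in principle go to a separate core.
    """
    size = terms // chunks
    bounds = [(i * size, terms if i == chunks - 1 else (i + 1) * size)
              for i in range(chunks)]
    return 4.0 * sum(leibniz_chunk(a, b) for a, b in bounds)

if __name__ == "__main__":
    print(newton_sqrt2(6), math.sqrt(2))
    print(leibniz_pi(100_000, 8), math.pi)
```

Whether a given piece of software looks like the first function or the second depends on the algorithm chosen, which is really what the disagreement in this thread is about.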
Pretty much what we have today speaking of multithread apps, is all we're gonna get. With very few exceptions what can be multi-threaded already has been.
We as Apple users are up for some big speed increases as more and more apps go 64-bit tho!
MadisonTate
May 10, 2009, 08:24 PM
.
MadisonTate
May 10, 2009, 08:42 PM
.
Tesselator
May 10, 2009, 08:50 PM
I'm afraid you're quite wrong. http://www.xtremesystems.org/forums/showthread.php?t=221773 . There are numerous multi-threading pi calculation programs ...
Du'oh! That was just an example. I'm not talking specifically about calculating pi - which should have been obvious.
300D
May 11, 2009, 12:57 AM
PPC is far from dead. OS X on PPC may be dead, but IBM is still making mainframe and workstation systems with PPC based chips (POWER5/POWER6 series cpus). Not to mention the Xbox 360, PS3, and the Wii are all running PPC variants (of course, also built by IBM). Terrasoft also produces the PowerStation which uses a 970FX based cpu (same cpu as the Apple G5s).
Also, people still use the Motorola 68000 arch in many different platforms such as phones / PDAs. Not 2 years ago I learned ASM on the 68000 in college.
None of them run OSX do they?
No. Most applications will never be multithreaded. Ever.
Clearly you have your mind set on this answer. You will be very easily proven wrong, as you already have.
Tesselator
May 11, 2009, 02:27 AM
Clearly you have your mind set on this answer. You will be very easily proven wrong, as you already have.
It sounds set because this has been stated 100s and 1000s of times by developers already, yet after 8 years there still seems to be this body of users who think that very very soon now all (or even most) of their apps are going to magically take full advantage of MT/HT.
Hehe.. it's been going to happen "real soon now" for the past 6 or 7 years. :p
And still the developers say to their alpha and beta teams: Nope, never gonna happen. So you can see it's not really a matter of being set in some opinion or caring about being right or wrong. I'm just sharing what I've learned.
I have a vested interest in being wrong actually. All my machines are multi-core / multi-processor. I want every developer to spend large portions of their budget catering to me by redesigning their application bases in order to squeeze that extra 10% of performance out of them, heck yeah!
300D
May 11, 2009, 02:38 AM
"real soon now" as in "already happened". Try finding programs today that don't take advantage of at least 2 processors.
Tesselator
May 11, 2009, 02:46 AM
"real soon now" as in "already happened". Try finding programs today that don't take advantage of at least 2 processors.
Thank you. That's exactly my point.
zmttoxics
May 11, 2009, 11:12 AM
None of them run OSX do they?
If you read my post I answered that ("OS X on PPC may be dead, ..."). The issue was, you claimed that coding for these platforms is dead, which is wrong - both platforms are alive and kicking, just not in the mac world so much.
-js-
May 11, 2009, 11:19 AM
Tesselator is right on here. For a number of years to come (certainly more than the practical working lifetime of either the 2.8 octad or new nehalem 2.26 octad) the great majority of applications will fail to take much, if any, advantage of multiple cores. A lot of people really do NOT understand what's involved to make that happen. It is far from trivial. It in no way compares to PPC vs. Intel.
Spanky Deluxe
May 11, 2009, 11:26 AM
Exactly. Tesselator knows what he's talking about. If anyone has any doubts about programming multithreaded applications, go ahead and give it a try. Sure, sometimes it can be as simple as throwing an OpenMP statement or two in but most of the time, it takes a serious amount of work which often doesn't even scale all too well with multiple processors.
What most people don't seem to realise is that multithreading and multiprocessors are only being used because the performance of single cores cannot keep increasing with Moore's Law anymore. Silicon manufacturers are rapidly approaching the limits of how small things can be made using the same techniques as for the last 20 odd years. Quantum interference will start to become a problem. The speed of individual processors is also becoming an issue - why do you think processors haven't really got much over 3GHz over the last decade? First they ramped the MHz up, then they started to hit a limit and so managed to ramp up the amount of work a processor could achieve per MHz, and now that they're approaching those limits, the only other option is to use more processors.
Abidubi
May 11, 2009, 11:38 AM
"real soon now" as in "already happened". Try finding programs today that don't take advantage of at least 2 processors.
Hahahaha. Ya. Were you born yesterday?
P.S. Even if every single program today did use at least 2 processors... that means it took 10 years. So by that logic in 10 more we'll see everything using 4 cores.
nanofrog
May 11, 2009, 12:12 PM
Even if every single program today did use at least 2 processors... that means it took 10 years. So by that logic in 10 more we'll see everything using 4 cores.
I doubt it would be linear, and would take even longer. :eek: :p
Tesselator
May 11, 2009, 01:14 PM
.
Last edited by MadisonTate : Yesterday at 03:22 PM. Reason: Offensive?
Don't worry about it bro! I didn't think it was offensive. It's all guud. Sometimes we discuss in declarative terms. That's just human and part of English - IMO anyway.
Anyway, don't fret. Things are good! Who would have thought just 5 or 6 years ago that we would be using machines with EIGHT FRIGGING CPUs in them and 32 GIGS of RAM! Yahoo!
SimD
May 17, 2009, 11:08 PM
So much "own" in this thread. Love it!
I actually am very curious as to HOW certain apps utilize multiple cores.
From what I understand, if we take Logic for example, the application assigns each track to its own core so that the program can process many tracks at the same time.
Am I way off?
t4cgirl
May 24, 2009, 04:02 AM
Just a reminder -- according to Apple's benchmarks, a 2.8 octo will indeed trounce a 2.23 quad Nehalem on the most common tasks such as: Photoshop, XCode, Final Cut Pro, Cinebench. (even the 2.93 octo Nehalem is only 20% faster.) An octo-nehalem -will- do better on synthetic numerical benchmarks, however -- so if you're looking for prime numbers or cosmological sims, it might be worth the dough.
poobah
May 24, 2009, 09:55 PM
Too much over thinking this.
Once 10.6 comes along and we get a smarter scheduler, more cores are going to rock, and 8 'real' cores should walk over 4 real + 4 virtual.
1st off, a typical Mac is running dozens of services; intelligently distributing these across the cores will speed up the machine.
2nd, multi-threaded apps are not some mystical voodoo. It's not hard to spawn off worker threads to do the main task's bidding, so long as those tasks can be relatively independent. Consider a typical game. 4 or more threads would not be an uncommon scenario (user input, AI, compositing/render, 3D audio). Not all apps are good candidates for multi threading, due to having a serial nature.
The issue *today* is that the thread scheduler is not smart about allocating threads amongst the available cores.
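For anyone wondering what "spawning off worker threads" looks like in practice, here is a minimal sketch using Python's standard concurrent.futures module. The render_tile function and the tile numbers are hypothetical stand-ins for any unit of work that needs no other unit's result. One caveat: in standard CPython, CPU-bound threads do not actually run in parallel, so this shows the structure of the technique rather than a speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def render_tile(tile_id):
    """Hypothetical independent unit of work: needs no other tile's result."""
    return tile_id, sum(i * i for i in range(tile_id * 1000))

def run_workers(tile_ids, workers=4):
    """Hand each tile to a pool of worker threads and collect the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, whichever worker finishes first.
        return dict(pool.map(render_tile, tile_ids))

if __name__ == "__main__":
    results = run_workers(range(8))
    print(sorted(results))
```

The easy part is exactly the case described above, where the tasks are independent. Once one tile needs another tile's output, the main task has to coordinate the workers, and that is where the real effort goes.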
Tesselator
May 24, 2009, 11:28 PM
I think almost every point you made is very incorrect. There remains a tiny TINY bit of truth to each though - maybe. :D
1) Scheduler technology is ancient. 20 years old or more. If Apple and the BSD coders have a scheduler that sucks as bad as you're saying then it's surprising that OS X even works at all.
Physical cores (PCs) are not better than Virtual cores (VCs). 16 VCs will always be faster or the same as 8 PCs. Always! The only downside of VCs is code compatibility. And this is how it should be considered: 8 vs. 16. Comparing 8 VCs with 8 PCs is extremely silly. HT is an additional feature and not a replacement technology for real cores.
2) the services and BG tasks that run on your 8 core MP are already "intelligently distributed" and all of them together utilize less than 5% or 10% from the total 800%. The 2 to 5 percent Apple might be able to improve scheduling in 10.6 will probably not be noticeable and certainly not for applications not specifically written to take advantage of the new facilities.
3) the apps that CAN BE multi-threaded already are multi-threaded. Most applications simply CAN NOT BE multi-threaded. Some code CAN BE multi-threaded but the benefits are not worth the time needed to recode the applications. This is part of the reason that some code flakes out and/or crashes on HT enabled processors too BTW. Until Intel comes up with a scheme where multiple cores share common L1 and L2 caches this will not change. Think of a simple analogue algorithm like locomotion. Your single brain needs to place one foot in front of the other. Having multiple brains working on the task of walking will not speed up walking as the foot behind is time dependent on the foot in front. One event needs to happen and the result examined before the next event can begin. So it is with the majority of applications we use today.
4) The issues today are still the same issues we've always had. Clock Speed is the answer to one question, Multiple Cores is the answer to an entirely different question. Until the architecture of a CPU is fundamentally changed this will remain the same.
Spanky Deluxe
May 25, 2009, 09:16 AM
Exactly. If chip manufacturers weren't finding it harder and harder to create chips with higher clock speeds then all of this multi-core stuff would never have been so "important to the consumer". They couldn't make chips run much faster so they're using multiple chips to make things faster. Like Tesselator says though, a lot of stuff simply can't be made much faster. Macs in particular have been dual processor for many many years yet the number of OS X programs that can actually fully utilize just two cores is amazingly small.
All the stuff about multicores is mainly due to advertising. If they could have manufactured a 10GHz Core Solo instead of a 2.5GHz Core 2 Quad and had it air cooled then they would have done in a heartbeat.
Just look at this graph from Intel: http://software.intel.com/file/2328
If just 50% of a program's code cannot be parallelised then it cannot run much faster on 8 processors than 2 or 4. Even in highly efficient code where only 20% cannot be parallelised, it still doesn't even run that much faster. This is why people often see reasonably impressive jumps when going from one core to two but not much from two to four. You get most of the benefits from multiprocessing (i.e. being able to do more things at once smoothly) with just two cores.
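The arithmetic behind this is usually called Amdahl's law: with a serial fraction s and n cores, the best possible speedup is 1 / (s + (1 - s) / n). Assuming that is what the linked Intel graph plots (the image itself is not reproduced here), a small Python sketch gives the numbers. The function name is made up for illustration.

```python
def amdahl_speedup(serial_fraction, cores):
    """Ideal speedup when `serial_fraction` of the run time cannot be parallelised."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

if __name__ == "__main__":
    for serial in (0.5, 0.2):
        row = ", ".join(f"{n} cores: {amdahl_speedup(serial, n):.2f}x"
                        for n in (1, 2, 4, 8))
        print(f"{serial:.0%} serial -> {row}")
```

With 50% serial this comes out at roughly 1.33x, 1.6x and 1.78x on 2, 4 and 8 cores; with 20% serial, roughly 1.67x, 2.5x and 3.33x. The ceiling is 1 / s however many cores are added, so 2x and 5x respectively.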
Hopefully some people will start to realise that once 10.6 comes out, nothing will change apart from the fact that Finder will be multithreaded and your average programmer might be able to multithread their code easier if they learn OpenCL. Honestly, that's about it.
300D
May 25, 2009, 09:22 AM
yet the number of OS X programs that can actually fully utilize just two cores is amazingly small.
"Small" meaning "almost everything"?
t0mat0
May 25, 2009, 09:35 AM
Isn't that graph a bit off though? It seems you could think that all code is equal in terms of how much it will be used when a user is using the program?
Ok, so half a program's code can't be parallelised. Great. But what if the 50% that can is in areas that the user uses the most? If the code that can be parallelised is in sections that the user will use a lot through using the program, then over an hour's usage of the program, the speed up will be larger than assumed from taking the chart at face value.
Wouldn't the user be seeing pockets, or specific areas where there was a large improvement, rather than this being spread out thinly within the program?
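One way to put numbers on this question is to weight each part of the program by the share of run time spent in it rather than by how much of the code it makes up. The Python sketch below does that. The function name and the 10% / 90% split are hypothetical, chosen only to show the effect, and perfect scaling on 8 cores is assumed for the parallel part.

```python
def overall_speedup(regions):
    """Overall speedup from (share_of_run_time, speedup) pairs.

    Shares are fractions of the original run time and must sum to 1.
    """
    total_share = sum(share for share, _ in regions)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("run-time shares must sum to 1")
    return 1.0 / sum(share / speedup for share, speedup in regions)

if __name__ == "__main__":
    # Hypothetical: 90% of the time is spent in code that scales across 8 cores.
    print(overall_speedup([(0.1, 1.0), (0.9, 8.0)]))
    # Same program, but only half the time is spent in the parallel part.
    print(overall_speedup([(0.5, 1.0), (0.5, 8.0)]))
```

Under those assumptions the first case works out to about 4.7x and the second to about 1.8x, so the fraction that matters is the fraction of time, not the fraction of code.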
Spanky Deluxe
May 25, 2009, 09:44 AM
The graph is for parts of code rather than a full application. Obviously most applications spend most of their time in idle mode. Take a photoshop effect though. If that has parts that can't be parallelised then the speedup would diminish. I'm not saying that multiprocessors aren't good, they're great. Especially if you do very parallel stuff (i.e. if you can't make one dvd encode run any faster on multiple processors than one then you can at least run multiple dvd encodes of different things). So many people seem to think that Snow Leopard = Multiprocessors will be much much much faster at everything under the Sun when, in reality, very little's changed.
nanofrog
May 25, 2009, 02:00 PM
Nice summation. :)
OSXconvert
May 27, 2009, 12:02 AM
This is a great discussion about a reality check of what to expect with multithreading in 10.6. The 2.8 octo may beat most of the new nehalems under Leopard now, but how do we know that Apple will not cripple the SL code for the older chips? Apple is a business after all and they're trying to gain market share from Windows computers running the latest Intel chips. Their biggest incentive is to code for the current processors and the next generation ones so that people will buy new computers. Sure, an install of SL will probably speed up an old 2.8 octo, but my bet is that all the optimizations will be made to take advantage of the latest chips.
Spanky Deluxe
May 27, 2009, 07:31 AM
Nah don't worry about that. Pretty much any optimisations that they can do will be backwards compatible with the previous chips anyway - i.e. better use of SSE(insertnumberhere) instructions. Besides which, if they did purposefully kill the code for pre-core environments then it would kill performance on their entire current line-up besides the Mac Pro. Also, it would be blazingly obvious that they did sabotage the code because suddenly SL machines would be running slower than not only Leopard but also Windows 7 - this would be a terrible thing for marketing.
The 2.8GHz previous model is definitely the best value for money right now. The only thing the 2.26GHz Nehalem has that the 2.8GHz doesn't have is Hyper-Threading, but Hyper-Threading's been the biggest pile of doggydoodoos since it was first shoved into Pentium 4s in 2002.
Do you know, a little known fact, Hyper Threading support has actually been in OS X since before official Intel machines were released? The Intel Developer Transition Kits, with the first Intel versions of Tiger on, were built upon P4 CPUs and they had Hyper Threading. Tiger reported four CPUs back then too.
nanofrog
May 27, 2009, 12:59 PM
But what's been changed code wise since the crud known as the P4 was released?
It's been awhile, and they may have actually gotten it right this time around. ;)
t0mat0
May 27, 2009, 06:23 PM
How about some "put your money where your mouth is" style speculation?
What apps would people want benchmarking to compare the 2 rigs (Octocore Harpertown standard configuration and Quadcore Nehalem) on 10.5.7 vs 10.6 ?
With some suggestions agreed on, it would be interesting to see those saying there won't be any changes/minimal and also those saying big improvements coming, and also a rough performance change % 10.5 vs 10.6 for both machines.
Spanky Deluxe
May 27, 2009, 06:55 PM
Things like CS4 benchmarks, Pro Tools stuff, MP3/MP4 encoding, Cinebench and Geekbench for the hell of it, anything else anyone can think of that reflects the kind of usage Mac Pro users do.
I'm sure this kind of stuff will become clear when Snow Leopard comes out. We may well know the answers to some of this stuff if Snow Leopard goes gold master for WWDC and developers get their final copies - of course that's only if Snow Leopard is already done.
My predictions are that Snow Leopard will perform within 5% of Leopard in initial benchmarks. That's for both Nehalem and Core 2 Duo processors.
akdj
May 27, 2009, 06:57 PM
Tesselator,
Very interesting reading your information on MT/HT technology. I do remember almost 7 years ago, getting my Pentium PC with HT...pretty proud they were, I think I paid a 10-15% premium at the time.
I am curious though, with Snow Leopard and Windows 7 on the way, will software developers be able to code their games/applications to take advantage of BOTH the main CPU's and the GPU's? I have read some on this recently and it seems like a great idea, especially when the graphics system isn't being taxed and the CPUs are. Just curious, using your analogy as far as locomotion, it seems irrelevant where the extra cores or processors are, the architecture needs to change to take advantage.
I must admit, I was (I guess I still am, with my experience with HT 7 years ago) optimistic as well with Snow Leopard on the horizon, to take advantage of these multi core machines. Makes me less concerned about missing anything with the last generation 3.0gx8. All I do is video/audio/photo editing and graphic layout with our Mac (business). It is always a write off, but sense has to be made to spend more money. Faster rendering, less waiting, quicker and more efficient software always means more time to make more money:)
Is there a way to take advantage of these technologies or is it all HYPE? I know the integration of the caches to the CPU's and better/faster and more efficient memory (RAM and quicker seek times on the HD's) will improve as time moves forward, but are we capped with CPU (horse) power for the time being? Are these speed improvements only attributable to the memory systems, mother boards, faster hard drives, etc?
It does seem like a snake oil pitch, the multi core, Hyper threading, virtual core sch-peel from the chip manufacturers, if what you say is true (and I have no reason to doubt you).
J
poobah
May 28, 2009, 12:36 AM
I think almost every point you made is very incorrect. There remains a tiny TINY bit of truth to each though - maybe. :D
You didn't really address my points (which I maintain are accurate), but I'll take a stab at the strawmen you replaced them with... :D
1) Scheduler technology is ancient. 20 years old or more. If Apple and the BSD coders have a scheduler that sucks as bad as you're saying then it's surprising that OS X even works at all.
Actually, it's older than that. Age isn't the issue. Hardware has changed drastically over the years, and the scheduler algorithms have to catch up. Ideally, you'd schedule Nehalem differently than a P4. Multiple cores/CPUs on a desktop machine are still a relatively new concept, and the software has to catch up; it's just life.
Physical cores (PCs) are not better than Virtual cores (VCs). 16 VCs will always be faster or the same as 8 PCs. Always! The only downside of VCs is code compatibility. And this is how it should be considered: 8 vs. 16. Comparing 8 VCs with 8 PCs is extremely silly. HT is an additional feature and not a replacement technology for real cores.
I said that 8 real cores would beat 4 real + 4 virtual. That will always be true all other things being equal. The HT cores share pieces with the "real" cores, there will *always* be contention that doesn't happen with distinct cores. Otherwise, I agree.
2) the services and BG tasks that run on your 8 core MP are already "intelligently distributed" and all of them together utilize less than 5% or 10% from the total 800%. The 2 to 5 percent Apple might be able to improve scheduling in 10.6 will probably not be noticeable and certainly not for applications not specifically written to take advantage of the new facilities.
The kernel didn't get thread load balancing until Leopard, it's still a 'new thing' for our Macs. It's not really the couple percent they use idling, it's the context swaps that happen when your app has to share with 400ish system threads. Carving them up in a more thought out manner will help tremendously. Building your system from the start with an eye toward multi-core operation is going to be more efficient than bolting it on later.
3) the apps that CAN BE multi-threaded already are multi-threaded. Most applications simply CAN NOT BE multi-threaded. Some code CAN BE multi-threaded but the benefits are not worth the time needed to recode the applications. This is part of the reason that some code flakes out and/or crashes on HT enabled processors too BTW. Until Intel comes up with a scheme where multiple cores share common L1 and L2 caches this will not change. Think of a simple analogue algorithm like locomotion. Your single brain needs to place one foot in front of the other. Having multiple brains working on the task of walking will not speed up walking as the foot behind is time dependent on the foot in front. One event needs to happen and the result examined before the next event can begin. So it is with the majority of applications we use today.
Some applications work well multi-threaded, others do not, no argument there. The point was that for appropriate apps, writing multi-threaded code is not orders of magnitude more difficult. There are plenty of single threaded apps still out there that could benefit greatly from a re-write with a multi-threaded approach. (and yes, some others never will)
4) The issues today are still the same issues we've always had. Clock Speed is the answer to one question, Multiple Cores is the answer to an entirely different question. Until the architecture of a CPU is fundamentally changed this will remain the same.
Different approaches, certainly, but there is quite a bit of overlap. You just have to change the way you attack the problem, and choose which ones will benefit most. True, some things can only go faster if the CPU is faster, but many things can go faster in parallel.
And yes, I'd much rather have a 20 GHz CPU than 8 x 2.5, but that's not likely to happen anytime soon, so I'll be happy with what I have :D
t0mat0
May 28, 2009, 03:28 AM
9to5Mac's reporting (http://www.9to5mac.com/FCP3-real-time-hd-editing) that Final Cut Pro Studio 7 may allow realtime editing of 1080P H.264 video.
I'd call that >5% performance increase. Time to start factoring in the possible effect of GPU when comparing the Octo and the Quad when using 10.6 vs 10.5? Isn't a system wide change, but we might soon see some rough levels for specific speed enhancements where SL can shine.
Tesselator
May 28, 2009, 03:34 AM
"Small" meaning "almost everything"?
No, he said "fully utilize". If that's even 75% then the percentage of apps is indeed very small. 3% ~ 5% if we radically exaggerate.
Tesselator
May 28, 2009, 05:07 AM
Tesselator,
Very interesting reading your information on MT/HT technology. I do remember almost 7 years ago, getting my Pentium PC with HT...pretty proud they were, I think I paid a 10-15% premium at the time.
I am curious though, with Snow Leopard and Windows 7 on the way, will software developers be able to code their games/applications to take advantage of BOTH the main CPU's and the GPU's?
Beats me. It looks like some will. I assume most won't. A lot depends on the framework and tools available I guess.
I have read some on this recently and it seems like a great idea, especially when the graphics system isn't being taxed and the CPUs are. Just curious, using your analogy as far as locomotion, it seems irrelevant where the extra cores or processors are, the architecture needs to change to take advantage.
Yeah, I was speaking in general truths. Another thing to consider is that as NEW apps are created their internal architecture can MUCH more easily be fitted to the new concepts - multiple computing resources. For example if PS was to be created today would the developers treat the API in the same way or would they choose a different internal structure where the internal processing pipeline was created with multi-resource computing in mind? Currently it's not and if they change it what are the pros and cons? How many 3rd party developers will have to rewrite their products from the ground up? How much increased speed will actually be achieved? Etc. As it is right this minute the answers to those questions do not warrant a rewriting of PS. The same is probably true for many applications. With OpenCL and "better schedulers" (whatever that might mean) this may change but I remain a skeptic for the most part.
I must admit, I was (I guess I still am, with my experience with HT 7 years ago) optimistic as well with Snow Leopard on the horizon, to take advantage of these multi core machines. Makes me less concerned about missing anything with the last generation 3.0gx8. All I do is video/audio/photo editing and graphic layout with our Mac (business). It is always a write off, but sense has to be made to spend more money. Faster rendering, less waiting, quicker and more efficient software always means more time to make more money:)
Is there a way to take advantage of these technologies or is it all HYPE?
Wait and see is the only reasonable answer to that. It's certainly not all hype. But usually the hype is much greater than the reality. This is the computing industry. They need to hype to generate expectations and excitement in order to sell their goods.
I know the integration of the caches to the CPU's and better/faster and more efficient memory (RAM and quicker seek times on the HD's) will improve as time moves forward, but are we capped with CPU (horse) power for the time being? Are these speed improvements only attributable to the memory systems, mother boards, faster hard drives, etc?
No I don't think we're "capped". And when one technology reaches its limits another will be introduced. There are already several ready and waiting. But they will wait a bit longer. ;) Remember these are companies we're talking about and their goals are maximum yield from the least amount of effort - as it is with any publicly held company or corporation. Also to consider is that there are powers in the status quo that do NOT want equal playing fields. For example we (the US and UK) have long limited what technologies we will allow to be exported. The same powers do indeed limit what we're "allowed" to have and have access to. Some of the quote-unquote nutty speculations you hear about what the government has in secret are true. I saw the work that was done on the "StarWars" project in the early 60's and late 50's. This wasn't introduced for public awareness until the mid-80's, some 25 years later. And top NASA and University scientists across the western world all said there was no such thing and we wouldn't be able to achieve it for many years to come. LOL They were saying these things all the while it already existed - I know 1st hand. But I do digress. :)
It does seem like a snake oil pitch, the multi core, Hyper threading, virtual core sch-peel from the chip manufacturers, if what you say is true (and I have no reason to doubt you).
I dunno. Is it snake-oil if it's 5% true? About 10% or 20%? There's some truth in it for sure. How much is the question. Since this is exactly the same technology already introduced some (as you say) 7 years ago then we should already know pretty much what to expect. I don't believe it will be wildly different. Right? First we had single processors, then we had multiple processors, then we had HT both in single and multi-processor systems, then multi-core processors, and now multi-core with HT. At each point along the way the OS's and apps had to be retuned for the new architecture. We're on the edge of seeing what this round of retuning will be like. I say it won't be much different than the retuning we saw for the first round of HT - which was "better" but nothing astounding or ground-breaking.
You didn't really address my points (which I maintain are accurate), but I'll take a stab at the strawmen you replaced them with... :D
Huh? Your whole "point" was based on pure fiction unless you're beta-testing 10.6 and know something we don't - and even then it would still be partial fiction as the products and developments you're making preemptive claims about don't exist or haven't been released yet. So you're coming from pure speculation in the first place. That's cool - I like to speculate - but we can't claim our speculations are absolutely accurate. Call a spade a spade. :)
Actually, it's older than that. Age isn't the issue. Hardware has changed drastically over the years, and the scheduler algorithms have to catch up. Ideally, you'd schedule Nehalem differently than a P4. Multiple cores/CPUs on a desktop machine are still a relatively new concept, and the software has to catch up; it's just life.
Age is the issue in that maturity comes with age. Code and architecture maturity. Multi-processors have been around and very common for 25 years that I know of. Windows NT 3 had affinity settings which would allow up to 16 processors. I got my first dual in the EARLY 80's, multi-resource computing is MUCH older than that. It's not relatively new at all unless you wish to compare mechanical computers from the 16th and 17th centuries. Multi-processor desktops have existed pretty much since there were desk-tops and their popularity increased right along side main-stream electronic computing.
I said that 8 real cores would beat 4 real + 4 virtual. That will always be true all other things being equal. The HT cores share pieces with the "real" cores, there will *always* be contention that doesn't happen with distinct cores. Otherwise, I agree.
Yeah, I know what you said. But comparing the same number of VCs with that number of PCs isn't much fun and really not fair. It's a feature of a processor and not a replacement for other processors. Sometimes it's an advantage, sometimes not, and sometimes it's a disadvantage - when we consider stability issues.
The kernel didn't get thread load balancing until Leopard, it's still a 'new thing' for our Macs. It's not really the couple percent they use idling, it's the context swaps that happen when your app has to share with 400ish system threads. Carving them up in a more thought out manner will help tremendously. Building your system from the start with an eye toward multi-core operation is going to be more efficient than bolting it on later.
I'm not sure what you're talking about but Mac OS has had scheduling and migration with dynamic task assignment since Mac OS had Multi-tasking. You really can't have one without the other. And those are "thread load balancing" so you've lost me.
Some applications work well multi-threaded, others do not, no argument there. The point was that for appropriate apps, writing multi-threaded code is not orders of magnitude more difficult. There are plenty of single threaded apps still out there that could benefit greatly from a re-write with a multi-threaded approach. (and yes, some others never will)
No one said it was always too difficult or even always very difficult. I brought up the fact that it's often not worth it as the code will either suffer poorer execution speeds or not enough speed increase will be realized to justify the effort. It's a simple matter of company cost to profit analysis. And of course the fact remains that most of the applications we use today simply cannot be multi-threaded in any significant way. They need the result from operation one in order to calculate operation two. Simple as that.
I'd much rather have a 20 GHz CPU than 8 x 2.5, but that's not likely to happen anytime soon, so I'll be happy with what I have :D
Yes, until we can do for ourselves we have to be satisfied with (or at least accept) what we're handed - or do without. :D
Isn't that graph a bit off though?
It can be correctly assumed to be "off" in either direction depending on the application base you're testing. For example if you're only testing rendering engines then it scales almost linearly. Eight processors or physical cores will be 750% ~ 800% faster than a single core/processor. In the opposite direction if you test apps that cannot be or still are not multi-threaded at all then the graph will look almost completely flat where 8 cores are relatively the same speed as a single core. How they came by their numbers I dunno but I can see some truth in it.
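For anyone who wants to play with the two extremes described above, here is a minimal sketch using Amdahl's law, which is my choice of model and not something the graph's authors are known to have used. The function name and the parallel fractions are hypothetical, picked only to show the shape of the curve: a fully serial app stays flat, while a renderer-like workload gets close to linear.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup over one core when only part of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    # The serial part takes the same time no matter how many cores exist.
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical parallel fractions (assumptions, not measurements).
    workloads = [("not threaded", 0.0), ("half parallel", 0.5), ("renderer-like", 0.99)]
    for label, p in workloads:
        row = ", ".join(f"{n} cores: {amdahl_speedup(p, n):.2f}x" for n in (1, 2, 4, 8))
        print(f"{label:>14} (p={p:.2f}) -> {row}")
```

Under this model an app that is 99% parallel tops out at roughly 7.5x on 8 cores, and an app that is only half parallel cannot reach 2x however many cores it is given. Real machines add memory and cache effects that the formula ignores.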
poobah
May 29, 2009, 12:44 AM
Huh? Your whole "point" was based on pure fiction unless you're beta-testing 10.6 and know something we don't - and even then it would still be partial fiction as the products and developments you're making preemptive claims about don't exist or haven't been released yet. So you're coming from pure speculation in the first place. That's cool - I like to speculate - but we can't claim our speculations are absolutely accurate. Call a spade a spade. :)
Yes, speculation, but based on the published information regarding SL and Leopard's respective schedulers. My sense is that there will be a considerable performance bump (say 5-10%) on the same code, and that's before we account for 64bit goodness. And hey, isn't this arm-chair quarterbacking fun? :)
Age is the issue in that maturity comes with age. Code and architecture maturity. Multi-processors have been around and very common for 25 years that I know of. Windows NT 3 had affinity settings which would allow up to 16 processors. I got my first dual in the EARLY 80's, multi-resource computing is MUCH older than that. It's not relatively new at all unless you wish to compare mechanical computers from the 16th and 17th centuries. Multi-processor desktops have existed pretty much since there were desk-tops and their popularity increased right along side main-stream electronic computing.
Today's multi-cpu architectures are vastly different than even 5 or 10 years ago. An Intel Paragon, a dual-P5, and a dual QuadCore Nehalem are Totally different animals. New scheduler stuff is constantly being researched to leverage the new architectures and capabilities of modern multi-core CPUs. Maturity != efficiency. Seriously, the car is what, over 100 years old, and we still power them with liquified reptile remains.
Yeah, I know what you said. But comparing the same number of VCs with that number of PCs isn't much fun and really not fair.
It *is* the thread topic :D
I'm not sure what you're talking about but Mac OS has had scheduling and migration with dynamic task assignment since Mac OS had Multi-tasking. You really can't have one without the other. And those are "thread load balancing" so you've lost me.
Prior to Leopard, the scheduler didn't diverge much from the CMU Mach scheduler (that goes back to the mid-90's). Leopard included many updates, most notably better load balancing and a primitive cpu affinity mechanism. The lack of a (good) affinity system can really hurt performance (cache thrashing, etc). Additionally, the scheduler has no provisions for asymmetric core capabilities. The prevailing wisdom is that SL will utilize some of the concepts from the FreeBSD ULE scheduler (excellent paper on it here (http://www.scribd.com/doc/3299978/ULE-Thread-Scheduler-for-FreeBSD)). The ULE scheduler seems likely, given the advent of OpenCL, and the desire to run some tasks on a GPU core in SL... That gives us at least 3 different core types (Real CPU, HT and GPU) and capability sets.
No one said it was always too difficult or even always very difficult. I brought up the fact that it's often not worth it as the code will either suffer poorer execution speeds or not enough speed increase will be realized to justify the effort. It's a simple matter of company cost to profit analysis. And of course the fact remains that most of the applications we use today simply cannot be multi-threaded in any significant way. They need the result from operation one in order to calculate operation two. Simple as that.
IMHO, you are thinking too large. Most apps are linear in that they are waiting for user input, but once the user instructs the app to do something, multi-tasking can shine (consider the embarrassingly parallel photoshop filter task). At any rate, my only claim is that multi-threaded programming isn't really all that hard if you can wrap your brain around it. I personally enjoy it.
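To make the "embarrassingly parallel filter" idea concrete, here is a rough sketch, not how Photoshop actually does it. The names `brighten_row` and `brighten_image` and the tiny test image are made up for illustration. The point is only the structure: each row can be processed without looking at any other row, so the rows can be handed to a pool of workers.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten_row(row, amount):
    """Brighten one row of 8-bit grayscale pixels; rows do not depend on each other."""
    return [min(255, pixel + amount) for pixel in row]


def brighten_image(image, amount, workers=8):
    """Apply brighten_row to every row, farming rows out to a pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves row order, so the result lines up with the input.
        return list(pool.map(lambda row: brighten_row(row, amount), image))


if __name__ == "__main__":
    # Tiny made-up 3x4 "image" just to show the call.
    image = [[0, 100, 200, 250], [10, 20, 30, 40], [255, 254, 253, 252]]
    print(brighten_image(image, 10))
```

One caveat: in CPython, threads running pure-Python arithmetic like this will not actually run faster on more cores because of the global interpreter lock. The sketch shows how the work is carved up; a real speedup would need processes or native code doing the per-row work.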
Tesselator
May 29, 2009, 02:22 AM
And hey, isn't this arm-chair quarterbacking fun? :)
Yes. :)
It *is* the thread topic :D
I didn't think so. I thought it was "(octo-harpertown Vs quad-nehalem) + 10.6 = ???" and "Can anyone help me figure out grand central? Is there even enough info on GC yet?" Nothing specifically about 8 VCs vs. 8 PCs.
Prior to Leopard, the scheduler didn't diverge much from the CMU Mach scheduler (that goes back to the mid-90's). Leopard included many updates, most notably better load balancing and a primitive cpu affinity mechanism. The lack of a (good) affinity system can really hurt performance (cache thrashing, etc).
Affinity is a super simple mechanism and does not curb nor enhance the persistent load-balancing problem. Affinity becomes slightly more complicated with multiple hyper-threaded multi-core processors due to the complete affinity that exists between the VCs of the same core and the partial affinity that exists between the physical cores of the same processor chip. Any effect affinity has is only due to cache repopulation and we're probably talking about such minute performance differences that it would be nearly impossible to detect for the average user running benchmarking programs. NetBSD has the very best scheduling and affinity of any popular OS AFAIK and Rhapsody (Ah-hem, OS X) is based on NetBSD AFAIK so unless Apple threw out excellent code and replaced it with juvenile crap OS X is already very very good and following the NetBSD developments would provide Apple with any new scheduler features etc.
Multi-tasking in OS X (ever since they went with Intel chips and NetBSD preemptive multitasking) is excellent although classic applications running cooperatively multitasked (under Mac OS 9 running as an OS X process) can be pretty crappy.
Additionally, the scheduler has no provisions for asymmetric core capabilities. The prevailing wisdom is that SL will utilize some of the concepts from the FreeBSD ULE scheduler (excellent paper on it here (http://www.scribd.com/doc/3299978/ULE-Thread-Scheduler-for-FreeBSD)). The ULE scheduler seems likely, given the advent of OpenCL, and the desire to run some tasks on a GPU core in SL... That gives us at least 3 different core types (Real CPU, HT and GPU) and capability sets.
Yeah, I didn't think of GPU. But you're saying that Apple did indeed throw out good code for crap or something? Or are you saying that ULE like "evolutionary next step" code will be added in turn as Apple has been doing all along?
Most apps are linear in that they are waiting for user input, but once the user instructs the app to do something, multi-tasking can shine (consider the embarrassingly parallel photoshop filter task). At any rate, my only claim is that multi-threaded programming isn't really all that hard if you can wrap your brain around it. I personally enjoy it.
I agree with that. But rewriting any code has an inherent implied cost. If the code or app in question needs to architecturally or fundamentally change then the cost would be very high. The results may not be worth the effort if the operations do not scale well. Anyone can answer this for themselves: Spending a million or so in development costs in order to squeeze 2% or even 5% speed increase? Especially if the app is already relatively in-line (speed-wise) with other applications and customer satisfaction is relatively high? Would you do it? I think the answer is obvious myself and so I assume many developers will not be rushing in to rewrite their applications. If on the other hand, the application scales phenomenally well then chances are that it already is designed with multi-threading in mind. All in all, not much will change. At least that's what I'm thinking. :D
PS: It actually sounds to me besides a few minor points, that we are in agreement. About the only real difference is that you're saying 5% ~ 10% and I'm saying 0% ~ 5% as an average performance increase. :p
poobah
May 29, 2009, 10:35 PM
Yeah, I didn't think of GPU. But you're saying that Apple did indeed throw out good code for crap or something? Or are you saying that ULE like "evolutionary next step" code will be added in turn as Apple has been doing all along?
Pretty much. I see the ULE stuff as a bigger step, but certainly an evolution, not a replacement.
On affinity, the pre-leopard scheduler wasn't smart about grouping related threads on the same core to avoid thrashing the caches. None of this stuff is more than a (few)percent performance here or there, but in aggregate it can be significant.
PS: It actually sounds to me besides a few minor points, that we are in agreement. About the only real difference is that you're saying 5% ~ 10% and I'm saying 0% ~ 5% as an average performance increase. :p
You know how the intar-web works... two people slug it out and after a couple weeks realize they have the exact same viewpoint :p
5-10% ain't bad for a software rev. If the 64bit stuff is done well, we'll probably get a bigger bump from all 64bit-ness than the 64/32bit mix we have now. That, and if they do have to re-write pieces to get to 64 bits, it's a good time to examine the code for multi-threaded goodness :D
t0mat0
Jun 9, 2009, 10:52 AM
Any takers on the discussion now? Seems Apple has put in a decent amount of code refactoring to get OpenCL and GCD implemented into core apps, and improvements here there and everywhere. Anyone seen any benchmarks or impressions from the new Developer Preview?
kbmb
Jun 9, 2009, 11:38 AM
Any takers on the discussion now? Seems Apple has put in a decent amount of code refactoring to get OpenCL and GCD implemented into core apps, and improvements here there and everywhere. Anyone seen any benchmarks or impressions from the new Developer Preview?
I'd be interested in any new developments now that devs have a near-final version of Snow Leopard as well.
Also, can anyone comment on how Snow Leopard will perform in general with lower spec'd systems?
For example, I have a 2006 Mac Pro 2.66 with the 2 Dual cores and the 7300GT. I know the video card is not supported for OpenCL (neither is the x3100 in my 2008 Macbook :(), but I'm guessing that overall Snow Leopard will run at the same level or slightly better than Leopard does today on these machines?
Even without OpenCL support, both these machines have multiple processors (or cores), so I'm guessing that once apps are re-written to use GCD, even these slightly older machines will see some benefits?
-Kevin