
Last edited by MadisonTate : Yesterday at 03:22 PM. Reason: Offensive?

Don't worry about it bro! I didn't think it was offensive. It's all guud. Sometimes we discuss in declarative terms. That's just human and part of English - IMO anyway.

Anyway, don't fret. Things are good! Who would have thought just 5 or 6 years ago that we would be using machines with EIGHT FRIGGING CPUs in them and 32 GIGS of RAM! Yahoo!
 
So much "own" in this thread. Love it!

I actually am very curious as to HOW certain apps utilize multiple cores.

From what I understand, taking Logic as an example, the application assigns each track to its own core so that it can process many tracks at the same time.

Am I way off?
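For what it's worth, here's a toy sketch of that idea in Python. This has nothing to do with Logic's actual internals; the track data and the `process_track` "effect" are made up. The point is just that if each track's processing is independent, a worker pool can farm the tracks out across cores.

```python
# Hypothetical per-track parallelism, roughly the shape of what a DAW
# *could* do. Track names and the "effect" are illustrative only.
from multiprocessing import Pool

def process_track(samples):
    # Stand-in for per-track DSP (EQ, compression, etc.):
    # here we just halve every sample as a placeholder effect.
    return [s * 0.5 for s in samples]

tracks = {
    "drums":  [0.1, 0.4, -0.2],
    "bass":   [0.3, -0.1, 0.2],
    "vocals": [0.5, 0.2, -0.4],
    "guitar": [-0.2, 0.3, 0.1],
}

if __name__ == "__main__":
    with Pool() as pool:  # one worker per CPU core by default
        results = dict(zip(tracks, pool.map(process_track, tracks.values())))
    print(results["drums"])  # -> [0.05, 0.2, -0.1]
```

If the tracks really are independent (no shared buses or sends at this stage), this scales with core count; the mixdown that combines them afterwards is the serial part.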
 
Just a reminder -- according to Apple's benchmarks, a 2.8 octo will indeed trounce a 2.23 quad Nehalem on the most common tasks, such as Photoshop, Xcode, Final Cut Pro, and Cinebench (even the 2.93 octo Nehalem is only 20% faster). An octo Nehalem -will- do better on synthetic numerical benchmarks, however -- so if you're looking for prime numbers or cosmological sims, it might be worth the dough.
 
multithreading

Too much overthinking this.

Once 10.6 comes along and we get a smarter scheduler, more cores are going to rock, and 8 'real' cores should walk all over 4 real + 4 virtual.

1st off, the typical Mac is running dozens of services; intelligently distributing these across the cores will speed up the machine.

2nd, multi-threaded apps are not some mystical voodoo. It's not hard to spawn off worker threads to do the main task's bidding, so long as those tasks can be relatively independent. Consider a typical game: 4 or more threads would not be an uncommon scenario (user input, AI, compositing/render, 3D audio). Not all apps are good candidates for multi-threading due to having a serial nature.
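To make that concrete, here's a minimal sketch of the spawn-workers pattern. The subsystem names are illustrative, not from any real engine, and a real game loop is far more involved; this just shows how cheap it is to hand mostly independent tasks to their own threads.

```python
# Toy version of the game-style split described above: a few mostly
# independent workers report results back through a shared queue.
import threading
import queue

results = queue.Queue()

def worker(name):
    # Each subsystem does its own (simulated) frame of work.
    results.put((name, f"{name} done"))

subsystems = ["input", "ai", "render", "audio"]
threads = [threading.Thread(target=worker, args=(s,)) for s in subsystems]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for all four subsystems to finish the frame

print(sorted(results.queue))  # all four subsystems completed
```

The catch, as the post says, is the "relatively independent" part: the moment workers need each other's intermediate results, you're back to locks and waiting.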

The issue *today* is that the thread scheduler is not smart about allocating threads amongst the available cores.
 
I think almost every point you made is very incorrect. There remains a tiny TINY bit of truth to each though - maybe. :D

1) Scheduler technology is ancient. 20 years old or more. If Apple and the BSD coders have a scheduler that sucks as bad as you're saying then it's surprising that OS X even works at all.

Physical cores (PCs) are not better than Virtual cores (VCs). 16 VCs will always be faster or the same as 8 PCs. Always! The only downside of VCs is code compatibility. And this is how it should be considered: 8 vs. 16. Comparing 8 VCs with 8 PCs is extremely silly. HT is an additional feature and not a replacement technology for real cores.

2) The services and BG tasks that run on your 8-core MP are already "intelligently distributed", and all of them together utilize less than 5% or 10% of the total 800%. The 2 to 5 percent improvement Apple might be able to make to scheduling in 10.6 will probably not be noticeable, and certainly not for applications not specifically written to take advantage of the new facilities.

3) The apps that CAN BE multi-threaded already are multi-threaded. Most applications simply CAN NOT BE multi-threaded. Some code CAN BE multi-threaded but the benefits are not worth the time needed to recode the applications. This is part of the reason that some code flakes out and/or crashes on HT-enabled processors too, BTW. Until Intel comes up with a scheme where multiple cores share common L1 and L2 caches this will not change. Think of a simple analogy like locomotion: your single brain needs to place one foot in front of the other. Having multiple brains working on the task of walking will not speed up walking, as the foot behind is time-dependent on the foot in front. One event needs to happen and the result examined before the next event can begin. So it is with the majority of applications we use today.

4) The issues today are still the same issues we've always had. Clock Speed is the answer to one question, Multiple Cores is the answer to an entirely different question. Until the architecture of a CPU is fundamentally changed this will remain the same.
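The locomotion analogy can be put in code. A loop whose next value depends on the value it just computed is inherently serial, while independent per-item work splits across cores trivially. The logistic-map formula below is just a stand-in for any step-dependent workload:

```python
# Concrete version of the "one foot in front of the other" point.

def serial_walk(x, n):
    # Each iterate feeds the next, so this loop cannot be split
    # across cores no matter how many are available.
    for _ in range(n):
        x = 3.7 * x * (1.0 - x)  # next value depends on the last one
    return x

def independent_work(items):
    # By contrast, each item here stands alone, so a pool of workers
    # could process the list in parallel with near-linear speedup.
    return [3.7 * x * (1.0 - x) for x in items]
```

Whether an app lands in the first camp or the second is what decides how much extra cores buy it.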
 

Exactly. If chip manufacturers weren't finding it harder and harder to create chips with higher clock speeds, then all of this multi-core stuff would never have been so "important to the consumer". They couldn't make chips run much faster, so they're using multiple chips to make things faster. Like Tesselator says, though, a lot of stuff simply can't be made much faster. Macs in particular have been dual-processor for many, many years, yet the number of OS X programs that can actually fully utilize just two cores is amazingly small.

All the stuff about multi-cores is mainly due to advertising. If they could have manufactured a 10GHz Core Solo instead of a 2.5GHz Core 2 Quad and had it air cooled, then they would have done so in a heartbeat.

Just look at this graph from Intel:
[attached graph]


If just 50% of a program's code cannot be parallelised then it cannot run much faster on 8 processors than on 2 or 4. Even in highly efficient code where only 20% cannot be parallelised, it still doesn't run that much faster. This is why people often see reasonably impressive jumps when going from one core to two but not much from two to four. You get most of the benefits of multiprocessing (i.e. being able to do more things at once smoothly) with just two cores.
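That flattening is Amdahl's law, and it's easy to put numbers on it. A quick calculation (the fractions below just mirror the ones discussed in the post):

```python
# Amdahl's law: speedup = 1 / (serial_fraction + parallel_fraction / cores)

def amdahl_speedup(parallel_fraction, cores):
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# 50% parallelisable: 8 cores barely beat 2.
print(amdahl_speedup(0.5, 2))   # ~1.33x
print(amdahl_speedup(0.5, 8))   # ~1.78x
# 80% parallelisable: still capped well below 8x on 8 cores.
print(amdahl_speedup(0.8, 8))   # ~3.33x
```

The serial fraction puts a hard ceiling on the speedup (1/serial, no matter how many cores), which is exactly why the one-to-two jump impresses and the two-to-four jump doesn't.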

Hopefully some people will start to realise that once 10.6 comes out, nothing will change apart from the fact that the Finder will be multithreaded, and your average programmer might be able to multithread their code more easily if they learn OpenCL. Honestly, that's about it.
 

Isn't that graph a bit off, though? It seems to assume that all code is equal in terms of how much it will actually be used when a user is running the program.

OK, so half a program's code can't be parallelised. Great. But what if the 50% that can is in the areas the user uses the most? If the parallelisable code is in sections the user hits a lot throughout a session, then over an hour's use of the program the speed-up will be larger than you'd assume from taking the chart at face value.
Wouldn't the user see pockets, specific areas with a large improvement, rather than the gain being spread out thinly within the program?
 

The graph is for parts of code rather than a full application. Obviously most applications spend most of their time idle. Take a Photoshop effect, though: if that has parts that can't be parallelised then the speedup would diminish. I'm not saying multiprocessors aren't good; they're great, especially if you do very parallel stuff (i.e. if you can't make one DVD encode run any faster on multiple processors, you can at least run multiple DVD encodes of different things). So many people seem to think that Snow Leopard = multiprocessors will be much, much faster at everything under the Sun when, in reality, very little's changed.
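That throughput trick, running several independent jobs when a single job can't scale, looks like this in sketch form. The `encode` function here is a placeholder, not a real encoder:

```python
# Throughput instead of latency: if one encode can't use more cores,
# run several independent encodes at once across the available cores.
from concurrent.futures import ProcessPoolExecutor

def encode(title):
    # Placeholder for a single-threaded encode job.
    return f"{title}.m4v"

titles = ["vacation", "wedding", "concert"]

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        outputs = list(pool.map(encode, titles))
    print(outputs)  # three encodes completed concurrently
```

No single encode finishes sooner, but the whole batch does, which is often what actually matters on an 8-core box.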
 
Nice summation. :)
 
This is a great discussion, a reality check on what to expect from multithreading in 10.6. The 2.8 octo may indeed beat most of the new Nehalems under Leopard now, but how do we know that Apple will not cripple the SL code for the older chips? Apple is a business, after all, and they're trying to gain marketshare from Windows computers running the latest Intel chips. Their biggest incentive is to code for the current processors and the next-generation ones so that people will buy new computers. Sure, an install of SL will probably speed up an old 2.8 octo, but my bet is that all the optimizations will be made to take advantage of the latest chips.
 

Nah don't worry about that. Pretty much any optimisations that they can do will be backwards compatible with the previous chips anyway - i.e. better use of SSE(insertnumberhere) instructions. Besides which, if they did purposefully kill the code for pre-core environments then it would kill performance on their entire current line-up besides the Mac Pro. Also, it would be blazingly obvious that they did sabotage the code because suddenly SL machines would be running slower than not only Leopard but also Windows 7 - this would be a terrible thing for marketing.

The 2.8GHz previous model is definitely the best value for money right now. The only thing the 2.23GHz Nehalem has that the 2.8GHz doesn't have is Hyperthreading but Hyperthreading's been the biggest pile of doggydoodoos since it was first shoved into Pentium 4s in 2002.

Do you know, a little-known fact: Hyper-Threading support has actually been in OS X since before official Intel machines were released. The Intel Developer Transition Kits, with the first Intel versions of Tiger on them, were built on P4 CPUs, and they had Hyper-Threading. Tiger reported four CPUs back then too.
 
But what's been changed code wise since the crud known as the P4 was released?

It's been awhile, and they may have actually gotten it right this time around. ;)
 
How about some "put your money where your mouth is" style speculation?

What apps would people want benchmarked to compare the 2 rigs (octo-core Harpertown standard configuration and quad-core Nehalem) on 10.5.7 vs 10.6?

With some suggestions agreed on, it would be interesting to hear from those saying there will be minimal or no changes, and from those saying big improvements are coming, along with a rough performance-change % for 10.5 vs 10.6 on both machines.
 

Things like CS4 benchmarks, Pro Tools stuff, MP3/MP4 encoding, Cinebench and Geekbench for the hell of it, anything else anyone can think of that reflects the kind of usage Mac Pro users do.

I'm sure this kind of stuff will become clear when Snow Leopard comes out. We may well know the answers to some of this stuff if Snow Leopard goes gold master for WWDC and developers get their final copies - of course that's only if Snow Leopard is already done.

My predictions are that Snow Leopard will perform within 5% of Leopard in initial benchmarks. That's for both Nehalem and Core 2 Duo processors.
 
Tesselator,

Very interesting reading your information on MT/HT technology. I do remember getting my Pentium PC with HT almost 7 years ago... pretty proud of it they were; I think I paid a 10-15% premium at the time.

I am curious, though: with Snow Leopard and Windows 7 on the way, will software developers be able to code their games/applications to take advantage of BOTH the main CPUs and the GPUs? I have read some on this recently and it seems like a great idea, especially when the graphics system isn't being taxed and the CPUs are. Just curious; following your locomotion analogy, it seems irrelevant where the extra cores or processors are - the architecture needs to change to take advantage of them.

I must admit, I was (I guess I still am, given my experience with HT 7 years ago) optimistic as well about Snow Leopard on the horizon taking advantage of these multi-core machines. Makes me less concerned about missing anything with the last-generation 3.0gx8. All I do is video/audio/photo editing and graphic layout with our Mac (business). It is always a write-off, but spending more money still has to make sense. Faster rendering, less waiting, and quicker, more efficient software always mean more time to make more money:)

Is there a way to take advantage of these technologies or is it all HYPE? I know the integration of the caches into the CPUs and better/faster, more efficient memory (RAM, and quicker seek times on the HDs) will improve as time moves forward, but are we capped on CPU (horse)power for the time being? Are these speed improvements only attributable to the memory systems, motherboards, faster hard drives, etc.?

It does seem like a snake-oil pitch, the multi-core / Hyper-Threading / virtual-core spiel from the chip manufacturers, if what you say is true (and I have no reason to doubt you).

J
 
I think almost every point you made is very incorrect. There remains a tiny TINY bit of truth to each though - maybe. :D
You didn't really address my points (which I maintain are accurate), but I'll take a stab at the strawmen you replaced them with... :D
1) Scheduler technology is ancient. 20 years old or more. If Apple and the BSD coders have a scheduler that sucks as bad as you're saying then it's surprising that OS X even works at all.
Actually, it's older than that. Age isn't the issue. Hardware has changed drastically over the years, and the scheduler algorithms have to catch up. Ideally, you'd schedule Nehalem differently than a P4. Multiple cores/CPUs on a desktop machine are still a relatively new concept, and the software has to catch up; it's just life.

Physical cores (PCs) are not better than Virtual cores (VCs). 16 VCs will always be faster or the same as 8 PCs. Always! The only downside of VCs is code compatibility. And this is how it should be considered: 8 vs. 16. Comparing 8 VCs with 8 PCs is extremely silly. HT is an additional feature and not a replacement technology for real cores.
I said that 8 real cores would beat 4 real + 4 virtual. That will always be true all other things being equal. The HT cores share pieces with the "real" cores, there will *always* be contention that doesn't happen with distinct cores. Otherwise, I agree.

2) The services and BG tasks that run on your 8-core MP are already "intelligently distributed", and all of them together utilize less than 5% or 10% of the total 800%. The 2 to 5 percent improvement Apple might be able to make to scheduling in 10.6 will probably not be noticeable, and certainly not for applications not specifically written to take advantage of the new facilities.
The kernel didn't get thread load balancing until Leopard; it's still a 'new thing' for our Macs. It's not really the couple percent they use idling, it's the context swaps that happen when your app has to share with 400-ish system threads. Carving them up in a more thought-out manner will help tremendously. Building your system from the start with an eye toward multi-core operation is going to be more efficient than bolting it on later.

3) The apps that CAN BE multi-threaded already are multi-threaded. Most applications simply CAN NOT BE multi-threaded. Some code CAN BE multi-threaded but the benefits are not worth the time needed to recode the applications. This is part of the reason that some code flakes out and/or crashes on HT-enabled processors too, BTW. Until Intel comes up with a scheme where multiple cores share common L1 and L2 caches this will not change. Think of a simple analogy like locomotion: your single brain needs to place one foot in front of the other. Having multiple brains working on the task of walking will not speed up walking, as the foot behind is time-dependent on the foot in front. One event needs to happen and the result examined before the next event can begin. So it is with the majority of applications we use today.
Some applications work well multi-threaded, others do not, no argument there. The point was that for appropriate apps, writing multi-threaded code is not orders of magnitude more difficult. There are plenty of single threaded apps still out there that could benefit greatly from a re-write with a multi-threaded approach. (and yes, some others never will)

4) The issues today are still the same issues we've always had. Clock Speed is the answer to one question, Multiple Cores is the answer to an entirely different question. Until the architecture of a CPU is fundamentally changed this will remain the same.
Different approaches, certainly, but there is quite a bit of overlap. You just have to change the way you attack the problem, and choose which problems will benefit most. True, some things can only go faster if the CPU is faster, but many things can go faster in parallel.

And yes, I'd much rather have a 20 GHz CPU than 8 x 2.5, but that's not likely to happen anytime soon, so I'll be happy with what I have :D
 
9to5Mac's reporting that Final Cut Pro Studio 7 may allow realtime editing of 1080p H.264 video.

I'd call that a >5% performance increase. Time to start factoring in the possible effect of the GPU when comparing the Octo and the Quad on 10.6 vs 10.5? It isn't a system-wide change, but we might soon see some rough levels for specific speed enhancements where SL can shine.
 
Tesselator,

Very interesting reading your information on MT/HT technology. I do remember almost 7 years ago, getting my Pentium PC with HT...pretty proud they were, I think I paid a 10-15% premium at the time.

I am curious though, with Snow Leopard and Windows 7 on the way, will software developers be able to code their games/applications to take advantage of BOTH the main CPU's and the GPU's?

Beats me. It looks like some will. I assume most won't. A lot depends on the framework and tools available I guess.


I have read some on this recently and it seems like a great idea, especially when the graphics system isn't being taxed and the CPUs are. Just curious, using your analogy as far as locomotion, it seems irrelevant where the extra cores or processors are, the architecture needs to change to take advantage.

Yeah, I was speaking in general truths. Another thing to consider is that as NEW apps are created, their internal architecture can MUCH more easily be fitted to the new concept: multiple computing resources. For example, if PS were to be created today, would the developers treat the API the same way, or would they choose a different internal structure where the processing pipeline was designed with multi-resource computing in mind? Currently it's not, and if they change it, what are the pros and cons? How many 3rd-party developers will have to rewrite their products from the ground up? How much increased speed will actually be achieved? Etc. As it is right this minute, the answers to those questions do not warrant a rewrite of PS. The same is probably true for many applications. With OpenCL and "better schedulers" (whatever that might mean) this may change, but I remain a skeptic for the most part.

I must admit, I was (I guess I still am, with my experience with HT 7 years ago) optimistic as well with Snow Leopard on the horizon, to take advantage of these multi core machines. Makes me less concerned about missing anything with the last generation 3.0gx8. All I do is video/audio/photo editing and graphic layout with our Mac (business). It is always a write off, but sense has to be made to spend more money. Faster rendering, less waiting, quicker and more efficient software always means more time to make more money:)

Is there a way to take advantage of these technologies or is it all HYPE?

Wait and see is the only reasonable answer to that. It's certainly not all hype, but usually the hype is much greater than the reality. This is the computing industry: they need the hype to generate expectations and excitement in order to sell their goods.

I know the integration of the caches to the CPU's and better/faster and more efficient memory (RAM and quicker seek times on the HD's) will improve as time moves forward, but are we capped with CPU (horse) power for the time being? Are these speed improvements only attributable to the memory systems, mother boards, faster hard drives, etc?

No, I don't think we're "capped". And when one technology reaches its limits another will be introduced. There are already several ready and waiting. But they will wait a bit longer. ;) Remember these are companies we're talking about, and their goals are maximum yield from the least amount of effort - as it is with any publicly held company or corporation. Also consider that there are powers in the status quo that do NOT want equal playing fields. For example, we (the US and UK) have long limited what technologies we will allow to be exported. The same powers do indeed limit what we're "allowed" to have and have access to. Some of the quote-unquote nutty speculations you hear about what the government has in secret are true. I saw the work that was done on the "StarWars" project in the early 60's and late 50's. This wasn't introduced for public awareness until the mid-80's, some 25 years later. And top NASA and university scientists across the western world all said there was no such thing and we wouldn't be able to achieve it for many years to come. LOL They were saying these things all the while it already existed - I know first hand. But I do digress. :)

It does seem like a snake-oil pitch, the multi-core / Hyper-Threading / virtual-core spiel from the chip manufacturers, if what you say is true (and I have no reason to doubt you).

I dunno. Is it snake oil if it's 5% true? What about 10% or 20%? There's some truth in it for sure; how much is the question. Since this is exactly the same technology already introduced some (as you say) 7 years ago, we should already know pretty much what to expect. I don't believe it will be wildly different. Right? First we had single processors, then multiple processors, then HT in both single- and multi-processor systems, then multi-core processors, and now multi-core with HT. At each point along the way the OSes and apps had to be retuned for the new architecture. We're on the edge of seeing what this round of retuning will be like. I say it won't be much different from the retuning we saw for the first round of HT - which was "better" but nothing astounding or ground-breaking.


You didn't really address my points (which I maintain are accurate), but I'll take a stab at the strawmen you replaced them with... :D

Huh? Your whole "point" was based on pure fiction unless you're beta-testing 10.6 and know something we don't - and even then it would still be partial fiction as the products and developments you're making preemptive claims about don't exist or haven't been released yet. So you're coming from pure speculation in the first place. That's cool - I like to speculate - but we can't claim our speculations are absolutely accurate. Call a spade a spade. :)


Actually, its older than that. Age isn't the issue. Hardware has changed drastically over the years, and the scheduler algorithms have to catch up. Ideally, you'd schedule nehalem differently than a P4. Multiple cores/CPUs on a desktop machine are still a relatively new concept, and the software has to catch up, it's just life.

Age is the issue in that maturity comes with age - code and architecture maturity. Multi-processors have been around and very common for 25 years that I know of. Windows NT 3 had affinity settings which would allow up to 16 processors. I got my first dual in the EARLY 80's; multi-resource computing is MUCH older than that. It's not relatively new at all, unless you wish to compare mechanical computers from the 16th and 17th centuries. Multi-processor desktops have existed pretty much since there were desktops, and their popularity increased right alongside mainstream electronic computing.


I said that 8 real cores would beat 4 real + 4 virtual. That will always be true all other things being equal. The HT cores share pieces with the "real" cores, there will *always* be contention that doesn't happen with distinct cores. Otherwise, I agree.

Yeah, I know what you said. But comparing the same number of VCs with that number of PCs isn't much fun and really not fair. It's a feature of a processor and not a replacement for other processors. Sometimes it's an advantage, sometimes not, and sometimes it's a disadvantage - when we consider stability issues.

The kernel didn't get thread load balancing until Leopard, it's still a 'new thing' for our Macs. It's not really the couple percent they use idling, it's the context swaps that happen when your app has to share with 400ish system threads. Carving them up in a more thought out manner will help tremendously. Building your system from the start with an eye toward multi-core operation is going to be more efficient than bolting it on later.

I'm not sure what you're talking about but Mac OS has had scheduling and migration with dynamic task assignment since Mac OS had Multi-tasking. You really can't have one without the other. And those are "thread load balancing" so you've lost me.


Some applications work well multi-threaded, others do not, no argument there. The point was that for appropriate apps, writing multi-threaded code is not orders of magnitude more difficult. There are plenty of single threaded apps still out there that could benefit greatly from a re-write with a multi-threaded approach. (and yes, some others never will)

No one said it was always too difficult, or even always very difficult. I brought up the fact that it's often not worth it, as the code will either suffer poorer execution speeds or not enough of a speed increase will be realized to justify the effort. It's a simple matter of company cost-to-profit analysis. And of course the fact remains that most of the applications we use today simply cannot be multi-threaded in any significant way. They need the result from operation one in order to calculate operation two. Simple as that.


I'd much rather have a 20 GHz CPU than 8 x 2.5 GHz, but that's not likely to happen anytime soon, so I'll be happy with what I have :D

Yes, until we can do better for ourselves, we have to be satisfied with (or at least accept) what we're handed - or do without. :D


Isn't that graph a bit off though?

It can correctly be assumed to be "off" in either direction depending on the application base you're testing. For example, if you're only testing rendering engines, then it scales almost linearly: eight processors or physical cores will run at roughly 7.5x ~ 8x the speed of a single core/processor. In the opposite direction, if you test apps that cannot be (or still are not) multi-threaded at all, then the graph will look almost completely flat, with 8 cores performing about the same as a single core. How they came by their numbers I dunno, but I can see some truth in it.
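That intuition is just Amdahl's law. Here's a quick sketch (the function name is mine, not from any library): the achievable speedup on N cores depends entirely on the fraction of the work that can actually run in parallel.

```python
def amdahl_speedup(cores, parallel_fraction):
    """Amdahl's law: overall speedup on `cores` cores when
    `parallel_fraction` of the work can be parallelized."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

print(amdahl_speedup(8, 1.0))   # fully parallel renderer: 8.0x (near-linear)
print(amdahl_speedup(8, 0.5))   # half-serial app: only ~1.78x
print(amdahl_speedup(8, 0.0))   # purely serial app: 1.0x (the flat graph)
```

So a graph of "average" apps sits somewhere between the linear and flat extremes, exactly as described above.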
 
Huh? Your whole "point" was based on pure fiction unless you're beta-testing 10.6 and know something we don't - and even then it would still be partial fiction as the products and developments you're making preemptive claims about don't exist or haven't been released yet. So you're coming from pure speculation in the first place. That's cool - I like to speculate - but we can't claim our speculations are absolutely accurate. Call a spade a spade. :)
Yes, speculation, but based on the published information regarding SL and Leopard's respective schedulers. My sense is that there will be a considerable performance bump (say 5-10%) on the same code, and that's before we account for 64bit goodness. And hey, isn't this arm-chair quarterbacking fun? :)

Age is the issue, in that maturity comes with age - code and architecture maturity. Multi-processor systems have been around, and very common, for at least the 25 years I know of. Windows NT 3 had affinity settings which would allow up to 16 processors. I got my first dual in the EARLY 80's, and multi-resource computing is MUCH older than that. It's not relatively new at all, unless you wish to compare mechanical computers from the 16th and 17th centuries. Multi-processor desktops have existed pretty much since there were desktops, and their popularity increased right alongside mainstream electronic computing.
Today's multi-CPU architectures are vastly different from even 5 or 10 years ago. An Intel Paragon, a dual-P5, and a dual quad-core Nehalem are totally different animals. New scheduler techniques are constantly being researched to leverage the architectures and capabilities of modern multi-core CPUs. Maturity != efficiency. Seriously, the car is what, over 100 years old, and we still power them with liquefied reptile remains.
Yeah, I know what you said. But comparing the same number of VCs with that number of PCs isn't much fun and really not fair.
It *is* the thread topic :D
I'm not sure what you're talking about but Mac OS has had scheduling and migration with dynamic task assignment since Mac OS had Multi-tasking. You really can't have one without the other. And those are "thread load balancing" so you've lost me.
Prior to Leopard, the scheduler didn't diverge much from the CMU Mach scheduler (that goes back to the mid-90's). Leopard included many updates, most notably better load balancing and a primitive cpu affinity mechanism. The lack of a (good) affinity system can really hurt performance (cache thrashing, etc). Additionally, the scheduler has no provisions for asymmetric core capabilities. The prevailing wisdom is that SL will utilize some of the concepts from the FreeBSD ULE scheduler (excellent paper on it here). The ULE scheduler seems likely, given the advent of OpenCL, and the desire to run some tasks on a GPU core in SL... That gives us at least 3 different core types (Real CPU, HT and GPU) and capability sets.

No one said it was always too difficult, or even always very difficult. I brought up the fact that it's often not worth it: the code will either suffer poorer execution speed or not see enough of a speedup to justify the effort. It's a simple matter of cost-benefit analysis for the company. And of course the fact remains that most of the applications we use today simply cannot be multi-threaded in any significant way. They need the result of operation one in order to calculate operation two. Simple as that.
IMHO, you are thinking too large. Most apps are linear in that they are waiting for user input, but once the user instructs the app to do something, multi-threading can shine (consider the embarrassingly parallel Photoshop filter task). At any rate, my only claim is that multi-threaded programming isn't really all that hard if you can wrap your brain around it. I personally enjoy it.
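For what it's worth, the embarrassingly parallel filter case really is about as easy as parallel programming gets. A minimal Python sketch (the names are mine; a real Photoshop filter works on 2-D image data, and I'm using worker processes rather than threads because CPython's GIL keeps CPU-bound threads from running in parallel):

```python
from multiprocessing import Pool

def brighten(pixel):
    """Per-pixel 'filter': add 40 to an 8-bit value, clamped at 255.
    Every pixel is independent, so the work splits cleanly."""
    return min(pixel + 40, 255)

def parallel_filter(pixels, workers=4):
    """Map the filter over the image using a pool of worker processes."""
    with Pool(workers) as pool:
        return pool.map(brighten, pixels,
                        chunksize=max(1, len(pixels) // workers))
```

The parallel result is bit-identical to the serial one; the only real design work is deciding how to chunk the data across workers.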
 
And hey, isn't this arm-chair quarterbacking fun? :)

Yes. :)


It *is* the thread topic :D

I didn't think so. I thought it was "(octo-harpertown Vs quad-nehalem) + 10.6 = ???" and "Can anyone help me figure out grand central? Is there even enough info on GC yet?" Nothing specifically about 8 VCs vs. 8 PCs.


Prior to Leopard, the scheduler didn't diverge much from the CMU Mach scheduler (that goes back to the mid-90's). Leopard included many updates, most notably better load balancing and a primitive cpu affinity mechanism. The lack of a (good) affinity system can really hurt performance (cache thrashing, etc).

Affinity is a super simple mechanism, and it neither curbs nor fixes the persistent load-balancing problem. Affinity becomes slightly more complicated with multiple hyper-threaded multi-core processors, due to the complete affinity that exists between the VCs of the same core and the partial affinity that exists between the physical cores of the same processor chip. Any effect affinity has comes down to cache repopulation, and we're probably talking about performance differences so minute that the average user running benchmarking programs would find them nearly impossible to detect. NetBSD has the best scheduling and affinity of any popular OS AFAIK, and Rhapsody (ahem, OS X) is based on NetBSD AFAIK, so unless Apple threw out excellent code and replaced it with juvenile crap, OS X is already very, very good, and following the NetBSD developments would provide Apple with any new scheduler features, etc.
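To make "super simple mechanism" concrete: pinning just restricts which cores the scheduler may place you on, so the caches stay warm. OS X exposes this differently (affinity hints via thread_policy_set), but on Linux the whole mechanism is a couple of syscalls, which is roughly what I mean (sketch, Linux-only; the function name is mine):

```python
import os

def pin_to_cpu(cpu):
    """Restrict the calling process to a single logical CPU (Linux-only).
    PID 0 means 'the calling process'."""
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)  # the mask the kernel actually applied
```

After `pin_to_cpu(0)` the scheduler can no longer migrate the process off CPU 0, so its working set stays in that core's cache.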

Multi-tasking in OS X (ever since they went with Intel chips and NetBSD preemptive multitasking) is excellent, although classic applications running cooperatively multitasked (under Mac OS 9 as an OS X process) can be pretty crappy.


Additionally, the scheduler has no provisions for asymmetric core capabilities. The prevailing wisdom is that SL will utilize some of the concepts from the FreeBSD ULE scheduler (excellent paper on it here). The ULE scheduler seems likely, given the advent of OpenCL, and the desire to run some tasks on a GPU core in SL... That gives us at least 3 different core types (Real CPU, HT and GPU) and capability sets.

Yeah, I didn't think of the GPU. But are you saying that Apple did indeed throw out good code for crap or something? Or are you saying that ULE-like "evolutionary next step" code will be added in turn, as Apple has been doing all along?


Most apps are linear in that they are waiting for user input, but once the user instructs the app to do something, multi-threading can shine (consider the embarrassingly parallel Photoshop filter task). At any rate, my only claim is that multi-threaded programming isn't really all that hard if you can wrap your brain around it. I personally enjoy it.

I agree with that. But rewriting any code has an inherent cost. If the code or app in question needs to change architecturally or fundamentally, then the cost would be very high, and the results may not be worth the effort if the operations don't scale well. Anyone can answer this for themselves: would you spend a million or so in development costs to squeeze out a 2% or even 5% speed increase? Especially if the app is already relatively in line (speed-wise) with other applications and customer satisfaction is relatively high? I think the answer is obvious, so I assume many developers will not be rushing in to rewrite their applications. If, on the other hand, the application scales phenomenally well, then chances are it was already designed with multi-threading in mind. All in all, not much will change. At least that's what I'm thinking. :D

PS: It actually sounds to me besides a few minor points, that we are in agreement. About the only real difference is that you're saying 5% ~ 10% and I'm saying 0% ~ 5% as an average performance increase. :p
 
Yeah, I didn't think of the GPU. But are you saying that Apple did indeed throw out good code for crap or something? Or are you saying that ULE-like "evolutionary next step" code will be added in turn, as Apple has been doing all along?
Pretty much. I see the ULE stuff as a bigger step, but certainly an evolution, not a replacement.

On affinity, the pre-Leopard scheduler wasn't smart about grouping related threads on the same core to avoid thrashing the caches. None of this stuff is worth more than a (few) percent of performance here or there, but in aggregate it can be significant.
PS: It actually sounds to me besides a few minor points, that we are in agreement. About the only real difference is that you're saying 5% ~ 10% and I'm saying 0% ~ 5% as an average performance increase. :p
You know how the intar-web works... two people slug it out and after a couple weeks realize they have the exact same viewpoint :p
5-10% ain't bad for a software rev. If the 64-bit stuff is done well, we'll probably get a bigger bump from going all-64-bit than from the 64/32-bit mix we have now. And if they do have to rewrite pieces to get to 64 bits, it's a good time to examine the code for multi-threaded goodness :D
 
Any takers on the discussion now? It seems Apple has put a decent amount of refactoring into getting OpenCL and GCD implemented in core apps, plus improvements here, there, and everywhere. Has anyone seen any benchmarks or impressions from the new Developer Preview?
 
Any takers on the discussion now? It seems Apple has put a decent amount of refactoring into getting OpenCL and GCD implemented in core apps, plus improvements here, there, and everywhere. Has anyone seen any benchmarks or impressions from the new Developer Preview?

I'd be interested in any new developments now that devs have a near-final version of Snow Leopard as well.

Also, can anyone comment on how Snow Leopard will perform in general with lower spec'd systems?

For example, I have a 2006 Mac Pro 2.66 with the 2 Dual cores and the 7300GT. I know the video card is not supported for OpenCL (neither is the x3100 in my 2008 Macbook :(), but I'm guessing that overall Snow Leopard will run at the same level or slightly better than Leopard does today on these machines?

Even without OpenCL support, both these machines have multiple processors (or cores), so I'm guessing that once apps are rewritten to use GCD, even these slightly older machines will see some benefits?

-Kevin
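Since GCD keeps coming up: for anyone wondering what it actually buys older multi-core machines, the core idea is just queues of blocks drained by a system-managed thread pool. A toy single-worker analogue in Python (my own names; the real API is C - dispatch_async() and friends - and the real pool is sized by the OS, which is exactly why apps don't have to know how many cores they're on):

```python
import queue
import threading

class SerialQueue:
    """Toy analogue of a GCD serial dispatch queue: one worker thread
    drains a FIFO of submitted callables in order."""
    def __init__(self):
        self._tasks = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def _drain(self):
        while True:
            task = self._tasks.get()
            if task is None:      # sentinel: shut the worker down
                break
            task()

    def dispatch_async(self, task):
        """Enqueue a callable and return immediately."""
        self._tasks.put(task)

    def join(self):
        """Stop accepting work and wait for the queue to drain."""
        self._tasks.put(None)
        self._worker.join()
```

The app just submits work; the runtime (here, one thread; in GCD, a pool matched to the machine) decides where and when it runs - so a 4-core 2006 Mac Pro and an 8-core Nehalem run the same binary and each gets what parallelism it can.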
 