OpenGL/Grand Central optimization is fine, but such improvements probably benefit, what, 20% of users?
I've been holding off on the SL upgrade so far. I've actually been reading more complaints than positive comments.
I'd much prefer to see improved overall conditions, increased battery life for laptops, etc.

Grand Central optimization and OpenCL are what allow the next round of battery improvements to happen. Do more with less.

OpenCL allows use of wasted cycles in the GPU that you're already paying the power budget for.

What's the rule of thumb? The same core at half the speed will use a quarter of the power. So doubling the number of cores while keeping the same total number of cycles gives you the same amount of work for half the power. Double again for half the power again. Better still, if only one core is active you can power down the others. Multiple cores are just better at using the right amount of power for the job at hand.
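
A back-of-the-envelope illustration of that arithmetic (a toy model in C; real chips don't scale anywhere near this cleanly):

#include <stdio.h>

/* Toy model of the rule of thumb above: halving the clock quarters the
 * per-core power, so doubling cores at half the clock keeps throughput
 * constant while halving total power at each step. Illustrative only. */
int main(void) {
    double clock = 1.0, power_per_core = 1.0;
    for (int cores = 1; cores <= 8; cores *= 2) {
        printf("%d core(s) @ %.3fx clock: throughput %.1fx, power %.3fx\n",
               cores, clock, cores * clock, cores * power_per_core);
        clock /= 2.0;          /* half the speed...         */
        power_per_core /= 4.0; /* ...a quarter of the power */
    }
    return 0;
}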

This is where you rely on GCD or threading: the better a program, even something small, scales across this sort of environment, the more room Apple has to power-scale and save your battery.

The problem is that, subjectively, you'll measure the performance of the whole machine based on whichever of your applications has the highest demand for a single core.
 
You (and most other people) need to realize that there is a relatively small set of computations that can be accelerated by this kind of technology. Of course certain types of video, image, and sound processing will work, but your run-of-the-mill Mac app isn't going to be able to take advantage of GCD or OCL.

GCD = multithreading, which most applications can take advantage of if they aren't already threaded, so long as there are operations that can be run in parallel.
OpenCL will only speed up data-intensive tasks of a specific nature, as you mentioned, but those are generally the programs that need to run faster anyway. Who needs to speed up Word? But faster iTunes encoding, faster iMovie rendering, etc. would be nice.


"If he got almost a 50% increase with a quad core machine i expect crazy numbers from an 8-core variety"

The tests compare running on the GPU vs. the CPU, unless the Leopard implementation wasn't threaded. In fact, his numbers would be far less impressive given a faster 8-core machine, as that's what he's comparing against. If that doubles the speed, then the GPU implementation may be slower (of course it's not as linear as that).
Actually, the numbers aren't that impressive, perhaps because of the now-ancient GPU they used. See http://www.anandtech.com/video/showdoc.aspx?i=3339&p=1 or other CUDA-based benchmarks (CUDA is equivalent to OpenCL and has been around for a few years). Some tasks have received a 100x speedup over a CPU implementation. This isn't really new technology, but it's exciting to see more mainstream applications taking advantage of GPGPUs.

"Wrong. Any application [including all of iWorks] can leverage OpenCL for it's offloading of number crunching, aiding Quartz in various aspects to streamlining processes for WebKit and thus give everyone an improved experience. Built-in SVG, WebGL for Opera, Firefox, Safari, Chrome, etc., will benefit on OS X.
The impact for an application like Keynote will be more visible the more Keynote expands into OpenGL presentations that include interactive fly-throughs and much more."

Keep in mind that running any operation through OpenCL (ignoring the fact that it can run on the CPU as well) requires transferring data to the GPU. The overhead isn't worth it for small operations, and the GPU's architecture is very different from the CPU's. Tasks that require a lot of branching or sequential operations (a lot of the tasks you normally do) don't perform well on the GPU. Given that people typically have mostly unused CPU cores now anyway, it makes no sense to transfer everything to the GPU. It isn't an automatic make-everything-faster device.

And while it's useful for many games, the GPU may already be the bottleneck just for rendering, so it won't always make sense to offload AI, etc. in those cases. Nowadays it seems like most developers don't even try to push the hardware anymore; everything is made to run on consoles first, and those are 4 years old. I can't find games that really tax my quad-core i7 and GTX 260. No joke, and that's running things like Far Cry 2 at 1080p with maxed settings.
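
To put rough numbers on that overhead point (all figures invented purely for illustration), the GPU only wins once the compute saved outweighs the fixed cost of shipping data across the bus:

#include <stdio.h>

/* Hypothetical costs: a 10x-faster GPU still loses on small workloads
 * because of the fixed setup/transfer overhead. Numbers are made up. */
int main(void) {
    const double cpu_per_mb  = 0.010; /* seconds of CPU compute per MB  */
    const double gpu_per_mb  = 0.001; /* seconds of GPU compute per MB  */
    const double xfer_per_mb = 0.004; /* seconds to move each MB, total */
    const double gpu_setup   = 0.020; /* fixed launch/transfer latency  */

    for (double mb = 1; mb <= 256; mb *= 4) {
        double cpu = mb * cpu_per_mb;
        double gpu = gpu_setup + mb * (gpu_per_mb + xfer_per_mb);
        printf("%6.0f MB: CPU %.3fs vs GPU %.3fs -> %s\n",
               mb, cpu, gpu, cpu <= gpu ? "CPU wins" : "GPU wins");
    }
    return 0;
}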

"Until recently the developer API for OpenCL wasn't even available which is why you won't see it in many apps. On the GC side, now that Apple has open sourced it as libdispatch, it's entirely possible more OSes and developers will start using that. But it may take years for that to standardize and happen."

The API has been available for a while to developers, just not the public. Before that we had CUDA, which is almost a direct translation to OpenCL. There are already implementations of OpenCL on Linux and Windows. That's the whole idea of it being open; it's not exclusive to Apple the way DirectX 11 is. I think the biggest problem right now is just that there is a very small pool of developers who know how to use this, and most of them are probably in graduate school like me.

The biggest gain will probably come to Final Cut Pro, Motion, etc. It's ridiculous that people wait hours for video to render in these applications.
 
OpenCL allows use of wasted cycles in the GPU that you're already paying the power budget for.

Nonsense. Like the CPU, the GPU uses power according to the load put on it. If you're not doing heavy 3D or GPGPU work, there's no power drain.

And the converse is true - once you harness those GPU computing units, the power drain and heat production shoot up, and battery life plummets.

There's no free lunch....
 
Nonsense. Like the CPU, the GPU uses power according to the load put on it. If you're not doing heavy 3D or GPGPU work, there's no power drain.

And the converse is true - once you harness those GPU computing units, the power drain and heat production shoot up, and battery life plummets.

There's no free lunch....

Sure, but it isn't a smooth power curve, is it?
The power increases in steps.
So, like the gears in your car, each step has a sweet spot of efficiency. The smoother you drive, the less fuel you use.
 
I'm drooling in anticipation of what 2010 will bring in speed increases for me and my 2006 Mac Pro...should be interesting.
 
Now you have the same number of lines of code and a slightly different syntax, but your 2-second task is being done in parallel, with GCD deciding how many instances can run at a time based on what your overall system is doing.
This seems a lot like OpenMP for Objective-C. Which is a good thing.

But the other commenters are right; GCD is not the big gain the Steve Jobs Reality Distortion Field wants you to think it is. It is only an incremental improvement. The big gains were made years ago when developers implemented multi-threading. (If they did.)

I think something important is being left out of these Fantastical Exampicals. In the real world, you need to know when your little parallel block task is done. That means you are either waiting for some notification (very thread-like), or you are stopping your execution until the entire GCD 'block' is done. Because of this, the GCD gains will be few and far between. And it will probably still manage to bring in bugs for developers who actually believe that GCD makes things 'stupid easy.'
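
For reference, the notification being described maps directly onto GCD's dispatch groups. A minimal sketch (method and variable names are illustrative):

// Fan the work out, then either block until it's all done or get notified.
dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
dispatch_group_t group = dispatch_group_create();

for (id item in items) {
    dispatch_group_async(group, queue, ^{ [worker processItem:item]; });
}

// Option 1: stop here until every task in the group has finished.
dispatch_group_wait(group, DISPATCH_TIME_FOREVER);

// Option 2: don't block; run a block on the main queue when the group drains.
dispatch_group_notify(group, dispatch_get_main_queue(), ^{
    [controller allItemsDone];
});

dispatch_release(group); // Snow Leopard is pre-ARC; release the group manually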

I would bet most of the gain in the example app in this article is from OpenCL, which actually brings new computing resources, rather than GCD, which gives you the same old crap in a new box.


wizard said:
Now this isn't to say Aperture or anything else has been optimized for SL. Just that SL gives multi-threaded apps more respectable behavior.
This is probably from improvements in the kernel. OS X has traditionally had crappy thread performance, and perhaps that is now fixed.
 
There, corrected for you. Any pro worth their nut will use Avid.

That's simply not true... FCP is used to edit many major network television shows. Avid used to be top dog, but FCP has stolen much of Avid's thunder over the years. Do some research before speaking next time.
 
Yes, it is easy (and yes, I am a Java developer; I know it's easy to create threads). It's NOT easy, however, to manage them effectively in a large Swing application... hence why the SwingWorker system came out. When I read about Blocks and GCD, I immediately thought "This is their version of SwingWorker!" and in a lot of ways it IS.

And as I said (and many others have as well), not every program's processing jobs can be broken up into multiple threads in order to run faster. But by using threads to handle the whole "go do this when I press this button, and do it off the UI thread so the UI can remain responsive" pattern, the whole system is snappier and better able to respond and adjust resources as needed.
SwingWorker is an absurdly trivial piece of code. It is convenient, but hardly a miracle. Blocks are actually somewhat more interesting, but relating SwingWorker to GCD is a bit of a stretch.

http://java.sun.com/products/jfc/tsc/articles/threads/src/SwingWorker.java
Note that Java or C++ or any of the other language-specific threading systems don't really do anything to keep you from creating a crapload of threads that aren't really doing anything. Threads might not take CPU time if they're idle, but they DO take up memory, and they DO take up space in the thread management/task scheduler.
Can you really admit to being a developer and say this in the same discussion?
You're misquoting me... It makes it STUPID EASY to dispatch processing work off to threads (like what I said about the processing work invoked by the button press). It does NOT make it "stupid easy" to redesign your program to do concurrent data processing... that still takes work. But blocks and GCD remove some of the boilerplate overhead that developers usually have to deal with to create and manage threads on their own.
Creating threads and starting them was never the difficult part in the first place. Outside of thread pools, I don't see much "management" going on here. All blocks do is make the easy part even easier.
The point is to lower the barriers to using these tools so more developers will put in the effort to use them, rather than think "This is too much work, screw it, it's not like this chunk of code will take more than a second to do" (forgetting that that chunk of code might need to wait on a network connection or disk I/O or something like that, which could get blocked or stalled, making the whole app hang...).
But going back to the GUI button example here, you never give a good reason. The thread goes off and does its thing, the user goes off and does theirs. The thread returns and the user has now gotten the app into a different state than when the thread started ... WTF is the programmer who didn't know how to create a thread before blocks existed going to do??? He's screwed. In the meantime, the user has gone off and created a couple more threads that may or may not have finished before the first one did.

This programmer is never going to get that poor user's data back into a consistent state. But thank God his user's GUI is responsive!

Next time, try making the hard part easier.
 
I want to see a generic GCD benchmark tool using nested FOR...NEXT loops

What I would like to see, as a raw Grand Central Dispatch benchmark that any user could run, is a simple nested-loop test like the one mentioned in the extensive Snow Leopard review on Ars Technica.

Example:

The outer loop contains a list of movie titles, some in ALL CAPS, some all lowercase, some MiXEd CAse. A test run without GCD loops through this list one title at a time and launches a second loop to make each word within the current title lowercase with an initial capital, thereby rendering the list, one word at a time, in Title Case.

The next step is the same run-through, but using GCD to launch as many concurrent loops as possible on the user's hardware. Since each movie title is unique and does not rely on any previous title being converted, there's no problem with doing them out of order.

As the final result, show the actual time to convert the list each way and the percentage speedup from using Grand Central Dispatch. This could then be run on any other Intel Mac to find how much potential improvement there is in more cores and higher clock speeds.
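
A rough sketch of what that benchmark could look like (untested; all names are illustrative, and NSString's -capitalizedString does the Title Case conversion):

#import <Foundation/Foundation.h>

// Serial version: one title at a time.
NSArray *titleCaseSerial(NSArray *titles) {
    NSMutableArray *out = [NSMutableArray arrayWithCapacity:[titles count]];
    for (NSUInteger i = 0; i < [titles count]; i++)
        [out addObject:[[titles objectAtIndex:i] capitalizedString]];
    return out;
}

// GCD version: each title is independent, so dispatch_apply can convert them
// concurrently. Each iteration writes to its own slot, so no locking is needed.
NSArray *titleCaseGCD(NSArray *titles) {
    NSUInteger n = [titles count];
    id *slots = (id *)calloc(n, sizeof(id));
    dispatch_queue_t q = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_apply(n, q, ^(size_t i) {
        slots[i] = [[[titles objectAtIndex:i] capitalizedString] retain];
    });
    NSArray *result = [NSArray arrayWithObjects:slots count:n];
    for (NSUInteger i = 0; i < n; i++) [slots[i] release]; // pre-ARC cleanup
    free(slots);
    return result;
}

Timing each version with NSDate and printing the percentage speedup is then just a few more lines in main().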
 
Sure, but it isn't a smooth power curve, is it?
The power increases in steps.
So, like the gears in your car, each step has a sweet spot of efficiency. The smoother you drive, the less fuel you use.

No, it's like "if the engine is off, it doesn't burn gas".

The major effort on power management in the last decade has been to disable and power down units that aren't being used - even for tiny fractions of a second.

Those GPUs burn a lot of watts when they are busy. One shouldn't assume that OpenCL will help battery life. The job may finish faster, but the total energy consumed may be higher.
 
GCD = multithreading, which most applications can take advantage of if they aren't already threaded, so long as there are operations that can be run in parallel.

That's really underselling GCD.

The whole point is that you can set up a potentially lengthy (yet, code-wise, very tiny) operation away from the main thread without having to learn how to properly multithread an application. GCD lets you fire off a simple operation to run in parallel with the main thread (and not block it during a lengthy operation) using, effectively, two lines of code.

The automatic thread pool management stuff is just gravy.
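
Something like this, in fact (a sketch; doLengthyOperation and updateUI are placeholder names):

dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
    [model doLengthyOperation];              // runs off the main thread
    dispatch_async(dispatch_get_main_queue(), ^{
        [view updateUI];                     // hop back to the main thread for UI
    });
});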
 
No, from what I've seen it goes deeper than that.

In reality, the speed ups you're experiencing are probably only because parts of the OS code have been rewritten to use GCD, and thus respond faster than they used to.
I have to disagree, as I believe Apple refactored NSOperation and related pieces to run on top of the new threading architecture. It is the only viable explanation I have for some of the speedups seen in older code. We may be saying the same thing, but I'm specifically saying that the infrastructure put in place in SL to support GCD has had an excellent impact on existing software. Well, existing software that took advantage of Apple's NS threading primitives.

I.e., the underlying system libraries might be better/more threaded than before, but Aperture itself hasn't been rewritten/modified to use it. Once it is, you'll probably see even more performance improvements.

Well, yeah, a rewrite can always speed things up. I'm just saying there has been a positive impact on existing software, and that has a lot to do with the infrastructure put in place for GCD.


Dave
 
No, it's like "if the engine is off, it doesn't burn gas".

The major effort on power management in the last decade has been to disable and power down units that aren't being used - even for tiny fractions of a second.

Those GPUs burn a lot of watts when they are busy. One shouldn't assume that OpenCL will help battery life. The job may finish faster, but the total energy consumed may be higher.

Sometimes, Aiden, I wonder why you bother staying here. I'm sure your computer views would be better spent on a general tech site. It's like shouting at the deaf here.
 
This seems a lot like OpenMP for Objective-C. Which is a good thing.

But the other commenters are right; GCD is not the big gain the Steve Jobs Reality Distortion Field wants you to think it is. It is only an incremental improvement. The big gains were made years ago when developers implemented multi-threading. (If they did.)

I think something important is being left out of these Fantastical Exampicals. In the real world, you need to know when your little parallel block task is done. That means you are either waiting for some notification (very thread-like), or you are stopping your execution until the entire GCD 'block' is done. Because of this, the GCD gains will be few and far between. And it will probably still manage to bring in bugs for developers who actually believe that GCD makes things 'stupid easy.'

I would bet most of the gain in the example app in this article is from OpenCL, which actually brings new computing resources, rather than GCD, which gives you the same old crap in a new box.



This is probably from improvements in the kernel. OS X has traditionally had crappy thread performance, and perhaps that is now fixed.

Wrong. Block management is done at the system level, and you don't have to worry about it.
 
That has nothing to do with the GUI, and I can do a progress bar without threads. To be useful, your whole application model would have to be smart enough to expect work to be done in a background thread and handle the result of that work when it completes. Otherwise, using threads is pointless. That is not as trivial as you just described.

Yawn! Threads are stupid easy to create in Java, and so are thread pools. GCD cannot make developers more intelligent about when to use threads or how many to construct. Last I checked, threads aren't that difficult in C or C++ either. Block constructs may help slightly with avoiding some concurrency issues, but what I've seen so far hasn't been that exciting.

No thought required for threads? THAT'S REALLY SCARY.


OS X must've had a really crappy kernel until last month.

It's nothing like Java's thread pooling, and the blocks are actually C blocks.

http://clang.llvm.org/docs/BlockImplementation.txt
 
I believe that is the point. Simple things that CAN benefit from threads are frequently NOT threaded because you have to do quite a lot of thread management as soon as you start messing with them.

For instance, the simple act of doing "the same thing", let's say something that takes 2 seconds on each item in a collection, typically gets done along the lines of...

for (int i = 0; i < [collection count]; i++) {
    [[collection itemAtIndex:i] doTwoSecondTask];
}

Most developers would leave it at that, but with blocks and GCD you can literally rewrite that as...

dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
dispatch_apply([collection count], queue, ^(size_t i) {
    [[collection itemAtIndex:i] doTwoSecondTask];
}); // I didn't test this, I'm cribbing from the ADC documentation.

Now you have the same number of lines of code and a slightly different syntax, but your 2-second task is being done in parallel, with GCD deciding how many instances can run at a time based on what your overall system is doing.

Remember, GCD's job is NOT to make everything run ultra-parallel. Its job is to make the overall SYSTEM continue to be responsive by being aware HOLISTICALLY of what is going on. Even if you build a thread pool yourself in your own app, which is somewhat time-consuming to plumb in, you are only ever aware of what YOUR APP is doing. GCD sees the bigger picture...

So while it might speed some things up, I think its big win is that it makes it EASY to thread things that otherwise would not be threaded without writing lots of thread-pool management code. OK, so you could use NSOperation, but that's not as holistic as GCD. There are lots and lots of "quick wins" you get with GCD, and that's what makes it applicable across the board.

Just my 2c. :)

The problem is that there is a lot more you have to assume for your example to work. The big one is that the task you need to perform on each item is completely independent of all the tasks that need to be performed on the other items (often the case in encoding/decoding, applying image filters, etc., not so much in GUI apps). Do any of the later tasks rely on the completion of the earlier tasks? What about shared resources? Do the tasks require access to such a resource? Now you're dealing with possible deadlocks, race conditions, etc. Read about side effects and how they are what makes parallel programming so hard. Learn about functional languages like Erlang and why they are great for writing parallel code (hint: next to no side effects).

Trivial examples are simple to thread. Something like "take a list of numbers and add 1 to each" sounds great, but that's not what's going on in the typical program.
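
For example, here's the kind of loop-carried dependency that quietly rules out a naive dispatch_apply (a sketch; samples is an illustrative NSArray of NSNumbers):

NSUInteger n = [samples count];
double *runningTotal = malloc(n * sizeof(double));
for (NSUInteger i = 0; i < n; i++) {
    double prev = (i == 0) ? 0.0 : runningTotal[i - 1]; // needs iteration i-1
    runningTotal[i] = prev + [[samples objectAtIndex:i] doubleValue];
}
// Handing this loop body to dispatch_apply would read runningTotal[i - 1]
// before another thread has written it: a race, not a speedup.
free(runningTotal);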
 
thoroughly confused...

Okay, can someone please explain to me how, at the time I wrote this post, there are 97 Positive and 13 Negative opinions for this thread?

I guess I just can't understand the logic of some people and why they would find anything about this news negative. What could someone possibly find that's negative about performance increases as a result of new technology implementations?

I'm willing to keep an open mind here; I'm not trying to be combative. I'm just trying to understand what angle these Negative opinions are coming from.

Have a good day!!!
 
The whole point is that you can set up a potentially lengthy (yet, code-wise, very tiny) operation away from the main thread without having to learn how to properly multithread an application.

All GCD has done is make it easier to start a thread (not that it was very hard before) and provide a 'global' pool of thread execution units that is managed by the system (this is the real win). I don't see where it has removed the need for developers to learn how to properly manage a multi-threaded application. Multi-threaded applications require the programmer to think about and manage the order of operations and the resources they may use. Apple has changed the syntax around locks and semaphores, but the logical process used by the programmer is still the same. They can't simply dump all their long-running code into separate threads without understanding and planning for all the interdependencies between the various threads.

At the end of the day, that is what makes multi-threaded programming hard, not the syntax of creating a thread or locking a resource (in Java and C# it is already dead simple to create threads).
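
To illustrate, here's what managing a shared resource still looks like in GCD terms: a private serial queue standing in for a lock (a sketch; the queue label and variable names are illustrative):

dispatch_queue_t guard = dispatch_queue_create("com.example.counter", NULL); // NULL = serial
__block NSUInteger counter = 0;

// Writers enqueue in order; the serial queue is effectively the lock.
dispatch_async(guard, ^{ counter++; });

// A synchronous read sees every write queued before it.
dispatch_sync(guard, ^{
    NSLog(@"counter = %lu", (unsigned long)counter);
});

dispatch_release(guard); // pre-ARC cleanup

The syntax is nicer than pthread_mutex, but deciding which state needs guarding, and in what order, is still entirely on the programmer.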
 
I'm more optimistic than this.

This seems a lot like OpenMP for Objective-C. Which is a good thing.
I prefer to see it as something new and different, but yes, the similarities are there.
But the other commenters are right; GCD is not the big gain the Steve Jobs Reality Distortion Field wants you to think it is. It is only an incremental improvement. The big gains were made years ago when developers implemented multi-threading. (If they did.)
This I simply disagree with; GCD has the potential to be huge, especially with OpenCL alongside. I also don't subscribe to the idea that everything has already been multi-threaded or has threading that can't be improved. In fact, I can see a whole new generation of software coming that this tech enables.
I think something important is being left out of these Fantastical Exampicals. In the real world, you need to know when your little parallel block task is done. That means you are either waiting for some notification (very thread-like), or you are stopping your execution until the entire GCD 'block' is done. Because of this, the GCD gains will be few and far between.
I'd suggest reading Apple's documentation. The benefits are based on what the developer can exploit from the algorithms being used. There is likely to be a lot of existing code that will see little gain, but more importantly, a lot of code that couldn't run on a desktop PC before will now be possible.

The important thing is to look to the future, not the past.
And it will probably still manage to bring in bugs for developers who actually believe that GCD makes things 'stupid easy.'
Bugs are bugs, and so are stupid programmers, but that is not what we are concerned with here. What I'm excited about is this tech enabling a whole generation of software from the smarter minds out there. Maybe you're a democrat, but frankly I don't really give a damn about the flunkies in this world. This tech is for a new generation of creators.
I would bet most of the gain in the example app in this article is from OpenCL, which actually brings new computing resources, rather than GCD, which gives you the same old crap in a new box.
See the above about reading the documentation.
This is probably from improvements in the kernel. OS X has traditionally had crappy thread performance, and perhaps that is now fixed.

That is the whole point: GCD and the new infrastructure to support it are making things snappy for old software. It appears at this time that NSOperation and its allied calls now sit on top of GCD. I haven't found an explicit statement to that effect, but it would explain why some software runs much better. SL truly represents a significant overhaul of Apple's OS, and frankly I think many misunderstand its significance.


Dave
 
Okay, can someone please explain to me how, at the time I wrote this post, there are 97 Positive and 13 Negative opinions for this thread?

I guess I just can't understand the logic of some people and why they would find anything about this news negative. What could someone possibly find that's negative about performance increases as a result of new technology implementations?

I'm willing to keep an open mind here; I'm not trying to be combative. I'm just trying to understand what angle these Negative opinions are coming from.

Have a good day!!!
This seems to be brought up more often lately for Page 1 articles. The votes don't matter.
 
That's really underselling GCD.

The whole point is that you can set up a potentially lengthy (yet, code-wise, very tiny) operation away from the main thread without having to learn how to properly multithread an application. GCD lets you fire off a simple operation to run in parallel with the main thread (and not block it during a lengthy operation) using, effectively, two lines of code.

The automatic thread pool management stuff is just gravy.

That's the complete opposite of what I was implying. What you describe is simplifying the implementation of multithreading (really, you still need to know how it works and basic synchronization ideas; many operations will still require barriers, etc. to make sure that one thing has finished before something else can run). My point was that multithreading, and hence GCD, can benefit many applications, since most GUI applications use threading significantly. I was responding to someone saying it is useless for most applications, not saying that GCD is just a pthreads implementation or something along those lines.
 
My Mac Pro is one of those that cannot boot the 64-bit kernel... but it still runs 64-bit apps, like Lightroom. Does this mean it cannot take advantage of these advancements? :confused:

Forget the damn 64-bit kernel. You can still run 64-bit apps, and more importantly, your machine is OpenCL capable.
 
Okay, can someone please explain to me how, at the time I wrote this post, there are 97 Positive and 13 Negative opinions for this thread?

I guess I just can't understand the logic of some people and why they would find anything about this news negative. What could someone possibly find that's negative about performance increases as a result of new technology implementations?

I'm willing to keep an open mind here; I'm not trying to be combative. I'm just trying to understand what angle these Negative opinions are coming from.

Have a good day!!!

I can explain it to you: people are just stupid. Or some of them just don't like Macs and vote it down... indeed, news votes are useless.
 
The tricky thing is that the new technology underlying Snow Leopard is not going to make a difference straight away; it needs to be implemented by developers. Those who read the Ars Technica review of Snow Leopard might believe the reviewer that Grand Central technology is relatively easy to work into code, because rather than creating threads on your own, you can basically hand things over to GCD and it will do the rest: balancing CPU power between processes and cores and all those tech things.

Well, anyhow, it doesn't matter too much now. We will see benefits here over time, though; be patient.
 