Grand Central Dispatch and Open CL Bring Significant Performance Improvements for Optimized Applications
MacRumors
Sep 16, 2009, 04:31 PM
Hardmac reports (http://www.hardmac.com/news/2009/09/16/positive-effects-of-grand-central-and-open-cl) on a performance comparison between Mac OS X Leopard and Mac OS X Snow Leopard from Christophe Ducommun, developer of MovieGate (http://web.mac.com/cducommun/MG_English/Home.html), a video encoding and DVD creation software package. Ducommun, who is optimizing his application to take advantage of the Grand Central Dispatch and OpenCL features of Snow Leopard, has found remarkable performance improvements for his software on the operating system when running on a Mac Pro.
"Christophe Ducommun who keeps optimizing Snow Leopard for his application MovieGate just sent us results to illustrate how Snow Leopard can improve performance when one can make use of Grand Central and Open CL. Tests below have been performed with a Mac Pro 2007 (Quad Core 2.66 GHz with a GeForce 8800 GT)."
The results include an approximately 50% increase in video encoding speed when compared to Leopard, while also reducing the CPU load during video decoding by passing some of the work to the graphics processing unit.
Snow Leopard
150 frames/s for encoding in MPEG-2
70% CPU load for decoding
130% CPU load for MPEG-2 encoding (ffmpeg)
Leopard
104 frames/s for encoding in MPEG-2
165% CPU load for decoding
100% CPU load for MPEG-2 encoding (ffmpeg)
While Ducommun's experience is relatively rare at this point, because the vast majority of applications cannot yet make such comprehensive use of Grand Central Dispatch and OpenCL, it highlights the potential performance gains these core technologies can bring to Mac OS X as developers begin to take advantage of them.
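For reference, the relative changes implied by the quoted figures can be checked with a few lines of arithmetic. This is only a sketch in Python; the variable and function names are made up for illustration, and the only inputs are the numbers reported above.

```python
# Figures as reported above (Mac Pro 2007, MPEG-2 encoding and decoding).
leopard_fps = 104
snow_leopard_fps = 150
leopard_decode_load = 165       # percent, where 100% = one core
snow_leopard_decode_load = 70   # percent, where 100% = one core


def percent_change(old, new):
    """Relative change from old to new, in percent (negative means a drop)."""
    return (new - old) / old * 100


encoding_gain = percent_change(leopard_fps, snow_leopard_fps)
decoding_drop = -percent_change(leopard_decode_load, snow_leopard_decode_load)

print(f"Encoding frame rate: +{encoding_gain:.1f}%")
print(f"Decoding CPU load:   -{decoding_drop:.1f}%")
```

On these numbers the encoding gain works out to a little over 44%, which is what the article rounds to "approximately 50%".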
Article Link: Grand Central Dispatch and Open CL Bring Significant Performance Improvements for Optimized Applications (http://www.macrumors.com/2009/09/16/grand-central-dispatch-and-open-cl-bring-significant-performance-improvements-for-optimized-applications/)
metrobot
Sep 16, 2009, 04:37 PM
I'd like to know which Apple apps are already optimized. For example, Logic 9?
Shake 'n' Bake
Sep 16, 2009, 04:39 PM
I'd like to know which Apple apps are already optimized. For example, Logic 9?
Probably none are.
I'd like a GCD/OpenCL-capable iMovie, Keynote, iDVD, Handbrake, iPhoto, FCS, and Logic Studio, along with Finder.
DUSTmurph
Sep 16, 2009, 04:42 PM
Yeah developers, start taking advantage of this ****.
NeverhadaPC
Sep 16, 2009, 04:42 PM
good news. :D
MacsRgr8
Sep 16, 2009, 04:42 PM
Probably none are.
I'd like a GCD/OpenCL-capable iMovie, Keynote, iDVD, Handbrake, iPhoto, FCS, and Logic Studio, along with Finder.
Indeed.
You know, I would have expected either FCS 3 to arrive after the launch of Snow Leopard, and boast about how fast these apps (or at least some actions) are using GCD and OpenCL, or have a few updates for these apps to make use of these optimizations.
Strangely, nothing can be found.
Probably iLife 10 etc..?
segers909
Sep 16, 2009, 04:45 PM
Since I updated to SL, my CoD 4 has been running extremely slowly (uMBP on 9600).
Eidorian
Sep 16, 2009, 04:49 PM
Decoding under OS X definitely needs a lot of help.
Shake 'n' Bake
Sep 16, 2009, 04:51 PM
Probably iLife 10 etc..?
Most likely. That's why I'm not getting iLife or iWork '09. '10 will be GCD/OpenCL-compatible for sure and written in 64-bit Cocoa. If not, that'd be a pretty stupid move.
Chaszmyr
Sep 16, 2009, 04:55 PM
I desperately want GC/OpenCL optimized Handbrake, but the developers have been saying for a long time that it is probably nowhere near, and may never come at all. At first they said it was because of Handbrake being cross platform, and more recently they say that it's out of their hands and up to the people who make the encoder they use.
FSMBP
Sep 16, 2009, 04:55 PM
Most likely. That's why I'm not getting iLife or iWork '09. '10 will be GCD/OpenCL-compatible for sure and written in 64-bit Cocoa. If not, that'd be a pretty stupid move.
I want that too, but I wouldn't bet on it - I mean, Logic 9 and FCS 3 were released a little over a month before Snow Leopard, and they weren't even optimized (Pro apps you would figure would be first in line for a rewrite).
Shake 'n' Bake
Sep 16, 2009, 04:58 PM
I want that too, but I wouldn't bet on it - I mean, Logic 9 and FCS 3 were released a little over a month before Snow Leopard, and they weren't even optimized (Pro apps you would figure would be first in line for a rewrite).
Not a surprise to me. Wouldn't you need SL to be absolutely sure the app would work? If so, then a new suite or a 3.X.0 update may be coming.
mdriftmeyer
Sep 16, 2009, 04:59 PM
Perhaps he could take the time to work on ffmpeg and tune it to use GCD and OpenCL?
He'd see far more improvements at that level which would benefit the entire chain of execution for his product as well, not to mention we'd all benefit from an optimized ffmpeg.
MorphingDragon
Sep 16, 2009, 05:07 PM
Now, where are the libraries for us to take advantage of?
I thought we were promised an open standard. Why not any IBM Clone love? :rolleyes:
Doctor Q
Sep 16, 2009, 05:09 PM
How much developer work is required to optimize an application? Even with the proven benefits, I'm curious what percentage of developers will take advantage of these technologies.
SpinThis!
Sep 16, 2009, 05:10 PM
I desperately want GC/OpenCL optimized Handbrake, but the developers have been saying for a long time that it is probably nowhere near, and may never come at all. At first they said it was because of Handbrake being cross platform, and more recently they say that it's out of their hands and up to the people who make the encoder they use.
Handbrake relies on other open source projects to do the en/decoding, such as x264 and ffmpeg. So unless you want to optimize and port a Mac version with OpenCL and GrandCentral yourself don't hold your breath.
Until recently the developer API for OpenCL wasn't even available which is why you won't see it in many apps. On the GC side, now that Apple has open sourced it as libdispatch, it's entirely possible more OSes and developers will start using that. But it may take years for that to standardize and happen.
I'd imagine it may only be a matter of time before Apple updates its own Compressor to take better advantage.
bob_the_gorilla
Sep 16, 2009, 05:16 PM
I desperately want GC/OpenCL optimized Handbrake, but the developers have been saying for a long time that it is probably nowhere near, and may never come at all.
To be honest, GCD isn't a big deal for Handbrake, at least on average machines. It already makes pretty canny decisions about threads.
OpenCL would bring serious gains, but we're looking to x264 for that – and they're not OS X-centric. If and when OpenCL spreads cross-platform (and GPU makers do seem keen on including it in their offerings) they'll almost certainly take more interest.
borcanm
Sep 16, 2009, 05:17 PM
Very good news. Now if only all the apps for macs would take advantage of GCD & OCL.
Michael73
Sep 16, 2009, 05:19 PM
Heh! The guy has (almost) the same rig as me (see my sig) but I have 2x his cores :D I've been waiting for this type of confirmation ever since GCD and OpenCL technologies were announced. If he got almost a 50% increase with a quad core machine I expect crazy numbers from an 8-core variety. Maybe ~200fps encoding rate.
I like Handbrake but I'm not married to it. If there was another application out there (even paid, but reasonably priced) which could nab me 50-100% gains I'd be all over it.
Eidorian
Sep 16, 2009, 05:22 PM
Handbrake is nice but the majority of Apple's machines sadly are dual cores. What I'd like to know is how much more performance you can squeeze out.
4 cores, 4 threads per socket on Core 2 is getting old too. Nehalem needs to make more of a showing at Apple.
admanimal
Sep 16, 2009, 05:26 PM
Very good news. Now if only all the apps for macs would take advantage of GCD & OCL.
You (and most other people) need to realize that there is a relatively small set of computations that can be accelerated by this kind of technology. Of course certain types of video, image, and sound processing will work, but your run-of-the-mill Mac app isn't going to be able to take advantage of GCD or OCL.
DipDog3
Sep 16, 2009, 05:28 PM
But of course developers will need to rework their apps, and I doubt that a lot of them are going to do that until they release another version.
hassiman
Sep 16, 2009, 05:30 PM
My Mac Pro is one of those that cannot boot to the 64 bit kernel... but still runs 64 bit apps.. like Lightroom. Does this mean it cannot take advantage of these advancements? :confused:
SHOlover
Sep 16, 2009, 05:34 PM
You (and most other people) need to realize that there is a relatively small set of computations that can be accelerated by this kind of technology. Of course certain types of video, image, and sound processing will work, but your run-of-the-mill Mac app isn't going to be able to take advantage of GCD or OCL.
Exactly, but right now FCP has more than 50 percent of the editing market share. Not a bad user group for this type of technology. Compressor needs multithreading badly! Qmaster is just a band aid.
This is where FCS falls short:
1. Rendering
2. Encoding
3. Real time vector effects
This technology will help all of these. :D
akac
Sep 16, 2009, 05:36 PM
My Mac Pro is one of those that cannot boot to the 64 bit kernel... but still runs 64 bit apps.. like Lightroom. Does this mean it cannot take advantage of these advancements? :confused:
Yes it can.
Icaras
Sep 16, 2009, 05:39 PM
Always gotta love those people who voted three negatives on articles like these.
ChrisA
Sep 16, 2009, 05:42 PM
I desperately want GC/OpenCL optimized Handbrake, but the developers have been saying for a long time that it is probably nowhere near, and may never come at all. At first they said it was because of Handbrake being cross platform, and more recently they say that it's out of their hands and up to the people who make the encoder they use.
Yes, Handbrake is just a graphical shell for ffmpeg. All it does is collect some data from a few dialog boxes and radio buttons then pass off all the real work to ffmpeg.
You will have to wait until Grand Central and OpenCL are adopted by Linux and then the developers will modify ffmpeg. Likely it will take a couple of years at least.
But this is all open source so you all can help if you want. Want a Grand Central based ffmpeg? Anyone can build one if they like. No need to wait for others to do the work for you. If you lack the skills then maybe you can pay someone to do the work or buy a Mac and give it to a developer.
pmjoe
Sep 16, 2009, 05:43 PM
Let's be clear here, Grand Central Dispatch does not bring any performance improvements. It is just a library to simplify threading for some who might not otherwise do multi-threaded programming. It's the multi-threading that brings the performance improvements.
Similarly, OpenCL doesn't bring performance improvements, it's just a language that simplifies being able to run code on a GPU, which is probably difficult to do in a cross GPU way otherwise. But it's running code on the GPU that brings the performance improvement.
Also, since the article compares Leopard to Snow Leopard, it's difficult to tell if all the performance improvements are due to multi-threading and pushing code off onto the GPU. Why didn't they just run the non-GCD, non-OpenCL code on Snow Leopard rather than Leopard to compare? The article also makes some strange statement about decoding being more GPU intensive than encoding??? My understanding was that MPEG encoding is almost always more processor intensive. For that matter, how do you get a 165% CPU load???
pmjoe
Sep 16, 2009, 05:47 PM
Want a Grand Central based ffmpeg? Anyone can build one if they like. No need to wait for others to do the work for you. If you lack the skills then maybe you can pay someone to do the work or buy a Mac and give it to a developer.
Given that ffmpeg is LGPL code and apparently that's what the folks cited in this article were modifying, they will probably need to make the code available. So, a version of the work is already done. ;)
apfhex
Sep 16, 2009, 05:48 PM
For that matter, how do you get a 165% CPU load???
100% = 1 core, so in this case with the quad core MP you could have up to 400% CPU usage. The GCD optimized version was using 30% more CPU (two cores instead of one) which is why the encoding was faster. :)
And the decoding isn't more intensive *overall* than the encoding, it's that the GPU was able to take more of the processing load for decoding (the GPU would surely be better at decoding than encoding).
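The "100% = 1 core" convention described above can be made concrete with a small conversion. This is a rough sketch in Python; the function names are invented for illustration, and the 4-core count is taken from the quad core Mac Pro mentioned in the article.

```python
def cores_busy(load_percent):
    """Per-core convention: 100% means one core fully busy."""
    return load_percent / 100


def share_of_machine(load_percent, core_count):
    """The same load expressed as a percentage of the whole machine."""
    return load_percent / core_count


QUAD_CORE = 4  # the Mac Pro used for the reported tests

for label, load in [("Leopard decoding", 165), ("Snow Leopard decoding", 70)]:
    print(f"{label}: {cores_busy(load):.2f} cores busy, "
          f"{share_of_machine(load, QUAD_CORE):.2f}% of the machine")

# The ceiling on a quad core under this convention.
max_load = 100 * QUAD_CORE
```

So a 165% reading on a four-core machine means about 1.65 cores' worth of work, well short of the 400% ceiling.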
LAS.mac
Sep 16, 2009, 05:52 PM
OpenCL/Grand Central optimization is fine, however such improvements probably benefit, what, 20% of the users?
I've been holding off on the SL upgrade so far. I've actually been reading more complaints than positive comments.
I'd much prefer to see improved overall conditions, increased battery life for laptops, etc.
bob_the_gorilla
Sep 16, 2009, 05:53 PM
Yes Handbrake is just a graphical shell for ffmpeg. All it does is collect some data from a few dialog boxes and radio buttons then pass off all the real work to ffmpeg.
Dude, you don't know much about Handbrake.
mdriftmeyer
Sep 16, 2009, 05:56 PM
You (and most other people) need to realize that there is a relatively small set of computations that can be accelerated by this kind of technology. Of course certain types of video, image, and sound processing will work, but your run-of-the-mill Mac app isn't going to be able to take advantage of GCD or OCL.
Wrong. All applications written for SL will take advantage of GCD when the 3rd party dev steps up and implements it. To see OpenOffice and MS Office re-written to leverage GCD and benefit drastically from blocks by leveraging unused CPU cores would bring random comments of "everything is so fast and snaps when I make a change to my document, my 10,000 x 100 spreadsheet, to my ability to connect to database sources and scale up, etc."
iLife will take full advantage of both.
OpenCL is being taken advantage of at the low-level presently.
Any application [including all of iWork] can leverage OpenCL for its offloading of number crunching, aiding Quartz in various aspects to streamlining processes for WebKit and thus give everyone an improved experience. Built-in SVG, WebGL for Opera, Firefox, Safari, Chrome, etc., will benefit on OS X.
The impact for an application like Keynote will be more visible the more Keynote expands into OpenGL presentations that include interactive fly-throughs and much more.
Any application that takes external data sources and requires numerical analysis of numbers, pattern matching, etc., will take advantage of it.
It's all dependent upon the developers time, vision and goals of their applications.
All games take advantage of it because of the instant dependency of the environment to sap the life out of a CPU(s).
Any Graphics Editor, Flash editor, SVG editor, multimedia suite, Audio/Video application can immediately leverage both but will require re-architecting portions of the code to make it happen.
It's not unreasonable to expect 6-9 months before major vendors bring out new versions leveraging both with considerable performance improvements while reducing overhead to reach those aims.
mdriftmeyer
Sep 16, 2009, 05:59 PM
Handbrake is nice but the majority of Apple's machines sadly are dual cores. What I'd like to know is how much more performance you can squeeze out.
4 cores, 4 threads per socket on Core 2 is getting old too. Nehalem needs to make more of a showing at Apple.
Even a 2 Core system isn't being leveraged efficiently until the advent of SL to the market for Macs.
heisetax
Sep 16, 2009, 06:03 PM
So far, with no apps using 10.6's new features and my current apps often seeming to run slower on my 1st gen 3 GHz Intel Mac Pro with dual ATI 3870 video cards, there is little incentive to change our OS. Maybe there will be by the time we are up to 10.6.6 or so. By then the new bugs caused by going to 10.6 from 10.5.8 will be worked out.
So far I usually run 10.6.1 for an hour or 2 each day. I don't do any mission critical work on it as of yet. Like many, I do not trust updates. Also, why suffer the risk if things are not doing anything new or faster?
OS 10.6 Snow Leopard has to be a very small money deal for Apple. Before, they got $129 per updated Mac, with all PPC & Intel Macs available. Now they get $29 per updated Mac, Intel only. This is a much smaller amount & a much smaller market. Was 10.6 written because Apple is now copying Microsoft? Remember, when Apple went to OS X, MS just came out with a new version of MS Office, Office X for Mac, which was just a change from System 9 to OS X code. OS 10.6 cut out all of the PPC users & changed some of the OS to 64 bit. Now, with so few new items, no one got on the bandwagon early with apps for the new or updated features in OS 10.6.
Or is this slowness to change because Apple developers can see that most of Apple's work is on the iPhone, so they are slow to work on programs for a system that its own maker is lax about emphasizing?
When will we see these features used? That is probably why Apple took PPC support out: by the time these features are used, most PPC Macs & 1st & 2nd gen Intel Macs will be too old to move those recycled electrons around in a useful pattern.
I have OS 10.6 but I really don't use it. How many other OS 10.6 purchasers are like this?
Willis
Sep 16, 2009, 06:09 PM
My Mac Pro is one of those that cannot boot to the 64 bit kernel... but still runs 64 bit apps.. like Lightroom. Does this mean it cannot take advantage of these advancements? :confused:
Yes it can.
Not OpenCL, though: in my Mac Pro's case, the ATI X1900 XT isn't supported.
You need to have a specific card for OpenCL to be used.
Eidorian
Sep 16, 2009, 06:09 PM
Even a 2 Core system isn't being leveraged efficiently until the advent of SL to the market for Macs.
Please do tell.
I got nearly double performance from Handbrake clock for clock moving from a 2.4 GHz Core 2 Duo to Quad. Here's the shocker: under Windows XP.
I don't see how you can tell me that Core 2 wasn't being leveraged until Snow Leopard. I'm not coughing up for a Mac Pro but I know that Core 2 is at its limits for what I do.
Here's another shocker for you. The Nehalem platform takes single vs. dual vs. quad into account with Turbo Boost.
Cereal
Sep 16, 2009, 06:19 PM
Heh! The guy has (almost) the same rig as me (see my sig) but I have 2x his cores :D I've been waiting for this type of confirmation ever since GCD and OpenCL technologies were announced. If he got almost a 50% increase with a quad core machine I expect crazy numbers from an 8-core variety. Maybe ~200fps encoding rate.
I like Handbrake but I'm not married to it. If there was another application out there (even paid, but reasonably priced) which could nab me 50-100% gains I'd be all over it.
It would be nice but adding cores doesn't necessarily translate to a linear gain.
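One common way to put numbers on this is Amdahl's law, which is brought in here as an illustration and is not something the thread or the article cites. The sketch below is in Python, and the 90% parallel fraction is a purely hypothetical value, not a measurement of MovieGate or Handbrake.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speed-up over one core when only part of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical: assume 90% of the encode can run in parallel.
PARALLEL_FRACTION = 0.9

for cores in (1, 2, 4, 8):
    speedup = amdahl_speedup(PARALLEL_FRACTION, cores)
    print(f"{cores} cores: {speedup:.2f}x")

# How much an 8-core machine gains over a 4-core one under this assumption.
gain_4_to_8 = (amdahl_speedup(PARALLEL_FRACTION, 8)
               / amdahl_speedup(PARALLEL_FRACTION, 4))
print(f"4 -> 8 cores: {gain_4_to_8:.2f}x")
```

Under that assumption, doubling from 4 to 8 cores yields roughly 1.5x rather than 2x, and the gap widens as the serial portion grows.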
Henriok
Sep 16, 2009, 06:28 PM
but your run-of-the-mill Mac app isn't going to be able to take advantage of GCD or OCL.
Mail.app was the poster child at WWDC for use of GCD and that's pretty much a run-of-the-mill app if you ask me. Many applications can be made to use better threading, certainly all apps that are communicating with other services, such as networking.
ThomasJL
Sep 16, 2009, 06:28 PM
And Flash still slows the computer down and makes the fans spin like crazy.
Why does Flash still suck?
Seriously, even a cheap Winblows netbook can run Flash better than a top-of-the-line maxed-out Mac Pro.
pmjoe
Sep 16, 2009, 06:29 PM
Wrong. All applications written for SL will take advantage of GCD when the 3rd party dev steps up and implements it. [... blah, blah, blah]
There is so much wrong with this post that I don't know where to start. Do you honestly think applications like Office do not already use threads where they are useful? Why in the world would I do OpenCL for 3D when I have OpenGL???
Eidorian
Sep 16, 2009, 06:30 PM
And Flash still slows the computer down and makes the fans spin like crazy.
Why does Flash still suck?
There's quite a bit of overhead running a 32-bit plug-in in 64-bit Safari.
Apple or Adobe are going to need to provide a new plug-in or run Safari in 32-bit mode for now.
pmjoe
Sep 16, 2009, 06:32 PM
100% = 1 core, so in this case with the quad core MP you could have up to 400% CPU usage. The GCD optimized version was using 30% more CPU (two cores instead of one) which is why the encoding was faster. :)
And the decoding isn't more intensive *overall* than the encoding, it's that the GPU was able to take more of the processing load for decoding (the GPU would surely be better at decoding than encoding).
Thanks, but my point was that the article was poorly written, and provides inadequate info with an inadequate analysis.
Henriok
Sep 16, 2009, 06:32 PM
Why does Flash still suck?
Because Adobe are idiots. It's just that simple.
SimonTheSoundMa
Sep 16, 2009, 06:38 PM
Exactly, but right now FCP has more than 50 percent of the non-professional editing market share.
There, corrected for you. Any pro worth their nut will use Avid.
designgeek
Sep 16, 2009, 06:43 PM
I really really want an optimized HandBrake. Flash too. There's no good reason that my roommate's POS Dell should handle Flash better than my MBP.
bob_the_gorilla
Sep 16, 2009, 06:49 PM
I got nearly double performance from Handbrake clock for clock moving from a 2.4 GHz Core 2 Duo to Quad. Here's the shocker: under Windows XP.
Not surprised! HB is a bit of a special case, in that it can actually make use of your computer. Many of the libraries it uses, such as x264, are already coded rather well for multicore machines. It is not a typical application, in that respect, and I don't think I'm wrong to say that GCD is relatively irrelevant to it under normal circumstances. OpenCL would probably be a huge boon, but implementing it is entirely non-trivial – and would have to happen in the libraries, not HB itself.
GCD/OpenCL are not magic bullets for performance. OpenCL is only useful in specific cases, and requires substantial changes to code. GCD makes writing apps which make use of multiple cores, like Handbrake, much easier, but it won't suddenly make all your existing apps fly.
GCD is like a quick-reference guide to help you easily get full marks on your Performance homework, and OpenCL is a special extra-credit assignment that's not for everyone. They're not going to hack into the school database and retroactively change all your grades.
eastcoastsurfer
Sep 16, 2009, 06:51 PM
Wrong. All applications written for SL will take advantage of GCD when the 3rd party dev steps up and implements it. To see OpenOffice and MS Office re-written to leverage GCD and benefit drastically from blocks by leveraging unused CPU cores would bring random comments of "everything is so fast and snaps when I make a change to my document, my 10,000 x 100 spreadsheet, to my ability to connect to database sources and scale up, etc."
iLife will take full advantage of both.
OpenCL is being taken advantage of at the low-level presently.
Any application [including all of iWork] can leverage OpenCL for its offloading of number crunching, aiding Quartz in various aspects to streamlining processes for WebKit and thus give everyone an improved experience. Built-in SVG, WebGL for Opera, Firefox, Safari, Chrome, etc., will benefit on OS X.
The impact for an application like Keynote will be more visible the more Keynote expands into OpenGL presentations that include interactive fly-throughs and much more.
Any application that takes external data sources and requires numerical analysis of numbers, pattern matching, etc., will take advantage of it.
It's all dependent upon the developers time, vision and goals of their applications.
All games take advantage of it because of the instant dependency of the environment to sap the life out of a CPU(s).
Any Graphics Editor, Flash editor, SVG editor, multimedia suite, Audio/Video application can immediately leverage both but will require re-architecting portions of the code to make it happen.
It's not unreasonable to expect 6-9 months before major vendors bring out new versions leveraging both with considerable performance improvements while reducing overhead to reach those aims.
Um...
There is so much wrong with this post that I don't know where to start. Do you honestly think applications like Office do not already use threads where they are useful? Why in the world would I do OpenCL for 3D when I have OpenGL???
I agree. People are acting like developers have never thought about threads before. GCD makes creating and managing threads easier (and also keeps a program from stepping all over the toes of another program when it comes to threads), but it will not suddenly make non-threadable programs threadable or make developers good at programming with threads and shared resources. To start, there has to be work that can run in parallel in the program. The standard GUI program of click, wait for response, click again doesn't have very much that can be useful for threads.
The big win comes in tasks like encoding/decoding. In those cases developers have already been using threads for a very long time, but they have probably been conservative on the number of threads created (too few and you don't use the procs, too many and context switching kills you). Now they can request all they want and let GCD manage how many threads to create dynamically based on the system the code is executing on.
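The idea of handing many small tasks to a pool that is sized for the machine can be sketched as follows. This is not GCD: Python's standard `concurrent.futures` thread pool is swapped in as a stand-in, the pool size is still chosen by the program rather than by the OS, and `encode_chunk` is a hypothetical placeholder for real encoding work. Because of CPython's global interpreter lock, the sketch shows the structure, not an actual speed-up for CPU-bound code.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def encode_chunk(chunk_id):
    """Hypothetical stand-in for encoding one chunk of video."""
    return sum(i * i for i in range(1000)) + chunk_id


# Size the pool from the machine the code runs on instead of hard-coding
# a thread count (too few wastes cores, too many costs context switches).
worker_count = os.cpu_count() or 1

# Submit far more tasks than threads; the pool queues them and keeps
# at most worker_count threads busy.
with ThreadPoolExecutor(max_workers=worker_count) as pool:
    results = list(pool.map(encode_chunk, range(100)))

print(f"{len(results)} chunks processed by at most {worker_count} threads")
```

The point of the design is that the code describes units of work and leaves the thread count to something that knows the hardware; GCD moves that decision into the operating system.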
haravikk
Sep 16, 2009, 06:51 PM
You (and most other people) need to realize that there is a relatively small set of computations that can be accelerated by this kind of technology. Of course certain types of video, image, and sound processing will work, but your run-of-the-mill Mac app isn't going to be able to take advantage of GCD or OCL.
Thing is, GCD can actually have an impact even if the most trivial seeming app is converted to use it. The more apps you're running that use GCD rather than their own thread-management, the snappier your system ought to be overall, as GCD will be keeping the number of threads at as optimal a level as possible, rather than machines losing time to threads that barely do anything.
Eidorian
Sep 16, 2009, 06:55 PM
Thing is, GCD can actually have an impact even if the most trivial seeming app is converted to use it. The more apps you're running that use GCD rather than their own thread-management, the snappier your system ought to be overall, as GCD will be keeping the number of threads at as optimal a level as possible, rather than machines losing time to threads that barely do anything.
Which is pretty much the expectation that I have for GCD. I'm just wondering how much of an impact it will have on Apple's dual core or break the bank environment.
Quad cores or even 4 cores, 8 threads aren't that hard to get unless you're buying a Mac. I brought this up before but how many more points can we squeeze out of a dual core today? Snow Leopard would be better off outside of Apple's homogeneous nearly pure dual core environment.
admanimal
Sep 16, 2009, 07:01 PM
Wrong. All applications written for SL will take advantage of GCD when the 3rd party dev steps up and implements it. To see OpenOffice and MS Office re-written to leverage GCD and benefit drastically from blocks by leveraging unused CPU cores would bring random comments of "everything is so fast and snaps when I make a change to my document, my 10,000 x 100 spreadsheet, to my ability to connect to database sources and scale up, etc."
iLife will take full advantage of both.
OpenCL is being taken advantage of at the low-level presently.
Any application [including all of iWork] can leverage OpenCL for its offloading of number crunching, aiding Quartz in various aspects to streamlining processes for WebKit and thus give everyone an improved experience. Built-in SVG, WebGL for Opera, Firefox, Safari, Chrome, etc., will benefit on OS X.
The impact for an application like Keynote will be more visible the more Keynote expands into OpenGL presentations that include interactive fly-throughs and much more.
Any application that takes external data sources and requires numerical analysis of numbers, pattern matching, etc., will take advantage of it.
It's all dependent upon the developers time, vision and goals of their applications.
All games take advantage of it because of the instant dependency of the environment to sap the life out of a CPU(s).
Any Graphics Editor, Flash editor, SVG editor, multimedia suite, Audio/Video application can immediately leverage both but will require re-architecting portions of the code to make it happen.
It's not unreasonable to expect 6-9 months before major vendors bring out new versions leveraging both with considerable performance improvements while reducing overhead to reach those aims.
You are so far off with this post I'm not even going to try to correct it, although I probably did underestimate the usefulness of GCD in standard apps. Your understanding of OpenCL is waaaay off.
Eric S.
Sep 16, 2009, 07:13 PM
But of course developers will need to rework their apps, and I doubt that a lot of them are going to do that until they release another version.
For those that really see a competitive advantage though, I would expect the upgrade to be done pretty rapidly.
Even a 2 Core system isn't being leveraged efficiently until the advent of SL to the market for Macs.
If that's true then OS X must have been really poorly coded. (And I don't think it was.)
Jimmetry
Sep 16, 2009, 07:20 PM
Right now, developers create threads when they're essential, not when they're potentially useful. This changes all that, and yes, the improvements will filter into all applications, even if it just means better GUIs (say goodbye to the beachball). Developers generally don't care enough to see the performance implications of threading, and a lot of time is wasted waiting for concurrent processes to catch up. With smaller and more mobile threads, this will change.
pmjoe
Sep 16, 2009, 07:38 PM
Right now, developers create threads when they're essential, not when they're potentially useful.
GCD will not help developers know any better about when threads might be "potentially useful".
and yes, the improvements will filter into all applications, even if it just means better GUIs
GUI libraries on pretty much every modern platform already have built-in multi-threading.
God of Biscuits
Sep 16, 2009, 07:40 PM
I've never seen an answer to this type of question, not even a ballpark figure....
Has anyone read what kind of speed up one might expect for h.264 video ENcoding using OpenCL on a MacPro that was maxed out with the standard-class video cards Apple offers?
just broad speedup. 10x? 100x?
Or is h.264 encoding not parallelizable enough to actually see much of a boost?
theBB
Sep 16, 2009, 07:46 PM
GCD will not help developers know any better about when threads might be "potentially useful".
If it makes it easier for them to code a few more sections of their apps multithreaded, programmers and companies will be a lot more likely to actually do so. There is always a cost benefit analysis. This hopefully reduces the cost.
whooleytoo
Sep 16, 2009, 07:50 PM
I'm not certain the next iLife apps will feature OpenCL; Apple likes to clearly differentiate between consumer and pro-level products, so OpenCL functionality could be a key differentiator.
I've not coded with OpenCL yet, but I can't imagine using it in any of the Pro apps would be a quick job.
2002cbr600f4i
Sep 16, 2009, 07:58 PM
GCD will not help developers know any better about when threads might be "potentially useful".
GUI libraries on pretty much every modern platform already have built-in multi-threading.
Yes, and no....
Yes, they INTERNALLY do things like drawing in their own internally created threads (i.e., when I tell the system to "draw a window", that call may internally use threads to do the work of drawing), but it's usually up to the developer to manually create threads to go off and do any long-duration data processing work.
Excellent example... Let's say I have a UI with a button. When that button is pressed, I kick off some long process (say, sorting 10M numbers) that is going to take, say, 10 seconds to run... The windowing toolkit might use its own thread internally to draw the state changes of the button, but unless I write my sorting routine to execute in a thread, my program (and that program's UI) is effectively blocked from doing anything else, including responding to clicks, until that sorting process is done.
WELL written programs anticipate this long delay, and they do the processing in a thread and that thread then reports back to the application's/UI's main thread to say "Ok, I'm done". In the meantime, the application is responsive (or maybe it's showing a progress bar or something, but it's not totally hung/beachballing.)
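A rough sketch of that "sort in the background, then report back" pattern, written in plain C with POSIX threads rather than GCD so it can be built anywhere. The names (sort_job, start_sort, sort_finished, finish_sort) are made up for illustration, and the "report back" is just a flag the event loop polls; a real toolkit would post an event to the main thread instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical job record shared between the "UI" thread and the worker. */
typedef struct {
    int        *numbers;   /* data to sort, owned by the caller */
    size_t      count;
    atomic_int  done;      /* set to 1 by the worker when it finishes */
    pthread_t   thread;
} sort_job;

static int compare_ints(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Runs on the background thread: do the slow work, then report back. */
static void *sort_worker(void *arg) {
    sort_job *job = arg;
    qsort(job->numbers, job->count, sizeof job->numbers[0], compare_ints);
    atomic_store(&job->done, 1);   /* the "Ok, I'm done" message */
    return NULL;
}

/* Called from the button handler. Returns immediately, so the caller's
   event loop keeps running while the sort happens on another thread. */
void start_sort(sort_job *job, int *numbers, size_t count) {
    job->numbers = numbers;
    job->count = count;
    atomic_init(&job->done, 0);
    int rc = pthread_create(&job->thread, NULL, sort_worker, job);
    assert(rc == 0);
    (void)rc;
}

/* Polled by the event loop between handling clicks and redraws. */
int sort_finished(sort_job *job) {
    return atomic_load(&job->done);
}

/* Call once sort_finished() returns 1 to reclaim the thread's resources. */
void finish_sort(sort_job *job) {
    pthread_join(job->thread, NULL);
}
```

The button handler would call start_sort() and return; the event loop checks sort_finished() on each pass and only touches the array after it reports 1. All of that plumbing is what a one-line dispatch to a queue is meant to replace.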
This exact paradigm and issue exists in Windows, it exists in OSX, it exists in QT, it exists in Linux's window toolkits, it exists in Java... The ONLY fix is to write all your data processing code in threads... And in many of those systems that is a royal PAIN to do! So most developers don't bother unless it's something where they know the code being called will take a long time to run.
GCD changes all that... With the changes to Objective-C (blocks) it's STUPID easy to set up the long-running data processing code to execute in a thread. The whole point is that GCD takes care of figuring out how many threads to build, how to manage them, how to dynamically balance them across the processing resources available, etc. The whole point is to lower the barriers to using threaded code to the point that developers will just automatically do it since there's no real thought required to do it. And by doing so, all applications on the system become more responsive and snappier (note, I didn't say *FASTER*... They might not process data any quicker, but they'll stay responsive to user inputs and such better.)
How many of you have been doing a big encoding job in Handbrake that's eating the CPU so badly that you can't hardly do ANYTHING on your 2-4 core Mac? The whole system feels sluggish. Why? Because Handbrake sets up the threads itself, and it's chewing every cycle it can get, damn the rest of the system! GCD takes care of that. It keeps apps from going through thread starvation. It distributes the work being requested as optimally as possible across all the cores you have, and if you're trying to do something in a 2nd application and it needs a little CPU time to respond, GCD makes sure it gets it.
OpenCL - Eh, it has the potential to open up additional computing resources that we have on our machines, but OpenCL requires a good bit more work on a developer's part to use than GCD, and it doesn't help in all instances. OpenCL will rock at large-dataset jobs where each piece of data can be processed independently of the others. But that's usually a subset of what's typically done on a computer (often data and processes are interlinked and can't be run in parallel.)
AidenShaw
Sep 16, 2009, 07:59 PM
... as GCD will be keeping the number of threads at as optimal a level as possible, rather than machines losing time to threads that barely do anything.
I'll be polite, but this analysis is sorely lacking in understanding of threading mechanisms....
"Idle" threads don't "lose time". Looking at the half-dozen Windows systems on my desk (that includes virtual machines and remote desktops on servers in the raised floor lab) the range is about 600 to 2000 threads alive on each system, with the average around 1000. (Windows "Task Manager" tab for "Performance" lists the total threads running on the system. The IE8 task that I'm posting from has 108 threads. Perhaps an OSX user could report on the active thread count on OSX.)
There's little cost to idle threads in a well designed thread scheduler. Some kernel RAM to describe the thread - but not much more.
The "threading problem" is how to chop up the application algorithm into independent chunks that can run in parallel threads on the multiple CPUs found in most systems. It is not that "idle threads" are strangling the system.
"Grand Central" (and "ConcRT" (Concurrency RunTime) in Windows) are simply tools to make it easier for a programmer to identify those chunks and run them on multiple threads - that's why programs have to be rewritten to use the new threading models.
Note that GCD/ConcRT won't help much for programs that are already written to use multiple cores. Those have been written to identify and exploit parallelism using traditional threading models (although 10.6 has much better thread performance at the kernel primitive level, so these existing threaded apps may benefit from kernel improvements).
Yes, and no....
Very good post!
Willis
Sep 16, 2009, 08:07 PM
I'll be polite, but this analysis is sorely lacking in understanding of threading mechanisms....
"Idle" threads don't "lose time". Looking at the half-dozen Windows systems on my desk (that includes virtual machines and remote desktops on servers in the raised floor lab) the range is about 600 to 2000 threads alive on each system, with the average around 1000. (Windows "Task Manager" tab for "Performance" lists the total threads running on the system. The IE8 task that I'm posting from has 108 threads. Perhaps an OSX user could report on the active thread count on OSX.)
Well, I only have Safari, Mail, iTunes, last.fm running and here's what my activity monitor is reporting under CPU
Threads: ~280 +/-
Processes: 56
Usage: >85% Idle / System 2-7% / User 3-10%
So nothing much going on at the moment
viktore
Sep 16, 2009, 08:08 PM
Now if only Snow Leopard could actually function properly...
2002cbr600f4i
Sep 16, 2009, 08:10 PM
Also, forgot to mention, GCD works by managing a POOL of threads. It sets up the threads (which from what I've read, in OSX, are somewhat heavyweight objects with a lot of overhead, but still less than a process.) Blocks on the other hand basically say to GCD "Hey, here's some code I want run, it can be run in parallel, please schedule it on one of your threads and let me know when you're done." There's VERY VERY little overhead involved (think 10's of bytes rather than several 10's of Kbytes for a thread.) That reduced overhead means you can have a LOT more of them created and in memory at once and it's quicker to switch between them as well.
So, GCD may look at your system and say "I can support 200 threads with this hardware", but your active programs might have 10,000 blocks being passed around in GCD's scheduler to get time to run on those 200 threads.
Again, there's NO bookkeeping that the developer needs to do, the system just takes care of it for you.
I kind of like to think of it as a block-->thread scheduler similar to the OS's process-->processor scheduler.
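To make the "many small blocks, few threads" idea concrete, here is a minimal sketch in plain C with pthreads. It is not how GCD is implemented; it only shows the shape. A "block" here is assumed to be just a function pointer plus an argument, and all names (work_item, run_on_pool, bump_counter) are invented for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* A "block" in this sketch: a function pointer plus its argument.
   That is a couple of machine words, versus a whole stack per thread. */
typedef struct {
    void (*fn)(void *);
    void  *arg;
} work_item;

/* Shared state: the array of pending items and a cursor into it. */
typedef struct {
    work_item       *items;
    size_t           count;
    size_t           next;   /* index of the next unclaimed item */
    pthread_mutex_t  lock;
} work_queue;

/* Each pool thread claims one item at a time until none are left. */
static void *pool_thread(void *arg) {
    work_queue *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        if (q->next == q->count) {
            pthread_mutex_unlock(&q->lock);
            return NULL;
        }
        work_item item = q->items[q->next++];
        pthread_mutex_unlock(&q->lock);
        item.fn(item.arg);   /* run the work outside the lock */
    }
}

enum { MAX_POOL_THREADS = 64 };

/* Runs `count` items on at most `n_threads` threads and waits for all of them. */
void run_on_pool(work_item *items, size_t count, size_t n_threads) {
    pthread_t  threads[MAX_POOL_THREADS];
    work_queue q;

    q.items = items;
    q.count = count;
    q.next = 0;
    pthread_mutex_init(&q.lock, NULL);

    if (n_threads == 0) n_threads = 1;
    if (n_threads > MAX_POOL_THREADS) n_threads = MAX_POOL_THREADS;

    for (size_t i = 0; i < n_threads; i++) {
        int rc = pthread_create(&threads[i], NULL, pool_thread, &q);
        assert(rc == 0);
        (void)rc;
    }
    for (size_t i = 0; i < n_threads; i++)
        pthread_join(threads[i], NULL);

    pthread_mutex_destroy(&q.lock);
}

/* Example work function: atomically bump a shared counter. */
void bump_counter(void *arg) {
    atomic_fetch_add((atomic_int *)arg, 1);
}
```

Queueing 10,000 work_item entries costs one small array, while the thread count stays at whatever the pool was given. The difference with GCD is that the pool size here is picked by the caller, who has no view of what the rest of the system is doing.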
Eidorian
Sep 16, 2009, 08:21 PM
How many of you have been doing a big encoding job in Handbrake that's eating the CPU so badly that you can't hardly do ANYTHING on your 2-4 core Mac? The whole system feels sluggish. Why? Because Handbrake sets up the threads itself, and it's chewing every cycle it can get, damn the rest of the system! GCD takes care of that. It keeps apps from going through thread starvation. It distributes the work being requested as optimally as possible across all the cores you have, and if you're trying to do something in a 2nd application and it needs a little CPU time to respond, GCD makes sure it gets it.
Not too often once I got the Q6600 ages ago. I've been gaming, recording TV, and transcoding in HandBrake without skipping a beat in game in some instances. This was in Windows Vista mind you.
Well, I only have Safari, Mail, iTunes, last.fm running and here's what my activity monitor is reporting under CPU
Threads: ~280 +/-
Processes: 56
Usage: >85% Idle / System 2-7% / User 3-10%
So nothing much going on at the moment
Just off hand I get about 260 threads, 60 processes on average under OS X. That's with a browser and iTunes open. Like you said not much going on.
pmjoe
Sep 16, 2009, 08:22 PM
Excellent example... Let's say I have a UI with a button. When that button is pressed, I kick off some long process (say, sorting 10M numbers) that is going to take, say, 10 seconds to run... The windowing toolkit might use its own thread internally to draw the state changes of the button, but unless I write my sorting routine to execute in a thread, my program (and that program's UI) is effectively blocked from doing anything else, including responding to clicks, until that sorting process is done.
WELL written programs anticipate this long delay, and they do the processing in a thread and that thread then reports back to the application's/UI's main thread to say "Ok, I'm done". In the meantime, the application is responsive (or maybe it's showing a progress bar or something, but it's not totally hung/beachballing.)
That has nothing to do with the GUI, and I can do a progress bar without threads. To be useful, your whole application model would have to be smart enough to expect work to be done in a background thread and handle the result of that work when it completes. Otherwise, using threads is pointless. That is not as trivial as you just described.
This exact paradigm and issue exists in Windows, it exists in OSX, it exists in QT, it exists in Linux's window toolkits, it exists in Java... The ONLY fix is to write all your data processing code in threads... And in many of those systems that is a royal PAIN to do! So most developers don't bother unless it's something where they know the code being called will take a long time to run.
GCD changes all that... With the changes to Objective-C (blocks) it's STUPID easy to set up the long-running data processing code to execute in a thread. The whole point is that GCD takes care of figuring out how many threads to build, how to manage them, how to dynamically balance them across the processing resources available, etc.
Yawn! Threads are stupid easy to create in Java and so are thread pools. GCD cannot make developers more intelligent about when to use threads or how many to construct. Last I checked, I didn't think threads are that difficult in C or C++ either. Blocks constructs may help slightly with avoiding some concurrency issues, but what I've seen so far hasn't been that exciting.
The whole point is to lower the barriers to using threaded code to the point that developers will just automatically do it since there's no real thought required to do it.
No thought required to threads? THAT'S REALLY SCARY.
How many of you have been doing a big encoding job in Handbrake that's eating the CPU so badly that you can't hardly do ANYTHING on your 2-4 core Mac? The whole system feels sluggish. Why? Because Handbrake sets up the threads itself, and it's chewing every cycle it can get, damn the rest of the system!
OS X must've had a really crappy kernel until last month.
Eidorian
Sep 16, 2009, 08:27 PM
OS X must've had a really crappy kernel until last month.
Don't worry too much about it. Every time a new version of OS X is announced everyone seems to claim it is going to be our salvation.
Well something along those lines. I stopped caring after Spotlight and Time Machine.
2002cbr600f4i
Sep 16, 2009, 08:32 PM
Yawn! Threads are stupid easy to create in Java and so are thread pools. GCD cannot make developers more intelligent about when to use threads or how many to construct. Last I checked, I didn't think threads are that difficult in C or C++ either. Blocks constructs may help slightly with avoiding some concurrency issues, but what I've seen so far hasn't been that exciting.
Yes, it is easy (and yes, I am a Java developer, and I know it's easy to create threads.) It's NOT easy, however, to manage them effectively in a large Swing application... Hence why the SwingWorker system came out... When I read about Blocks and GCD, I immediately thought "This is their version of SwingWorker!" and in a lot of ways it IS.
And as I said (and many others have as well) not all of a program's processing jobs can be broken up into multiple threads in order to run faster, but by using threads to handle the whole "go do this when I press this button, and do it off the UI thread so the UI can remain responsive" the whole system is snappier and better able to respond and adjust resources as needed.
Note that Java or C++ or any of the other language specific threading systems don't really do anything to keep you from creating a crap load of threads that aren't really doing anything. Threads might not take CPU time if they're idle, but they DO take up memory, and they DO take up space in the thread management/task scheduler.
No thought required to threads? THAT'S REALLY SCARY.
You're misquoting me... It makes it STUPID EASY to dispatch processing work off to threads (like what I said with the processing work invoked by the button press.) It does NOT make it "stupid easy" to redesign your program to do concurrent data processing... That still takes work. But blocks and GCD remove some of the overhead boilerplate type work that developers usually have to do to create and manage threads on their own.
The point is to lower the barriers to using these tools so more developers will put in the effort to use them, rather than think "This is too much work, screw it, it's not like this chunk of code will take more than a second to do." (forgetting that that chunk of code might need to wait on a network connection or diskIO or something like that which could get blocked or stalled, making the whole app hang...)
TPALTony
Sep 16, 2009, 08:33 PM
Um...
I agree. People are acting like developers have never thought about threads before. GCD makes creating and managing threads easier (also keeps a program from stepping all over the toes of another program when it comes to threads), but it will not suddenly make non-threadable programs threadable or make developers good at programming with threads and shared resources. To start, there has to be work that can run in parallel in the program. The standard GUI program of click, wait for response, click again doesn't have very much that can be useful for threads.
The big win comes in tasks like encoding/decoding. In those cases developers have already been using threads for a very long time, but they have probably been conservative on the number of threads created (too few and you don't use the procs, too many and context switching kills you). Now they can request all they want and let GCD manage how many threads to create dynamically based on the system the code is executing on.
I believe that is the point. Simple things that CAN benefit from threads are frequently NOT threaded because you have to do quite a lot of thread management as soon as you start messing with them.
For instance, the simple act of doing "the same thing", let's say something that takes 2 seconds on each item in a collection, typically gets done along the lines of...
for (int i = 0; i < [collection count]; i++) {
    [[collection itemAtIndex:i] doTwoSecondTask];
}
Most developers would leave it at that, but with blocks and GCD you can literally rewrite that as...
dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
dispatch_apply([collection count], queue, ^(size_t i) {
    [[collection itemAtIndex:i] doTwoSecondTask];
}); // I didn't test this, I'm cribbing from the ADC documentation.
Now you have the same number of lines of code, a slightly different syntax, but your 2 second task is being done in parallel, with GCD deciding how many can be run at a time based on what your overall system is doing.
Remember, GCD's job is NOT to make everything run ultra parallel. Its job is to make the overall SYSTEM continue to be responsive by being aware HOLISTICALLY of what is going on. Even if you thread pool yourself in your own app, which is somewhat time consuming to plumb in, you are only ever aware of what YOUR APP is doing. GCD sees the bigger picture...
So while it might speed some things up, I think its big win is that it makes it EASY to thread things that otherwise would not be threaded without writing lots of thread pool management code. OK so you could use NSOperation, but that's not as holistic as GCD. There are lots and lots of "quick wins" you get with GCD, and that's what makes it applicable across the board.
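For comparison, this is roughly what the hand-rolled version of that loop looks like in plain C with pthreads. It is a sketch only: parallel_for, index_fn, square_element and the fixed N_THREADS are assumptions made up for the example, not anything from GCD or the ADC docs.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* The loop body: called once per index, like the block passed to dispatch_apply. */
typedef void (*index_fn)(size_t index, void *context);

/* One contiguous range of indices handed to one thread. */
typedef struct {
    size_t   begin;
    size_t   end;
    index_fn body;
    void    *context;
} slice;

static void *run_slice(void *arg) {
    slice *s = arg;
    for (size_t i = s->begin; i < s->end; i++)
        s->body(i, s->context);
    return NULL;
}

/* Fixed guess at a good thread count. This is the number the app
   has to pick blindly, without knowing what else the machine is doing. */
enum { N_THREADS = 4 };

/* Calls body(i, context) for every i in [0, count), split across
   N_THREADS threads, and returns once every call has finished. */
void parallel_for(size_t count, index_fn body, void *context) {
    pthread_t threads[N_THREADS];
    slice     slices[N_THREADS];

    for (size_t t = 0; t < N_THREADS; t++) {
        slices[t].begin   = count * t / N_THREADS;
        slices[t].end     = count * (t + 1) / N_THREADS;
        slices[t].body    = body;
        slices[t].context = context;
        int rc = pthread_create(&threads[t], NULL, run_slice, &slices[t]);
        assert(rc == 0);
        (void)rc;
    }
    for (size_t t = 0; t < N_THREADS; t++)
        pthread_join(threads[t], NULL);
}

/* Example body: square one element of an int array in place. */
void square_element(size_t i, void *context) {
    int *values = context;
    values[i] *= values[i];
}
```

Three lines of loop turn into a struct, a trampoline function, manual range splitting and a join loop, and the thread count is still a guess. That boilerplate is the barrier being talked about here.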
Just my 2c. :)
AidenShaw
Sep 16, 2009, 08:35 PM
Also, forgot to mention, GCD works by managing a POOL of threads. It sets up the threads (which from what I've read, in OSX, are somewhat heavyweight objects with a lot of overhead, but still less than a process.)
Again, this analysis is sorely lacking in understanding of threading mechanisms....
A "thread" is a subset of a process. A thread has the same virtual memory environment as the process that contains it. You can't have a global pool of threads, you can only have process pools.
In Windows, a "process" is a description of virtual memory and system context. It cannot execute on its own. A "thread" is an execution context within a process - it contains the dynamic state. A process must have at least one thread - otherwise it would have no value.
About Processes and Threads
Each process provides the resources needed to execute a program. A process has a virtual address space, executable code, open handles to system objects, a security context, a unique process identifier, environment variables, a priority class, minimum and maximum working set sizes, and at least one thread of execution. Each process is started with a single thread, often called the primary thread, but can create additional threads from any of its threads.
A thread is the entity within a process that can be scheduled for execution. All threads of a process share its virtual address space and system resources. In addition, each thread maintains exception handlers, a scheduling priority, thread local storage, a unique thread identifier, and a set of structures the system will use to save the thread context until it is scheduled. The thread context includes the thread's set of machine registers, the kernel stack, a thread environment block, and a user stack in the address space of the thread's process. Threads can also have their own security context, which can be used for impersonating clients.
Microsoft Windows supports preemptive multitasking, which creates the effect of simultaneous execution of multiple threads from multiple processes. On a multiprocessor computer, the system can simultaneously execute as many threads as there are processors on the computer.
A job object allows groups of processes to be managed as a unit. Job objects are namable, securable, sharable objects that control attributes of the processes associated with them. Operations performed on the job object affect all processes associated with the job object.
User-mode scheduling (UMS) is a light-weight mechanism that applications can use to schedule their own threads. An application can switch between UMS threads in user mode without involving the system scheduler and regain control of the processor if a UMS thread blocks in the kernel. Each UMS thread has its own thread context instead of sharing the thread context of a single thread. The ability to switch between threads in user mode makes UMS more efficient than thread pools for short-duration work items that require few system calls.
A fiber is a unit of execution that must be manually scheduled by the application. Fibers run in the context of the threads that schedule them. Each thread can schedule multiple fibers. In general, fibers do not provide advantages over a well-designed multithreaded application. However, using fibers can make it easier to port applications that were designed to schedule their own threads.
For more information, see the following topics: link (http://msdn.microsoft.com/en-us/library/ms681917(VS.85).aspx)
The system has processes, processes have threads, and threads have fibers.
(This is a Windows-centric description, but a "process" is a hardware entity that is defined by the CPU. I've used a number of threading models, but they've all followed this general model (except, of course, old Linux systems that simulated threads using processes). Please, any OSX developers point out where OSX varies from this model.)
wizard
Sep 16, 2009, 08:42 PM
Many properly written apps already benefit from the SL / GCD combo. Aperture is a good example of an app that is much better running under SL without being written to take advantage of GCD and OpenCL expressly. In a nutshell the work that went into GCD is having a big payoff for threaded apps.
Now this isn't to say Aperture or anything else has been optimized for SL. Just that SL gives multi-threaded apps more respectable behavior.
In any event, getting back on track, I have to agree that focusing on the core encoding routines is in order. This would have a huge payoff for everybody, especially those encodings amenable to parallel programming.
As to those surprised by these numbers I have to ask where have you been. There is much that could be accelerated via these techniques. This is why I jumped at Snow Leopard the day it came out. Not because I expected immediate payoff but rather because once a standard way of doing things becomes available it will be adopted wide scale. Well I have to admit that there was a bit of expectation for an immediate payoff but that has a lot to do with knowing how bad the Mac's threading model was with respect to things like Linux. The fact remains it is all downhill from here, as more and more libraries and programs get updated to the new tech our Macs will just get faster.
I've seen many complaints about SL in the forums but I must say I'm happy. Part of that is due to the very noticeable improvement to existing apps. That says a lot about the mechanisms over which GCD and OpenCL are implemented. It really makes one wonder how programs like Aperture will fare when accelerated purposefully.
Dave
satcomer
Sep 16, 2009, 08:47 PM
Not OpenCL as in my MacPro's case, the ATI X1900 XT isn't supported.
You need to have a specific card for OpenCL to be used
I am in the same boat. I'll have to buy an overpriced card (overpriced compared to the same cards for Windows) for my Mac Pro to take advantage of CL. I am still pissed that NVidia & ATI charge way more for the Mac versions than for the same cards on Windows machines. I expect to pay about $50 extra for EFI capability but $100+ doesn't make much sense in my book.
a.gomez
Sep 16, 2009, 08:51 PM
let me know when this is on Microsoft Office or CS.
wizard
Sep 16, 2009, 08:55 PM
You (and most other people) need to realize that there is a relatively small set of computations that can be accelerated by this kind of technology. Of course certain types of video, image, and sound processing will work, but your run-of-the-mill Mac app isn't going to be able to take advantage of GCD or OCL.
The ability to use these facilities effectively is up to the programmer and the algorithms he is working with. Many apps benefit from threading; GCD just provides a way to leverage a bunch of CPUs in a different way than normal threads or the use of NSOperation.
Dave
2002cbr600f4i
Sep 16, 2009, 09:00 PM
Again, this analysis is sorely lacking in understanding of threading mechanisms....
Believe me, I know ALL about the differences between processes, threads and (MS calls them Fibers, I forget what Apple is calling them.) Hard to get a Computer Science degree from Georgia Tech without learning that ;) I'm just simplifying things for the more layperson reader here...
Really, anyone who wants to get a better understanding of what GCD does and how it works should go read the ArsTechnica Snow Leopard review, pages 11-15 (http://arstechnica.com/apple/reviews/2009/08/mac-os-x-10-6.ars/11) He did a FANTASTIC job of describing and illustrating how GCD and OpenCL work, what they do, and why they're helpful (but not some magic bullet like some people seem to expect...)
2002cbr600f4i
Sep 16, 2009, 09:02 PM
Many properly written apps already benefit from the SL / GCD combo. Aperture is a good example of an app that is much better running under SL without being written to take advantage of GCD and OpenCL expressly. In a nutshell the work that went into GCD is having a big payoff for threaded apps.
Now this isn't to say Aperture or anything else has been optimized for SL. Just that SL gives multi-threaded apps more respectable behavior.
Dave
In reality, the speed ups you're experiencing are probably only because parts of the OS code have been rewritten to use GCD, and thus respond faster than they used to. I.e., the underlying system libraries might be better/more threaded than before. But Aperture itself hasn't been rewritten/modified to use it. Once it is, you'll probably see even more performance improvements.
AidenShaw
Sep 16, 2009, 09:06 PM
Many properly written apps already benefit from the SL / GCD combo.
...that has a lot to do with knowing how bad the Mac's threading model was with respect to things like Linux.
Agree - but these apps aren't benefitting from Grand Central per se, they're helped by work that Apple had to do to fix some fundamental brain damage in the 10.5 and earlier threading code.
Any app using pthreads or other ways of managing threads will see an improvement - but it is not because they use GCD. It is because Apple had to improve the sorry state of threading in OSX for GCD.
In reality, the speed ups you're experiencing are probably only because parts of the OS code have been rewritten to use GCD, and thus respond faster than they used to. I.e., the underlying system libraries might be better/more threaded than before. But Aperture itself hasn't been rewritten/modified to use it. Once it is, you'll probably see even more performance improvements.
Or, because parts of the OS haven't been rewritten to use GCD - but they use the improved thread primitives.
It would be interesting to see comparisons of pthread performance in 10.5 and 10.6. If pthreaded apps run better in 10.6, then it's not GCD - but the foundation work done for GCD.
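One crude way to run that kind of comparison is a microbenchmark that does nothing but create and join threads, built and run unchanged on both OS versions. The sketch below is an assumption about what such a test could look like; average_spawn_join_us is a made-up name, and it only measures thread create/join cost, not mutex or condition-variable performance.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/time.h>

/* Thread body that returns immediately, so only the create/join cost is timed. */
static void *do_nothing(void *arg) {
    return arg;
}

/* Average wall-clock microseconds to create and join one thread,
   measured over `iterations` back-to-back create/join pairs. */
double average_spawn_join_us(int iterations) {
    struct timeval start, end;

    gettimeofday(&start, NULL);
    for (int i = 0; i < iterations; i++) {
        pthread_t t;
        int rc = pthread_create(&t, NULL, do_nothing, NULL);
        assert(rc == 0);
        (void)rc;
        pthread_join(t, NULL);
    }
    gettimeofday(&end, NULL);

    double elapsed_us = (double)(end.tv_sec - start.tv_sec) * 1e6
                      + (double)(end.tv_usec - start.tv_usec);
    return elapsed_us / iterations;
}
```

gettimeofday() is used because it is available on old and new systems alike; its microsecond resolution is fine as long as the iteration count is in the thousands. A lower number on 10.6 with the same binary would point at the kernel and pthread work rather than at GCD.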
MattInOz
Sep 16, 2009, 09:24 PM
Open GL/Grand Central optimization is fine, however such improvements are probably an advantage for, what, 20% of the users?
I've been holding off on the SL upgrade, so far. I've actually been reading more complaints than positive comments.
I'd much prefer to see improved overall conditions, increased battery life for laptops, etc.
Grand Central optimization and OpenCL is what allows the next round of Battery improvements to happen. Do more with less.
OpenCL allows use of wasted cycles in the GPU that you're already paying the power budget for.
What is the rule of thumb? The same core at half the speed will use 1/4 of the power. So doubling the number of cores but keeping the same total number of cycles gives you the same amount of work for half the power. Double again for half the power again. Better still, if only one core is active you can power down the others. Multiple cores are just better at using the right amount of power for the job at hand.
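Taking that rule of thumb at face value (it is an approximation, not a measured figure, and it assumes the work splits perfectly across cores), the arithmetic works out as follows. Let P be the power of one core at full speed and let s be the fraction of full speed a core runs at:

$$P_{\text{core}}(s) = s^{2}\,P, \qquad P_{\text{total}}(n) = n \cdot P_{\text{core}}\!\left(\tfrac{1}{n}\right) = n \cdot \frac{P}{n^{2}} = \frac{P}{n}$$

So n = 2 cores at half speed do the same total number of cycles for P/2, and n = 4 cores at quarter speed do it for P/4, which matches "double again for half the power again".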
This is where you rely on GCD or threading: the better a program, even something small, scales across this sort of environment, the more room Apple has to power scale and save your battery.
Problem is subjectively you'll measure the performance of the whole machine based on which of your Applications has the highest demand for a single core.
Falcor Derkins
Sep 16, 2009, 09:25 PM
You (and most other people) need to realize that there is a relatively small set of computations that can be accelerated by this kind of technology. Of course certain types of video, image, and sound processing will work, but your run-of-the-mill Mac app isn't going to be able to take advantage of GCD or OCL.
GCD = multithreading which most applications can take advantage of if they aren't already threaded so long as there are operations that can be run in parallel.
OpenCL will only speed up data-intensive tasks of a specific nature as you mentioned, but these are generally the programs that need to run faster anyway. Who needs to speed up Word? But faster iTunes encoding, faster iMovie rendering, etc. would be nice.
"If he got almost a 50% increase with a quad core machine i expect crazy numbers from an 8-core variety"
The tests compare running on the GPU vs. the CPU, unless the Leopard implementation wasn't threaded. In fact his numbers would be far less impressive given a faster 8-core machine, as that's what he's comparing against. If that doubles the speed then the GPU implementation may be slower (of course it's not as linear as that).
Actually the numbers aren't that impressive, perhaps because of the now ancient GPU they used. See http://www.anandtech.com/video/showdoc.aspx?i=3339&p=1 or other CUDA (equivalent to OpenCL and has been around for a few years) based benchmarks. Some tasks have received 100x speedup over a CPU implementation. This isn't really new technology, but it's exciting to see more mainstream applications taking advantage of GPGPUs.
"Wrong. Any application [including all of iWorks] can leverage OpenCL for it's offloading of number crunching, aiding Quartz in various aspects to streamlining processes for WebKit and thus give everyone an improved experience. Built-in SVG, WebGL for Opera, Firefox, Safari, Chrome, etc., will benefit on OS X.
The impact for an application like Keynote will be more visible the more Keynote expands into OpenGL presentations that include interactive fly-throughs and much more."
Keep in mind that running any operation through OpenCL, ignoring the fact that it can run on the CPU as well, requires transferring data to the GPU. The overhead isn't worth it for small operations, and the GPU's architecture is very different from the CPU's. Tasks that require a lot of branching or sequential operations (a lot of tasks you normally do) don't perform well on the GPU. Given that people typically have pretty much unused CPU cores now anyway, it makes no sense to transfer everything to the GPU. It isn't an automatic make-everything-faster device. And while it's useful for many games, the GPU may already be the bottleneck just for rendering, so it won't always make sense to offload AI, etc. in those cases. Nowadays it seems like most developers don't even try to push the hardware anymore anyway; everything is made to run first on consoles, which are 4 years old. I can't find games to really tax my quad core i7 and g260. No joke. And that's running things like Far Cry 2 at 1080p with maxed settings.
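The transfer-overhead point can be put as a back-of-the-envelope model. The sketch below is a toy, not a benchmark: it assumes a one-way copy over the bus at a fixed bandwidth and a fixed GPU speedup on the compute part, and the function names and every number fed into it are hypothetical.

```c
#include <assert.h>

/* Toy estimate of total GPU time: copy the input over the bus, then compute.
   gpu_speedup is how many times faster the GPU runs the compute part alone. */
double offload_seconds(double bytes, double bus_bytes_per_sec,
                       double cpu_seconds, double gpu_speedup) {
    assert(bus_bytes_per_sec > 0.0 && gpu_speedup > 0.0);
    double transfer = bytes / bus_bytes_per_sec;
    double compute  = cpu_seconds / gpu_speedup;
    return transfer + compute;
}

/* Returns 1 if offloading beats just running the job on the CPU, else 0. */
int offload_is_worth_it(double bytes, double bus_bytes_per_sec,
                        double cpu_seconds, double gpu_speedup) {
    return offload_seconds(bytes, bus_bytes_per_sec,
                           cpu_seconds, gpu_speedup) < cpu_seconds;
}
```

With 1 GB of input over a 4 GB/s bus, the copy alone is 0.25 s. A job that takes 10 s on the CPU and runs 20x faster on the GPU comes out at 0.75 s, a clear win. A job that takes 0.2 s on the CPU comes out at 0.26 s, so it loses even with the same 20x speedup.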
"Until recently the developer API for OpenCL wasn't even available which is why you won't see it in many apps. On the GC side, now that Apple has open sourced it as libdispatch, it's entirely possible more OSes and developers will start using that. But it may take years for that to standardize and happen."
The API has been available for a while to developers, just not the public. Before that we had CUDA, which is almost a direct translation to OpenCL. There are already implementations of OpenCL on Linux and Windows. That's the whole idea of it being open: it's not exclusive to Apple the way DirectX 11 is exclusive to Microsoft. I think the biggest problem right now is just that there is a very small pool of developers that know how to use this, and most of them are probably in graduate school like me.
The biggest gain will probably come to final cut pro, motion, etc. It's ridiculous that people wait hours for video to render in these applications.
AidenShaw
Sep 16, 2009, 09:38 PM
OpenCL allows use of wasted cycles in the GPU that you're already paying the power budget for.
Nonsense - like the CPU, the GPU uses power according to the load put on it. If you're not doing heavy 3D or GPGPU work - there's no power drain.
And the converse is true - once you harness those GPU computing units, the power drain and heat production shoot up, and battery life plummets.
There's no free lunch....
MattInOz
Sep 16, 2009, 09:53 PM
Nonsense - like the CPU, the GPU uses power according to the load put on it. If you're not doing heavy 3D or GPGPU work - there's no power drain.
And the converse is true - once you harness those GPU computing units, the power drain and heat production shoot up, and battery life plummets.
There's no free lunch....
Sure but it isn't a smooth power curve is it?
The power increases in steps.
So like the gears in your car each step has a sweet spot of efficiency. The smoother you drive the less fuel you use.
Chimpy
Sep 16, 2009, 10:12 PM
I'm drooling in anticipation of what 2010 will bring in speed increases for me and my 2006 Mac Pro...should be interesting.
Amdahl
Sep 16, 2009, 10:20 PM
Now you have the same number of lines of code, a slightly different syntax, but your 2 second task is being done in parallel, with GCD deciding how many can be run at a time based on what your overall system is doing.
This seems a lot like OpenMP for Objective-C. Which is a good thing.
But the other commenters are right; GCD is not the big gain the Steve Jobs Reality Distortion Field wants you to think it is. It is only an incremental improvement. The big gains were made years ago when developers implemented multi-threading. (If they did.)
I think something important is being left out of these Fantastical Exampicals. In the real world, you need to know when your little parallel block task is done. That means, you are waiting for some notification (very thread like), or you are stopping your execution until the entire GCD 'block' is done. Because of this, the GCD gains will be few and far between. And will probably still manage to bring in bugs for developers who actually believe that GCD makes things 'stupid easy.'
I would bet most of the gain in the example app in this article is from OpenCL, which actually brings new computing resources, rather than GCD, which gives you the same old crap in a new box.
Now this isn't to say Aperture or anything else has been optimized for SL. Just that SL gives multi threaded apps more respectable behavior.
This is probably from improvements in the kernel. OSX has traditionally had crappy thread performance, and perhaps that is now fixed.
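The "you need to know when your parallel block is done" point can be sketched with Python's standard thread pool (this is an analogy, not GCD itself). The `two_second_task` stand-in and `run_batch` helper are hypothetical names; the sketch blocks the caller until every task has finished, which is one of the two options described above.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def two_second_task(item):
    # Hypothetical stand-in for the real per-item work.
    return item * item


def run_batch(items):
    """Run every task on a pool and block until the whole batch is done."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(two_second_task, item) for item in items]
        done, not_done = wait(futures)  # returns once all futures complete
        assert not not_done
        # Read results in submission order, not completion order.
        return [future.result() for future in futures]


results = run_batch([1, 2, 3, 4])
```

The caller still pays the waiting cost; the pool only decides how the work is spread across threads.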
JesterJJZ
Sep 16, 2009, 10:23 PM
There, corrected for you. Any pro worth their nut will use Avid.
That's very not true...FCP is used to edit many major network television shows. Avid used to be top dog but FCP has stolen much of Avid's thunder over the years. Do some research before speaking next time.
pmjoe
Sep 16, 2009, 10:29 PM
Yes, it is easy (and yes, I am a Java developer, and I know it's easy to create threads.) It's NOT easy, however, to manage them effectively in a large Swing application... Hence why the SwingWorker system came out... When I read about Blocks and GCD, I immediately thought "This is their version of SwingWorker!" and in a lot of ways it IS.
And as I said (and many others have as well) not all programs' processing jobs can be broken up into multiple threads in order to run faster, but by using threads to handle the whole "go do this when I press this button, and do it off the UI thread so the UI can remain responsive" the whole system is snappier and better able to respond and adjust resources as needed.
SwingWorker is an absurdly trivial piece of code. It is convenient, but hardly a miracle. Blocks are actually somewhat more interesting, but relating SwingWorker to GCD is a bit of a stretch.
http://java.sun.com/products/jfc/tsc/articles/threads/src/SwingWorker.java
Note that Java or C++ or any of the other language specific threading systems don't really do anything to keep you from creating a crap load of threads that aren't really doing anything. Threads might not take CPU time if they're idle, but they DO take up memory, and they DO take up space in the thread management/task scheduler.
Can you really admit to being a developer and say this in the same discussion?
You're misquoting me... It makes it STUPID EASY to dispatch processing work off to threads (like what I said with the processing work invoked by the button press.) It does NOT make it "stupid easy" to redesign your program to do concurrent data processing... That still takes work. But blocks and GCD remove some of the overhead boilerplate type work that developers usually have to do to create and manage threads on their own.
Creating threads and starting them was never the difficult part in the first place. Outside of thread pools, I don't see much "management" going on here. All blocks do is make the easy part even easier.
The point is to lower the barriers to using these tools so more developers will put in the effort to use them, rather than think "This is too much work, screw it, it's not like this chunk of code will take more than a second to do." (forgetting that that chunk of code might need to wait on a network connection or diskIO or something like that which could get blocked or stalled, making the whole app hang...)
But going back to the GUI button example here, you never give a good reason. The thread goes off and does its thing, the user goes off and does theirs. The thread returns and the user has now gotten the app into a different state than when the thread started ... WTF is the programmer who didn't know how to create a thread before blocks existed going to do??? He's screwed. In the meantime, the user has gone off and created a couple more threads that may or may not have finished before the first one did.
This programmer is never going to get that poor user's data back into a consistent state. But thank God his user's GUI is responsive!
Next time, try making the hard part easier.
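One common way to handle the "state changed while the thread was away" problem is to stamp each background job with a version and drop results that arrive stale. The sketch below is in Python, and the `Document` class and its methods are invented for illustration; it is one possible approach, not something GCD or blocks provide for you.

```python
import threading


class Document:
    """Hypothetical app state: text plus a version counter, guarded by a lock."""

    def __init__(self, text):
        self._lock = threading.Lock()
        self.text = text
        self.version = 0

    def edit(self, new_text):
        with self._lock:
            self.text = new_text
            self.version += 1

    def snapshot(self):
        with self._lock:
            return self.version, self.text

    def apply_result(self, based_on_version, result):
        """Accept a background result only if nothing changed in the meantime."""
        with self._lock:
            if based_on_version != self.version:
                return False  # stale: computed from an older state
            self.text = result
            self.version += 1
            return True


def background_uppercase(doc, outbox):
    version, text = doc.snapshot()
    outbox.append((version, text.upper()))


doc = Document("hello")
outbox = []
worker = threading.Thread(target=background_uppercase, args=(doc, outbox))
worker.start()
worker.join()
doc.edit("goodbye")  # the user changes the state before the result is applied
version, result = outbox[0]
accepted = doc.apply_result(version, result)
```

Here the late result is rejected and the user's newer edit survives; deciding what to do next (recompute, merge, discard) is still the programmer's job.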
freiheit
Sep 16, 2009, 10:30 PM
What I would like to see, as a raw Grand Central Dispatch benchmark which any user could run, is a simple nested loop test like the one mentioned in the extensive Snow Leopard review on ArsTechnica.
Example:
Outer loop contains a list of movie titles, some in ALL CAPS, some all lowercase, some MiXEd CAse. Test run without GCD loops through this list one at a time and launches a second loop to make each word within the current title lowercase with initial capital, thereby rendering the list, one word at a time, as Title Case.
Next step is the same run through but using GCD to launch as many concurrent loops as possible on the user's hardware. Since each movie title is unique and does not rely on any previous title being converted, there's no problem with doing them out of order.
Final result, show the actual time to convert the list each way and the % of speedup by using Grand Central Dispatch. This could then be run on any other Intel Mac to find how much potential improvement there is in more cores and higher clock speeds.
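The shape of that benchmark can be sketched in a few lines of Python using the standard thread pool rather than GCD. The sample titles are made up, and one caveat applies: CPython threads do not run CPU-bound code in parallel, so this shows the structure of the test (serial run, pooled run, timings, identical output) rather than the speedup a GCD version might show.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def title_case(title):
    # Inner loop: handle one word at a time.
    return " ".join(word[:1].upper() + word[1:].lower() for word in title.split())


def convert_serial(titles):
    return [title_case(title) for title in titles]


def convert_pooled(titles):
    # Titles are independent, so they may be processed in any order;
    # pool.map still returns results in input order.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(title_case, titles))


def timed(fn, titles):
    start = time.perf_counter()
    out = fn(titles)
    return out, time.perf_counter() - start


titles = ["THE GODFATHER", "the matrix", "bLaDe rUnNeR"] * 1000
serial_out, serial_seconds = timed(convert_serial, titles)
pooled_out, pooled_seconds = timed(convert_pooled, titles)
print(f"serial {serial_seconds:.4f}s, pooled {pooled_seconds:.4f}s")
```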
AidenShaw
Sep 16, 2009, 10:34 PM
Sure but it isn't a smooth power curve is it?
The power increases in steps.
So like the gears in your car each step has a sweet spot of efficiency. The smoother you drive the less fuel you use.
No, it's like "if the engine is off, it doesn't burn gas".
The major effort on power management in the last decade has been to disable and power down units that aren't being used - even for tiny fractions of a second.
Those GPUs burn a lot of watts when they are busy. One shouldn't assume that OpenCL will help battery life. The job may finish faster, but the total energy consumed (watts multiplied by time) may be higher.
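The distinction that matters for battery life is between power (watts) and energy (power multiplied by time). A tiny Python sketch with invented numbers: neither wattage below is a measurement of any real machine, they only show how a job can finish sooner and still drain more.

```python
def energy_watt_hours(average_watts, seconds):
    """Energy used by a job that draws average_watts for the given duration."""
    return average_watts * seconds / 3600.0


# Hypothetical: CPU-only encode draws 40 W for 10 minutes.
cpu_only = energy_watt_hours(average_watts=40, seconds=600)

# Hypothetical: with the GPU busy the machine draws 120 W but finishes in 5 minutes.
with_gpu = energy_watt_hours(average_watts=120, seconds=300)
```

With these assumed figures the GPU run takes half the time but uses about 10 Wh against roughly 6.7 Wh.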
Talez
Sep 16, 2009, 10:54 PM
GCD = multithreading which most applications can take advantage of if they aren't already threaded so long as there are operations that can be run in parallel.
That's really underselling GCD.
The whole point is that you can set up a potentially lengthy (yet very tiny code wise) operation away from the main thread without having to learn how to properly multithread an application. GCD lets you fire off a simple operation to run in parallel with the main thread (and not block it during a lengthy operation) using, effectively, two lines of code.
The automatic thread pool management stuff is just gravy.
wizard
Sep 16, 2009, 11:09 PM
In reality, the speed ups you're experiencing are probably only because parts of the OS code have been rewritten to use GCD, and thus respond faster than they used to.
I have to disagree, as I believe Apple refactored NSOperation and related stuff to run on top of the new threading architecture. It is the only viable explanation I have for some of the speed ups seen in older code. We may be saying the same thing, but I'm specifically saying that the infrastructure in place in SL to support GCD has had an excellent impact on existing software. Well, existing software that took advantage of Apple's NS threading primitives.
IE: the underlying system libraries might be better/more threaded than before. But Aperture itself hasn't been re-written/modified to use it itself. Once it is, you'll probably see even more performance improvements.
Well yah a rewrite can always speed things up, I'm just saying there has been a positive impact on existing software. That has a lot to do with the infrastructure put in place for GCD.
Dave
twoodcc
Sep 16, 2009, 11:09 PM
i guess it'll take awhile before a lot of apps are taking advantage of it. but glad someone is already
MorphingDragon
Sep 16, 2009, 11:09 PM
No, it's like "if the engine is off, it doesn't burn gas".
The major effort on power management in the last decade has been to disable and power down units that aren't being used - even for tiny fractions of a second.
Those GPUs burn a lot of watts when they are busy. One shouldn't assume that OpenCL will help battery life. The job may finish faster, but the total energy consumed (watts multiplied by time) may be higher.
Sometimes, Aiden, I wonder why you bother staying here. I'm sure your computer views would be better spent on a general tech site. It's like shouting to the death here.
mdriftmeyer
Sep 16, 2009, 11:30 PM
This seems a lot like OpenMP for Objective-C. Which is a good thing.
But the other commenters are right; GCD is not the big gain the Steve Jobs Reality Distortion Field wants you to think it is. It is only an incremental improvement. The big gains were made years ago when developers implemented multi-threading. (If they did.)
I think something important is being left out of these Fantastical Exampicals. In the real world, you need to know when your little parallel block task is done. That means, you are waiting for some notification (very thread like), or you are stopping your execution until the entire GCD 'block' is done. Because of this, the GCD gains will be few and far between. And will probably still manage to bring in bugs for developers who actually believe that GCD makes things 'stupid easy.'
I would bet most of the gain in the example app in this article is from OpenCL, which actually brings new computing resources, rather than GCD, which gives you the same old crap in a new box.
This is probably from improvements in the kernel. OSX has traditionally had crappy thread performance, and perhaps that is now fixed.
Wrong. Block management is done at the System-level and you don't have to worry about it.
mdriftmeyer
Sep 16, 2009, 11:32 PM
That has nothing to do with the GUI, and I can do a progress bar without threads. To be useful, your whole application model would have to be smart enough to expect work to be done in a background thread and handle the result of that work when it completes. Otherwise, using threads is pointless. That is not as trivial as you just described.
Yawn! Threads are stupid easy to create in Java and so are thread pools. GCD cannot make developers more intelligent about when to use threads or how many to construct. Last I checked, I didn't think threads are that difficult in C or C++ either. Blocks constructs may help slightly with avoiding some concurrency issues, but what I've seen so far hasn't been that exciting.
No thought required to threads? THAT'S REALLY SCARY.
OS X must've had a really crappy kernel until last month.
It's nothing like Java's thread pooling and the blocks are actually C blocks.
http://clang.llvm.org/docs/BlockImplementation.txt
eastcoastsurfer
Sep 16, 2009, 11:42 PM
I believe that is the point. Simple things that CAN benefit from threads are frequently NOT threaded because you have to do quite a lot of thread management as soon as you start messing with them.
For instance, the simple act of doing "the same thing", let's say something that takes 2 seconds on each item in a collection, typically gets done along the lines of...
for (NSUInteger i = 0 ; i < [collection count] ; i++) {
    [[collection objectAtIndex:i] doTwoSecondTask];
}
Most developers would leave it at that, but with blocks and GCD you can literally rewrite that as...
dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
dispatch_apply([collection count], queue, ^(size_t i) {
    [[collection objectAtIndex:i] doTwoSecondTask];
}); // I didn't test this, I'm cribbing from the ADC documentation.
Now you have the same number of lines of code, a slightly different syntax, but your 2 second task is being done in parallel, with GCD deciding how many can be run at a time based on what your overall system is doing.
Remember, GCD's job is NOT to make everything run ultra parallel. Its job is to make the overall SYSTEM continue to be responsive by being aware HOLISTICALLY of what is going on. Even if you thread pool yourself in your own app, which is somewhat time consuming to plumb in, you are only ever aware of what YOUR APP is doing. GCD sees the bigger picture...
So while it might speed some things up, I think its big win is that it makes it EASY to thread things that otherwise would not be threaded without writing lots of thread pool management code. OK so you could use NSOperation, but that's not as holistic as GCD. There are lots and lots of "quick wins" you get with GCD, and that's what makes it applicable across the board.
Just my 2c. :)
The problem is that there is a lot more that you have to assume for your example to work. The big one is that the task you need to perform on each item is completely independent of all the other tasks that need to be performed on the other items (often the case in encoding/decoding, applying image filters etc..., not so much in GUI apps). Do any of the later tasks rely on the completion of the earlier tasks? What about shared resources? Do the tasks require access to such a resource? Now you're dealing with possible deadlocks, race conditions, etc... Read about side effects (http://en.wikipedia.org/wiki/Side_effect_(computer_science)) and how they are what makes parallel programming so hard. Learn about functional (http://en.wikipedia.org/wiki/Functional_programming) languages like Erlang (http://en.wikipedia.org/wiki/Erlang_(programming_language)) and why they are great for writing parallel code (hint, next to no side effects).
Trivial examples are simple to thread. Stuff like take a list of numbers and add 1 to them sounds great, but that's not what's going on in the typical program.
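The difference between the trivial case and the shared-resource case can be sketched in Python with the standard thread pool (an analogy only, not GCD). `add_one` is pure and needs no coordination; `add_one_and_count` touches a shared `Counter`, a hypothetical class invented here, so it needs a lock to stay correct.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def add_one(n):
    # Pure function: no shared state, safe to run in any order on any thread.
    return n + 1


class Counter:
    """Hypothetical shared resource; the lock makes each increment atomic."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self.value += 1


numbers = list(range(1000))
counter = Counter()


def add_one_and_count(n):
    # Side effect: every task mutates the same counter.
    counter.increment()
    return n + 1


with ThreadPoolExecutor(max_workers=8) as pool:
    pure_results = list(pool.map(add_one, numbers))
    counted_results = list(pool.map(add_one_and_count, numbers))
```

The pure version parallelizes without any extra thought. The second version only stays correct because someone noticed the shared state and guarded it, and that noticing is the hard part.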
nkawtg72
Sep 16, 2009, 11:54 PM
okay, can someone please explain to me, how at the time i wrote this post, there are 97 Positive and 13 Negative opinions for this thread?
i guess i just can't understand the logic of some people and why they would find anything about this news Negative? what could someone possibly find that's negative about performance increases as a result of new technology implementations?
i'm willing to keep an open mind here, i'm not trying to be combative. i'm just trying to understand what angle these Negative opinions are coming from.
have a good day!!!
eastcoastsurfer
Sep 16, 2009, 11:55 PM
The whole point is that you can set up a potentially lengthy (yet very tiny code wise) operation away from the main thread without having to learn how to properly multithread an application.
All GCD has done is make it easier to start a thread (not that it was very hard before) and provide a 'global' pool of thread execution units that is managed by the system (this is the real win). I don't see where it has removed the need for developers to learn how to properly manage a multi-threaded application. Multi-threaded applications require the programmer to think about and manage the order of operations and the resources they may use. Apple has changed the syntax for locks and semaphores, but the logical process used by the programmer is still the same. They can't simply dump all their long running code into separate threads without understanding and planning for all the interdependencies between various threads.
At the end of the day that is what makes multi-threaded programming hard, not the syntax of creating a thread or locking a resource (in Java and C# it is already dead simple to create threads).
wizard
Sep 16, 2009, 11:57 PM
This seems a lot like OpenMP for Objective-C. Which is a good thing.
I prefer to see it as something new and different but yes the similarities are there.
But the other commenters are right; GCD is not the big gain the Steve Jobs Reality Distortion Field wants you to think it is. It is only an incremental improvement. The big gains were made years ago when developers implemented multi-threading. (If they did.)
This I simply disagree with; GCD has the potential to be huge, especially with OpenCL along. Also I don't subscribe to the idea that everything has already been multi threaded or has threading that can't be improved. In fact I can see a whole new generation of software coming that this tech enables.
I think something important is being left out of these Fantastical Exampicals. In the real world, you need to know when your little parallel block task is done. That means, you are waiting for some notification (very thread like), or you are stopping your execution until the entire GCD 'block' is done. Because of this, the GCD gains will be few and far between.
I'd suggest reading Apple's documentation. The benefits are based on what the developer can exploit out of the algorithms being used. There is likely to be a lot of existing code that will see little gain, but more importantly a lot of code that couldn't run on a desktop PC before will be possible.
The important thing is to look to the future not the past.
And will probably still manage to bring in bugs for developers who actually believe that GCD makes things 'stupid easy.'
Bugs are bugs and so are stupid programmers but that is not what we are concerned with here. What I'm excited about is this tech enabling a whole generation of software from the smarter minds out there. Maybe you are a Democrat but frankly I don't really give a damn about the flunkies in this world. This tech is for a new generation of creators.
I would bet most of the gain in the example app in this article is from OpenCL, which actually brings new computing resources, rather than GCD, which gives you the same old crap in a new box.
See the above about reading the documentation.
This is probably from improvements in the kernel. OSX has traditionally had crappy thread performance, and perhaps that is now fixed.
That is the whole point: GCD and the new infrastructure to support it are making things snappy for old software. It appears at this time that NSOperation and its allied calls now sit on top of GCD. I haven't found an explicit statement to that effect, but it does explain why some software runs much better. SL truly represents a significant overhaul of Apple's OS and frankly I think many misunderstand its significance.
Dave
Eidorian
Sep 17, 2009, 12:02 AM
okay, can someone please explain to me, how at the time i wrote this post, there are 97 Positive and 13 Negative opinions for this thread?
i guess i just can't understand the logic of some people and why they would find anything about this news Negative? what could someone possibly find that's negative about performance increases as a result of new technology implementations?
i'm willing to keep an open mind here, i'm not trying to be combative. i'm just trying to understand what angle these Negative opinions are coming from.
have a good day!!!
This seems to be brought up more often lately for Page 1 articles. The votes don't matter.
Falcor Derkins
Sep 17, 2009, 12:05 AM
That's really underselling GCD.
The whole point is that you can set up a potentially lengthy (yet very tiny code wise) operation away from the main thread without having to learn how to properly multithread an application. GCD lets you fire off a simple operation to run in parallel with the main thread (and not block it during a lengthy operation) using, effectively, two lines of code.
The automatic thread pool management stuff is just gravy.
That's the complete opposite of what I was implying. What you describe is simplifying the implementation of multithreading (really, you still need to know how it works and the basic synchronization ideas; many operations will still require barriers, etc. to make sure that one thing has finished before something else can run). My point, in response to someone saying it is useless for most applications, was that multithreading, and hence GCD, can benefit many applications, since most GUI applications use threading significantly. I wasn't saying that GCD is just a pthreads implementation or something along that line.
ikir
Sep 17, 2009, 02:47 AM
My MacPro is one of those that cannot boot to the 64 bit kernel... but still runs 64 bit apps.. like Lightroom. Does this mean it cannot take advantage of these advancements?:confused:
Forget the damn 64-bit kernel. You can run 64-bit apps, and more importantly your machine is OpenCL capable.
ikir
Sep 17, 2009, 02:50 AM
okay, can someone please explain to me, how at the time i wrote this post, there are 97 Positive and 13 Negative opinions for this thread?
i guess i just can't understand the logic of some people and why they would find anything about this news Negative? what could someone possibly find that's negative about performance increases as a result of new technology implementations?
i'm willing to keep an open mind here, i'm not trying to be combative. i'm just trying to understand what angle these Negative opinions are coming from.
have a good day!!!
I can explain it to you: people are just stupid. Or it's someone who doesn't like Macs and votes it down... indeed, news votes are useless.
flooce
Sep 17, 2009, 04:00 AM
The tricky thing is that the new technology underlying Snow Leopard is not going to make a difference straight away; it needs to be implemented by developers. Those who read the ArsTechnica review of Snow Leopard (http://arstechnica.com/apple/reviews/2009/08/mac-os-x-10-6.ars) might believe the reviewer that Grand Central technology is relatively easy to implement in code, because rather than creating threads on your own you can basically "hand things over to GCD" and it will do the rest: balancing the CPU power between processes and cores and all these technical things.
Well anyhow, it doesn't matter too much now, we will see benefits here over time though, be patient.
TheMacPotato
Sep 17, 2009, 04:20 AM
Hd2600 :(
thewinelake
Sep 17, 2009, 04:35 AM
I'd like to see figures for the independent contributions of:
- SnowLeopard running non-optimised code
- SL with OpenCL
- SL with GCD
- SL with both optimisations
Wonder if there's more detailed information anywhere?
SimonTheSoundMa
Sep 17, 2009, 05:47 AM
I've never seen an answer to this type of question, not even a ballpark figure....
Has anyone read what kind of speed up one might expect for h.264 video ENcoding using OpenCL on a MacPro that was maxed out with the standard-class video cards Apple offers?
just broad speedup. 10x? 100x?
Or is h.264 encoding not parallelizable enough to actually see much of a boost?
CUDA alternatives I have tried on my MacBook Pro with 2.4GHz/8600M GT go from 20fps to 150fps for 720p.
http://badaboomit.com/ is what I use. They claim 20x.
slackpacker
Sep 17, 2009, 06:35 AM
So the real question: what version is this, and where can I get it?
Also RIGHT HERE and now we have to start a list of Snow Leopard apps that support OpenCL and Grand Central. I have said all along that these 2 features in SL are the most important for performance... I take Apple to task and ridicule them for not releasing software that uses their own technology.... come on, when major parts of the OS don't use it and FCS doesn't use it.... for shame Apple, for shame. Face Palm.....
djgamble
Sep 17, 2009, 06:48 AM
What's GCD like on dual-core Macs that can't use OpenCL?
I object to Apple not making OpenCL drivers for these machines because I think they would benefit the MOST.
Theoretical situation...
8-Core Mac Pro... video encoding... 4 seconds (Leopard)
8-Core Mac Pro... video encoding... 2 seconds (Snow Leopard)
2-Core iMac/MacBook Pro... video encoding... 10 minutes... (Leopard)
2-Core iMac/MacBook Pro... video encoding... 10 minutes... (Snow Leopard)
2-Core iMac/MacBook Pro... video encoding... 5 minutes... (Snow Leopard if they made a damn driver!)
We're talking 1.5-2 year old computers here, and since Apple's ripped out ANY support for PPC machines, one would think they could support the last 4 years of Intel machines... one would think?
Here we're also talking about users who potentially have less money, and can't afford to upgrade their hardware as much... so do so every 5 years or so. These guys still pay to upgrade the software, and are interested in new technology... they also want the biggest bang for their buck.
So Mac Pro users... 4 seconds or 2 seconds... who cares? Me... 10 minutes or 5 minutes... I care because it's a lot of time! And for those who say "you're a cheap bastard..." my MBP cost significantly more than most Mac Pros!
stylewriter
Sep 17, 2009, 07:46 AM
Let's be clear here, Grand Central Dispatch does not bring any performance improvements. It is just a library to simplify threading for some who might not otherwise do multi-threaded programming. It's the multi-threading that brings the performance improvements.
Grand Central does have some features you can't just implement inside an application in Leopard. Grand Central manages a threadpool system-wide, which allows threadpools to be used very cheaply (CPU cost wise) and allows the system to maintain an optimal number of worker threads for the CPU cores available.
The overhead of creating threadpools is minimal for large applications, but for smaller applications, or applications that may not need a threadpool often, the benefit of cheap threadpools is substantial. This article is for .Net threadpools, but Figure 2 at http://msdn.microsoft.com/en-us/magazine/dd252943.aspx illustrates the problem of maintaining an optimal number of concurrent threads. Too few threads/core and performance plummets. Too many threads/core and performance starts heading south again. I believe OSX is the first OS to integrate global threadpools.
If you are running only one large application, then Grand Central might not give you an advantage. Grand Central is pretty neat in how it automatically allocates resources without overwhelming the computer.
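As a rough analogy for the shared-pool idea, here is a Python sketch in which two unrelated pieces of one program submit work to a single pool sized to the core count, instead of each creating its own threads. This is per-process only, whereas GCD's pool is described above as system-wide; `SHARED_POOL`, `thumbnail` and `checksum` are invented names.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# One pool for the whole program, sized to the available cores.
SHARED_POOL = ThreadPoolExecutor(max_workers=os.cpu_count() or 1)


def thumbnail(name):
    # Hypothetical stand-in for image work.
    return f"thumb:{name}"


def checksum(data):
    # Hypothetical stand-in for some unrelated computation.
    return sum(data) % 256


# Two independent components reuse the same workers.
thumb_futures = [SHARED_POOL.submit(thumbnail, n) for n in ("a.png", "b.png")]
sum_futures = [SHARED_POOL.submit(checksum, d) for d in (b"abc", b"xyz")]

thumbs = [future.result() for future in thumb_futures]
sums = [future.result() for future in sum_futures]
SHARED_POOL.shutdown()
```

Neither component has to guess how many threads the machine can usefully run; that decision lives in one place.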
stylewriter
Sep 17, 2009, 08:18 AM
What's GCD like on dual-core Macs that can't use OpenCL?
I object to Apple not making OpenCL drivers for these machines because I think they would benefit the MOST.
Theoretical situation...
8-Core Mac Pro... video encoding... 4 seconds (Leopard)
8-Core Mac Pro... video encoding... 2 seconds (Snow Leopard)
2-Core iMac/MacBook Pro... video encoding... 10 minutes... (Leopard)
2-Core iMac/MacBook Pro... video encoding... 10 minutes... (Snow Leopard)
2-Core iMac/MacBook Pro... video encoding... 5 minutes... (Snow Leopard if they made a damn driver!)
We're talking 1.5-2 year old computers here, and since Apple's ripped out ANY support for PPC machines, one would think they could support the last 4 years of Intel machines... one would think?
Here we're also talking about users who potentially have less money, and can't afford to upgrade their hardware as much... so do so every 5 years or so. These guys still pay to upgrade the software, and are interested in new technology... they also want the biggest bang for their buck.
So Mac Pro users... 4 seconds or 2 seconds... who cares? Me... 10 minutes or 5 minutes... I care because it's a lot of time! And for those who say "you're a cheap bastard..." my MBP cost significantly more than most Mac Pros!
These cards were sold to you for rendering video... they just aren't capable of doing the calculations. It was normal for the engine of a graphics card to not support double precision floating point, or to bastardize IEEE 754 floating point numbers; they were made to render graphics fast. If you wanted to be able to do math calculations on your card, then you should have made sure to get a CUDA capable graphics card.
There isn't anything anyone can do... as far as I see, there are valid reasons for every unsupported graphics chipset; which is most of them.
pmjoe
Sep 17, 2009, 08:41 AM
Grand Central does have some features you can't just implement inside an application in Leopard. Grand Central manages a threadpool system-wide, which allows threadpools to be used very cheaply (CPU cost wise) and allows the system to maintain an optimal number of worker threads for the CPU cores available.
The overhead of creating threadpools is minimal for large applications, but for smaller applications, or applications that may not need a threadpool often, the benefit of cheap threadpools is substantial.
Why in the world would I want to create a thread pool when I don't need one!?!?!? Please give just ONE example of why a thread pool would be needed in a desktop application that does not provide server services or benefit from significant parallelism in a complex operation.
This article is for .Net threadpools, but Figure 2 at http://msdn.microsoft.com/en-us/magazine/dd252943.aspx illustrates the problem of maintaining an optimal number of concurrent threads. Too few threads/core and performance plummets. Too many threads/core and performance starts heading south again. I believe OSX is the first OS to integrate global threadpools.
The article you linked has almost nothing to do with global thread pools. Please state just ONE specific advantage global thread pools provide over one I'd create locally (for a desktop application).
If you are running only one large application, then Grand Central might not give you an advantage. Grand Central is pretty neat in how it automatically allocates resources without overwhelming the computer.
What are you calling a "resource" here? Cores? Does GCD have any knowledge of other resources on the system? How does GCD know what system resources my thread needs and when it will need them? Is it really that smart?
Rocketman
Sep 17, 2009, 08:42 AM
CUDA alternatives I have tried on my MacBook Pro with 2.4GHz/8600M GT go from 20fps to 150fps for 720p.
http://badaboomit.com/ is what I use. They claim 20x.
To me I am not interested in what somebody can do on their late model 8 core MacPro unless they already have a room full of 10 machines and the software lets ten guys get to the pub an hour earlier every day.
I am far more interested in what it does for the lowest hardware that can take advantage of its full benefits such as this MacPro with dual graphics processors. Hopefully it lets relatively low level hardware simply do things not possible before.
Heck, the iPod Touch 2009 32/64 has OpenGL now and enough heat budget to cripple the chip speed 50% less.
What interests me most about these technologies is seeing them hit the lowest of the low. iPhone, MacBook polycarbonate, MacMini, and whatever tablet is coming out, but that will have dual core. :)
Rocketman
After G
Sep 17, 2009, 08:58 AM
Why in the world would I want to create a thread pool when I don't need one!?!?!? Please give just ONE example of why a thread pool would be needed in a desktop application that does not provide server services or benefit from significant parallelism in a complex operation.
If your UI stalls waiting for a task to complete, you can run the task separately while the UI keeps chugging along, leaving the user free to do other tasks in the UI while waiting. For example, when a tab stalls in Safari, wouldn't it be great to switch to another tab and get some work done? Also, you are not creating the thread pool; GCD takes care of that. You just specify how tasks are split (arguably the hard part of multi-core programming) and how they interact, and GCD manages it all. Kinda like how you don't have to micromanage all the trains at the train station when you have the workers there to do it for you.
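To make the "UI keeps chugging along" point concrete, here is a minimal sketch in Python rather than GCD's own C/Objective-C API. The names `slow_task` and `run_with_responsive_ui` are made up for illustration, and the polling loop is only a stand-in for a real UI event loop.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_task(seconds):
    """Hypothetical stand-in for a stalled page load or a long encode."""
    time.sleep(seconds)
    return "done"


def run_with_responsive_ui(seconds, poll_interval=0.01):
    """Run slow_task in the background while the 'UI' keeps ticking.

    Returns the task result and how many UI ticks happened while waiting.
    """
    ui_ticks = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_task, seconds)
        # The main thread is free to keep handling "events" here.
        while not future.done():
            ui_ticks += 1
            time.sleep(poll_interval)
        return future.result(), ui_ticks


if __name__ == "__main__":
    result, ticks = run_with_responsive_ui(0.2)
    print(result, "after", ticks, "UI ticks")
```

The point is only that the main loop never blocks on the task; how the pool behind it is sized and shared is the part GCD is said to handle.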
The article you linked has almost nothing to do with global thread pools. Please state just ONE specific advantage global thread pools provide over one I'd create locally.
Global thread pools (as used by GCD) know the state of the system. This is stated as a specific example of the advantage of GCD - it will optimize the number of threads to reflect system load.
What are you calling a "resource" here? Cores? Does GCD have any knowledge of other resources on the system? How does GCD know what system resources my thread needs and when it will need them? Is it really that smart?
Yes. Actually it's not asking you to think that low level. GCD says, "Give me a task that can be broken up into blocks, and I'll figure out the best way to do it given current system resources." Namely CPU and GPU. Memory is already managed in a preemptive multitasking system. Why rewrite the memory manager when you have a good one already?
At least that's how I understood it.
Cydonia
Sep 17, 2009, 09:00 AM
@2002cbr600f4i
Thank you for that. I know nothing about programming or these technologies but that has helped me understand why this is important. ;)
AAPLaday
Sep 17, 2009, 09:01 AM
Handbrake and isquint optimisation please :D
coleridge78
Sep 17, 2009, 09:09 AM
Why in the world would I want to create a thread pool when I don't need one!?!?!? Please give just ONE example of why a thread pool would be needed in a desktop application that does not provide server services or benefit from significant parallelism in a complex operation.
The article you linked has almost nothing to do with global thread pools. Please state just ONE specific advantage global thread pools provide over one I'd create locally (for a desktop application).
What are you calling a "resource" here? Cores? Does GCD have any knowledge of other resources on the system? How does GCD know what system resources my thread needs and when it will need them? Is it really that smart?
You like to bluster a lot, but you apparently don't read about what you're talking about before you open your yap.
Yes, GCD is that smart. That's the entire point of the exercise--effectively co-ordinating the dozen or more user-space programs that are generally running at any given time to most efficiently use threads between them, so that each is maximized without stepping on others.
A side effect of this global subsystem to handle the thread dispatch and management (whoever said they didn't see any mgmt in GCD also apparently is totally ignorant of it) is that the prosaics of threads are now extremely simple, which is a win inasmuch as it's an encouragement to developers to thread tasks which lend themselves to threading but were maybe not worth the human overhead in the past.
People can say "that was already the easy part!" but it doesn't matter if it was easy--it was time-consuming, detail-intensive, and boring, which means it only happened when there was a very clear win.
Or rather, it's trying to be that smart--how effective it is, broadly, remains to be seen.
eastcoastsurfer
Sep 17, 2009, 09:35 AM
Yes. Actually it's not asking you to think that low level. GCD says, "Give me a task that can be broken up into blocks, and I'll figure out the best way to do it given current system resources." Namely CPU and GPU.
Not quite. It is still up to the programmer to determine what tasks, if any, can be broken into blocks and run in parallel. They have to determine what is shared data, what order of ops matters, dependencies, etc... This is what is hard about parallel programming (really, creating threads has been simple since Java/C#; what you do with them is the hard part). Also, many programs, especially GUI ones, have few operations that can run in parallel. Of course there are exceptions, but the big win for GCD will come from applications that are most likely already written to run in parallel being able to fully utilize any machine they are on because of the global pool/queue/dispatch whatever they want to call it.
eastcoastsurfer
Sep 17, 2009, 09:42 AM
Why in the world would I want to create a thread pool when I don't need one!?!?!? Please give just ONE example of why a thread pool would be needed in a desktop application that does not provide server services or benefit from significant parallelism in a complex operation.
The article you linked has almost nothing to do with global thread pools. Please state just ONE specific advantage global thread pools provide over one I'd create locally (for a desktop application).
What are you calling a "resource" here? Cores? Does GCD have any knowledge of other resources on the system? How does GCD know what system resources my thread needs and when it will need them? Is it really that smart?
Actually you're ripping on THE big feature of GCD. I remember when I first started using Linux back in the 90s. When you compiled your own kernel you could pass an option to make (-j) to spawn X# parallel jobs. IIRC, the ideal # of jobs was the number of procs + 1 IF your machine was just going to compile. If it was doing other stuff at the same time maybe you knock it down to a single job.
I give that example to show why the global queues are a good thing. Programs that need threading can now ask for and use as many queues as they want and GCD will dynamically raise and lower the actual number of threads based on cores and load. The programmer doesn't have to guess and the user doesn't have to set anything.
That is the cool part about GCD. It will not suddenly make all programs more responsive or make everything threadable or make every programmer able to code parallel code. What it will do is allow programs that have potential parallel sections to fully utilize the hardware they are on.
stylewriter
Sep 17, 2009, 10:17 AM
Why in the world would I want to create a thread pool when I don't need one!?!?!? Please give just ONE example of why a thread pool would be needed in a desktop application that does not provide server services or benefit from significant parallelism in a complex operation.
How about 'ls'? Anything that can be split up into concurrent processes. Regular threadpools are too expensive to set up to use in most situations.
The article you linked has almost nothing to do with global thread pools.
The number of concurrent threads is... a global problem for a system. The chart shows the number of concurrent threads on a system vs the amount of work performed. Having multiple local threadpools that don't know about each other means that you have a LOT more than the optimal number of threads concurrently running.
Please state just ONE specific advantage global thread pools provide over one I'd create locally (for a desktop application).
I listed two... cheap threadpools and allowing the system to maintain an optimal number of worker threads. If you need a specific usage scenario: how about using a threadpool for video encoding while running a separate app for video editing (that also uses threadpools)? If both apps had local threadpools that were optimized for the number of CPUs on the system, then you are just wasting resources.
What are you calling a "resource" here? Cores?
Cores and memory. Extra threads underutilize cores by forcing more context switching, and they eat up more memory.
Does GCD have any knowledge of other resources on the system? How does GCD know what system resources my thread needs and when it will need them? Is it really that smart?
Yes... how could it not? GCD manages the threadpool for all GCD apps.
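The oversubscription argument above (two apps, each sizing a local pool to the core count) can be sketched as a toy measurement. This is Python, not GCD; `PeakCounter`, `work` and `peak_threads` are hypothetical names, and the "apps" are just pools inside one process.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, wait


class PeakCounter:
    """Tracks how many workers are running at once and remembers the peak."""

    def __init__(self):
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)

    def __exit__(self, *exc):
        with self._lock:
            self.current -= 1


def work(counter):
    with counter:
        time.sleep(0.02)  # pretend to do something


def peak_threads(num_apps, cores, shared):
    """Peak number of simultaneously busy workers across all 'apps'."""
    counter = PeakCounter()
    if shared:
        # One pool sized to the machine, used by every app.
        pools = [ThreadPoolExecutor(max_workers=cores)] * num_apps
    else:
        # Each app builds its own pool sized to the machine.
        pools = [ThreadPoolExecutor(max_workers=cores) for _ in range(num_apps)]
    futures = [pool.submit(work, counter)
               for pool in pools for _ in range(cores * 4)]
    wait(futures)
    for pool in set(pools):
        pool.shutdown()
    return counter.peak


if __name__ == "__main__":
    print("shared pool peak:", peak_threads(2, 4, shared=True))
    print("local pools peak:", peak_threads(2, 4, shared=False))
```

With a shared pool the peak can never exceed the core count; with local pools it can reach num_apps times the core count, which is the waste being described. A real system-wide pool would also react to load, which this sketch does not model.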
haravikk
Sep 17, 2009, 10:21 AM
I'll be polite, but this analysis is sorely lacking in understanding of threading mechanisms....
"Idle" threads don't "lose time".
I'm a bloody programmer; I didn't say "idle" threads, I said threads that barely do anything. There's a difference. The cost of a context switch just to perform a trivial operation can be huge, and in user interfaces, and many other common cases you can get a lot of these types of operations.
Threads handling network connections that aren't up to much can likewise add to the thread-"clutter" in a system, which results in crazy amounts of context-switching just to give threads a chance to run when they're not really needed. If instead a single thread were running some of these light-weight tasks, then there would be almost no cost to them, but when applications have threads dedicated to these tasks then processor time can be lost to thread-management as soon as you start opening a handful of internet connected programs or other lightweight apps.
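As a rough illustration of "a single thread running these light-weight tasks", here is a sketch of a serial work queue in Python. It is not GCD's API; `run_on_single_worker` is a hypothetical helper, and the tasks are trivial placeholders.

```python
import queue
import threading


def run_on_single_worker(tasks):
    """Drain many tiny tasks on ONE worker thread instead of one thread each.

    Returns the results in submission order and the number of distinct
    threads that actually ran a task.
    """
    q = queue.Queue()
    results = []
    thread_ids = set()

    def worker():
        while True:
            task = q.get()
            if task is None:  # sentinel: no more work
                break
            thread_ids.add(threading.get_ident())
            results.append(task())

    t = threading.Thread(target=worker)
    t.start()
    for task in tasks:
        q.put(task)
    q.put(None)
    t.join()
    return results, len(thread_ids)


if __name__ == "__main__":
    tiny_tasks = [lambda i=i: i * 2 for i in range(5)]
    print(run_on_single_worker(tiny_tasks))
```

Five tasks, one thread: no per-task thread creation and no switching between five mostly idle threads.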
Shiner
Sep 17, 2009, 11:18 AM
CUDA alternatives I have tried on my MacBook Pro with 2.4GHz/8600M GT go from 20fps to 150fps for 720p.
http://badaboomit.com/ is what I use. They claim 20x.
I use badaboom for my iphone encodes. It is fast but the picture is horrible!! The CUDA cards are nice but the end product is not as good. My wife barely notices the difference from 480 to 720p on our 50" screen. She made me turn off the movie I encoded with badaboom to 720p. It looks boxy as hell!! Don't even try 1080p with that crap.
2002cbr600f4i
Sep 17, 2009, 11:21 AM
I have to disagree as I believe Apple refactored NSOperation and related stuff to run on top of the new threading architecture. It is the only viable explanation I have for some of the speed ups seen in older code. We may be saying the same thing but I'm specifically saying that the infrastructure in place in SL to support GCD has had an excellent impact on existing software. Well, existing software that took advantage of Apple's NS threading primitives.
Well yah a rewrite can always speed things up, I'm just saying there has been a positive impact on existing software. That has a lot to do with the infrastructure put in place for GCD.
Dave
Yup Dave, we're saying the same thing... It's not that the app has gotten faster just because of GCD. It's that the OS support that the App makes use of has gotten faster because it's been rewritten to use GCD. So, yes, SL can run some things faster without apps having to be rewritten. There is SOME benefit. We won't see the rest of the benefits until the apps are modified to directly use GCD themselves.
Mr. Wonderful
Sep 17, 2009, 11:35 AM
Handbrake and Creative Suite rewrites to take advantage of these technologies are the biggest items on my wishlist.
wizard
Sep 17, 2009, 11:49 AM
The tricky thing is that the new technology underlying Snow Leopard is not going to make a difference straight away, but needs to be implemented by developers.
I will have to continue to object to this idea, well written apps already benefit from SL from what I can see. In some cases I would seriously doubt the developers will do anything more to optimize specifically for SL.
Obviously there are programs that can be re-engineered to benefit from GCD, blocks and the low level features of the platform. How widely this is the case really isn't well known at the moment.
Those who read the ArsTechnica review of Snow Leopard (http://arstechnica.com/apple/reviews/2009/08/mac-os-x-10-6.ars) might believe the reviewer that Grand Central technology is relatively easy to implement in code, because rather than creating threads on your own one can basically tell the system to "hand things over to GCD" and it will do the rest: balancing CPU power between processes and cores and all those technical things.
Well it's been a while since reading that article but I don't remember that being said. Yes, GCD does the load balancing, but it is still up to the programmer to find an optimal way to parallelize the algorithms being used. So while the details of handling micro threads are not an issue, there is no change in the effort required to find the parallel code.
Well anyhow, it doesn't matter too much now, we will see benefits here over time though, be patient.
Well this is certainly true! It will be interesting to see what apps adopt GCD heavily and, more so, which apps go one step further and adopt OpenCL.
Dave
Amdahl
Sep 17, 2009, 12:25 PM
We're talking 1.5-2 year old computers here, and since Apple's ripped out ANY support for PPC machines, one would think they could support the last 4 years of Intel machines... one would think?
I guess you're making the mistake in thinking they dropped PPC to save time. No, they dropped PPC to screw customers. And that's what they are doing to you, too. This won't stop until customers stop excusing Apple for this kind of behavior. Of course, they dropped 'Computer' from their name because they make toys now.
Here we're also talking about users who potentially have less money, and can't afford to upgrade their hardware as much... so do so every 5 years or so. These guys still pay to upgrade the software, and are interested in new technology... they also want the biggest bang for their buck.
That makes you a poor customer. :D It used to be you bought a Mac, you got long life out of it. Now, you buy a Mac. And then you buy another one within two years, or you are a worthless Apple person.
So Mac Pro users... 4 seconds or 2 seconds... who cares? Me... 10 minutes or 5 minutes... I care because it's a lot of time! And for those who say "you're a cheap bastard..." my MBP cost significantly more than most Mac Pro's!
Next time, don't give Apple your money unless they promise you a certain number of years of support.
wizard
Sep 17, 2009, 12:28 PM
Yup Dave, we're saying the same thing... It's not that the app has gotten faster just because of GCD. It's that the OS support that the App makes use of has gotten faster because it's been rewritten to use GCD. So, yes, SL can run some things faster without apps having to be rewritten.
This really appears to be a very good thing for apps making use of Apple's higher level threading APIs. For people programming with those APIs there may be little incentive to go to the low level of the GCD primitives.
There is SOME benefit. We won't see the rest of the benefits until the apps are modified to directly use GCD themselves.
If they are modified. The thing is, if your app is 20 to 50% faster using the high level Cocoa threading features then maybe the developer will spend his time on other parts of the app.
What it comes down to is whether the incentive is always there for a developer to drop down to the low level nitty gritty of GCD. Sometimes the answer will be no, simply because Apple has improved the high level routines so the developer is freed up to work on other parts of the program.
I realize that many programs will need to be recoded for GCD and that we will see impressive results. That is a given for programs that can be parallelized heavily. On the other hand, cooling the idea that some have in this thread that all refactored programs will suddenly run much faster than today's version is in order. Here I'm not talking so much to you as to the individuals that seem to believe that a year from now we will all be shocked by how fast some apps run. For some apps you won't see much more than what we currently see in speed increases.
Dave
2002cbr600f4i
Sep 17, 2009, 12:30 PM
I'd like to see figures for the independent contributions of:
- SnowLeopard running non-optimised code
- SL with OpenCL
- SL with GCD
- SL with both optimisations
Wonder if there's more detailed information anywhere?
Well, I think that would make for a GREAT test/example. The problem is coming up with some sort of algorithm/test problem that could be applied across like that....
Something like:
1) Perform operation across huge array of data, linearly.
2) Do #1, but do it by making your own threads to do it in parallel.
3) Do it again, but using the GCD features instead of your own threads.
4) Do it again, but using OpenCL.
5) Do it again, but use some sort of combination of 3+4.
Note, OpenCL will run any code sent to it on your CPU if there isn't a capable video card in your system. And when it does so, the CPU's OpenCL driver uses blocks and GCD's queues to do the processing. If the code is getting executed on the GPU, it uses whatever facilities the GPU hardware + driver uses.
If anyone can come up with an interesting / compelling test like that I'd love to take a crack at coding up 1,3,4 and 5 (I've never done manual threads in Objective-C on OSX so I have no idea how to write #2.)
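For what it's worth, steps 1 to 3 of the list above could be laid out roughly like this. It is a harness sketch in Python, not Objective-C or GCD: the per-element operation `op` is a made-up placeholder, a `ThreadPoolExecutor` stands in for the "let a pool manage it" variant, and the OpenCL steps are left out. Note that CPython's GIL means the threaded variants will not actually speed up pure-Python arithmetic, so this shows only the structure of the comparison, not expected results.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def op(x):
    """Placeholder per-element operation."""
    return x * x + 1


def chunk(data, n):
    """Split data into at most n roughly equal slices."""
    size = max(1, (len(data) + n - 1) // n)
    return [data[i:i + size] for i in range(0, len(data), size)]


def run_linear(data):                      # step 1
    return [op(x) for x in data]


def run_manual_threads(data, n_threads=4):  # step 2
    chunks = chunk(data, n_threads)
    results = [None] * len(chunks)

    def worker(i):
        results[i] = [op(x) for x in chunks[i]]

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(len(chunks))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [y for part in results for y in part]


def run_pool(data, n_workers=4):            # step 3 (pool-managed stand-in)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(lambda c: [op(x) for x in c], chunk(data, n_workers))
        return [y for part in parts for y in part]


def timed(fn, data):
    """Return (elapsed seconds, result) for one variant."""
    start = time.perf_counter()
    out = fn(data)
    return time.perf_counter() - start, out


if __name__ == "__main__":
    data = list(range(200_000))
    for fn in (run_linear, run_manual_threads, run_pool):
        elapsed, _ = timed(fn, data)
        print(fn.__name__, round(elapsed, 4), "s")
```

Whatever the language, the useful property is that all variants compute the same result from the same input, so only the scheduling differs between runs.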
2002cbr600f4i
Sep 17, 2009, 12:42 PM
On the other hand, cooling the idea that some have in this thread that all refactored programs will suddenly run much faster than today's version is in order. Here I'm not talking so much to you as to the individuals that seem to believe that a year from now we will all be shocked by how fast some apps run. For some apps you won't see much more than what we currently see in speed increases.
Dave
Agreed! Nothing has annoyed me more since SL's release than seeing all these people on the boards scream that "SL didn't make all my programs run faster! Why isn't OpenCL doing anything on my system? SL is nothing more than a Service Pack!" etc.
I chalk that up to the fact that they just don't know how the underlying stuff really works. The changes they made to SL by adding GCD + OpenCL have the POTENTIAL to make the system faster (if for no reason other than some of the underlying OS code being able to use them to speed up some things, like CoreImage getting a 25% boost due to it using OpenCL now), but it's not a magic bullet where installing SL suddenly makes everything go 50% faster like some people think.
No, these changes introduced in SL provide DEVELOPERS with tools that can help make their applications, where applicable, faster, and by using those features, in turn make the overall system more responsive and better able to make use of all the hardware available (rather than having 1/2 your cores sitting around doing nothing while the other 1/2 are maxed out.) Again, that's an ideal situation, but I think as we start to see more apps converted to use these features, we'll see utilization of the hardware improve, especially when running multiple applications simultaneously, and the OS will remain responsive even in those situations.
Yes, there's nothing here (at least with GCD) that good developers couldn't already do by creating their own threads in their programs. These tools just make it easier to do that so it's not as much of a headache to go ahead and use a multithreaded approach when possible rather than saying "I don't want to deal with writing this multithreaded even though this is a good place to do it!" So, yeah, existing well written multithreaded programs probably won't see much of a boost from GCD. But if the tools help other developers make their programs more multithreaded, then it shouldn't hurt at all.
Amdahl
Sep 17, 2009, 12:44 PM
I'm a bloody programmer;
But have you got the GCD choo-choo stamped on your hand yet?
Uh oh, got to go! There's a train coming in, and I've got a thread-pool to catch.. To Infinity!
It's not that the app has gotten faster just because of GCD. It's that the OS support that the App makes use of has gotten faster because it's been rewritten to use GCD. So, yes, SL can run some things faster without apps having to be rewritten. There is SOME benefit.
Some people wouldn't call this GCD (People other than marketing). Some people would call this fixing your longstanding crappy kernel.
I will have to continue to object to this idea, well written apps already benefit from SL from what I can see. In some cases I would seriously doubt the developers will do anything more to optimize specifically for SL.
Yep, it is just a marketing invention. OpenCL is the only meat on this carcass.
Well this is certainly true! It will be interesting to see what apps adopt GCD heavily and more so which apps go one step further and adopt OpenCL.
You'll hardly be able to tell. Any app that would have seriously benefited, would already be doing multi-threading. GCD is just the latest Apple gimmick, selling people multi-threading for the 6th or 7th time. Just like 10.6 is selling 64-bit for the 4th or 5th time. The GCD technology is so inconsequential that the Windows equivalent, ConCRT, is hardly even talked about.
If Apple really felt GCD was important, and really wanted it adopted by as many developers as possible, they would have backported the Objective-C 2.1 runtime to 10.5, and possibly even 10.4. Then app developers would have no reason NOT to use it. Currently, if you want to write apps that can be used by the entire Mac community and have hopes for portability to Win or X, you don't touch GCD.
wizard
Sep 17, 2009, 01:01 PM
I'd like to see figures for the independent contributions of:
- SnowLeopard running non-optimised code
I'm not sure this would be all that valid because SL so improves threading support that many apps are running on top of brand new code. So in effect old programs are in many ways running optimized code from libraries and are being served via improved threading management.
- SL with OpenCL
There are significant limitations on what OpenCL can accelerate. Unless you have a specific need there is little reason to focus on OpenCL. Plus the numbers that you get are highly dependent on the actual GPU in the system.
- SL with GCD
SL really doesn't come without GCD. In fact libdispatch and the kernel work together very tightly. From what I can see the major reason to get SL out the door was to get GCD out in the wild, GCD is the whole point of the SL release.
- SL with both optimisations
Both GCD and OpenCL ship with SL. About the only thing you could do would be to test hardware with and without OpenCL hardware installed. The thing is certain parts of SL, such as Core Image do make use of OpenCL and compatible GPUs.
Wonder if there's more detailed information anywhere?
Well not exactly what you are asking for but there is some interesting bench marking out there. Some apps are showing impressive speed ups from simply running under SL. There are also regressions so Apple has work to do.
While I don't benchmark I've noticed that some apps are indeed much snappier under SL. There are likely several explanations here so I'm not even going to try to explain what may be happening. I like to describe it as getting a new faster machine for all of $30.
Dave
PS
There are glitches too with SL but you really can't focus too much on them. I expect they will be fixed in time. It is more important in my mind that they have fixed numerous issues with Leopard that have nothing to do with GCD or OpenCL. Snow Leopard is a big win for Apple.
wizard
Sep 17, 2009, 01:35 PM
But have you got the GCD choo-choo stamped on your hand yet?
Uh oh, got to go! There's a train coming in, and I've got a thread-pool to catch.. To Infinity!
Like it or not for some people catching that train will be real important to staying in business.
Some people wouldn't call this GCD(People other than marketing). Some people would call this fixing your longstanding crappy kernel.
And some people are totally ignorant of what is being discussed here. When GCD gets up and running on Linux are you going to accuse them of having a crappy kernel?
Yep, it is just a marketing invention. OpenCL is the only meat on this carcass.
A little cranky today. OpenCL is pretty impressive and currently is the only platform-independent way, with a chance at wide scale acceptance, to harvest the power of GPUs for non graphics work.
You'll hardly be able to tell. Any app that would have seriously benefited, would already be doing multi-threading.
I hear this all the time and each time I hear it, it is still wrong. It all depends upon the application and the programmer's ability to extract parallel operation out of it.
GCD is just the latest Apple gimmick, selling people multi-threading for the 6th or 7th time. Just like 10.6 is selling 64-bit for the 4th or 5th time. The GCD technology is so inconsequential that the Windows equivalent, ConCRT, is hardly even talked about.
It is something for developers to discuss, so obviously you would not hear about it. However it is of far more interest to Apple developers due to the types of software found on Apple's platforms.
If Apple really felt GCD was important, and really wanted it adopted by as many developers as possible, they would have backported the Objective-C 2.1 runtime to 10.5, and possibly even 10.4. Then app developers would have no reason NOT to use it. Currently, if you want to write apps that can be used by the entire Mac community and have hopes for portability to Win or X, you don't touch GCD.
Let's face it, anybody running 10.4 isn't going to be taking on modern software and thus none of this means anything to them. As to 10.5, why do you think the SL update is so cheap?
Besides if you back ported all this what would you have? Snow Leopard obviously. Your suggestion like the majority of your posts has no merit.
Dave
Amdahl
Sep 17, 2009, 02:46 PM
Like it or not for some people catching that train will be real important to staying in business.
You mean the gimmick business? I concede that point.
And some people are totally ignorant of what is being discussed here. When GCD gets up and running on Linux are you going to accuse them of having a crappy kernel?
Neither Linux nor Windows needs to have thread management totally revamped in order to accommodate a high thread environment.
A little cranky today.
Not really.. It just seems that way because I don't work for Apple's PR dept.
I hear this all the time and each time I hear it, it is still wrong. It all depends upon the application and the programmer's ability to extract parallel operation out of it.
Pardon a pun, but this is a train that never seems to get to the station. Programs that benefit from threads, like Handbrake, already use threading. They won't gain from GCD, except to the degree that they were suffering from OSX's bad kernel scheduler. This scheduler has been showing its warts to some degree since the Quad G5, and greatly since the OctoPro.
It is something for developers to discuss, so obviously you would not hear about it. However it is of far more interest to Apple developers due to the types of software found on Apple's platforms.
It seems to be of more interest to disciples than developers.
Let's face it anybody running 10.4 isn't going to be taking on modern software and thus none of this means anything to them. As to 10.5 why do you think the SL update is so cheap?
10.4 was what you bought on a Mac less than two years ago, say, an 8-core Mac Pro. That isn't 'modern?' SL is cheap because Apple needs people to update to keep their upgrade treadmill alive. If people stop updating, they can't keep obsoleting older OSes, and older systems, and then the HARDWARE sales machine breaks down. SL is cheap because they have got to find a way to kill the value of previous Macs.
Besides if you back ported all this what would you have? Snow Leopard obviously.
Some people would call that a service pack.
You'd also have 100% deployment of GCD, which if it was really so important, Apple would have done so, so that developers could TRULY use it without reservation.
Your suggestion like the majority of your posts has no merit.
Maybe we should box those posts up and sell them for $29.
2002cbr600f4i
Sep 17, 2009, 02:55 PM
You'd also have 100% deployment of GCD, which if it was really so important, Apple would have done so, so that developers could TRULY use it without reservation.
Maybe we should box those posts up and sell them for $29.
Just like how MS said that DirectX 10 was so important... So important that they didn't bother to backport it to Windows XP in order to maximize its penetration...
As a result, most games still are written in DX9, with some having DX10 support. But DX10 has largely been a huge failure because MS tried to use it as a mechanism to force people to Vista, without demonstrating why DX10 was so superior to DX9...
Now, some would argue the same thing is true with Apple and GCD and Leopard vs. Snow Leopard....
stylewriter
Sep 17, 2009, 03:41 PM
Just like how MS said that DirectX 10 was so important... So important that they didn't bother to backport it to Windows XP in order to maximize its penetration...
As a result, most games still are written in DX9, with some having DX10 support. But DX10 has largely been a huge failure because MS tried to use it as a mechanism to force people to Vista, without demonstrating why DX10 was so superior to DX9...
Now, some would argue the same thing is true with Apple and GCD and Leopard vs. Snow Leopard....
True, it would be nice if Apple backported basic GCD style thread pooling to 10.5/10.4, even if it had to create local thread-pools for every application and created no speed advantage on those platforms. Porting the GCD style syntax (again with local threadpools) to Windows and Linux versions of GCC would be even better.
I wouldn't doubt GCD gets ported to more architectures and OS versions.
eastcoastsurfer
Sep 17, 2009, 03:52 PM
Like it or not for some people catching that train will be real important to staying in business.
I'm sorry. If you needed your application to be multi-threaded to stay in business it would have already been done.
And some people are totally ignorant of what is being discussed here. When GCD gets up and running on Linux are you going to accuse them of having a crappy kernel?
Process scheduling is HARD. Linux has wrestled with it for years to find a balance between being user/desktop responsive and server responsive. The scheduler in OSX is pretty poor though and hasn't seemed to get much better over OS releases.
I hear this all the time and each time I hear it, it is still wrong. It all depends upon the application and the programmers ability to extract parallel operation out of it.
And here's the kicker, most applications have no parallel optimizations available. Server applications benefit, batch processing apps benefit, but the typical GUI app is not constrained by the machine, but by how fast the user can process and respond to the information being displayed. If run of the mill GUI apps suddenly get a boost it's because Apple is using GCD as a proxy to finally fix their scheduler. Other OSes' schedulers have already managed this (spreading processes across unused cores) for years though.
It is something for developers to discuss, so obviously you would not hear about it. However it is of far more interest to Apple developers due to the types of software found on Apples platforms.
Do tell what types of software are on Apple's platforms that are not on others? I would argue that other platforms have more use for GCD or something similar than Apple does, simply because both Linux and Windows have a much larger market share in the server arena. Servers are where you could really see something like GCD shine. That's when you get into 16+ core machines.
Years ago I wrote a multi-threaded server process and spent a lot of time optimizing its thread usage for a dual proc machine (best at the time). When we upgraded to quad procs, I had to re-optimize. GCD hopefully removes the optimization part and that is where I think there is a big win.
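The "had to re-optimize for quad procs" problem goes away partly if the pool size is derived from the machine at run time instead of being hard-coded. A minimal sketch in Python, not GCD; `make_pool` and its `reserved_cores` parameter are hypothetical names for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def make_pool(reserved_cores=0):
    """Size a worker pool from the core count of whatever machine we run on.

    reserved_cores is a hypothetical knob for leaving some cores to
    other work; the pool always gets at least one worker.
    """
    cores = os.cpu_count() or 1  # cpu_count() may return None
    workers = max(1, cores - reserved_cores)
    return ThreadPoolExecutor(max_workers=workers), workers


if __name__ == "__main__":
    pool, workers = make_pool()
    print("using", workers, "workers")
    print(list(pool.map(lambda x: x * x, range(5))))
    pool.shutdown()
```

This only looks at the core count once, at startup. The claim made for GCD in this thread goes further: the system adjusts the number of threads as load changes, which a per-process sketch like this cannot do.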
2002cbr600f4i
Sep 17, 2009, 04:05 PM
True, it would be nice if Apple backported basic GCD style thread pooling to 10.5/10.4. Even if it had to create local thread-pools for every application and created no speed advantage on those platforms. Porting the GCD style syntax(again with local threadpools) to Windows and Linux versions of GCC would be even better.
I wouldn't doubt GCD gets ported to more architectures and OS versions.
Actually, Apple HAS put in to have their concept of blocks get added to the C/C++/Obj-C standards...
They've also put out a version of GCD to be used by other OS'es.
LLVM, the compiler they're using, is also freely available IIRC.
So, everything you need to implement and use GCD, plus the coding facilities involved with it, IS available to be added to other platforms. It's up to those other platforms to decide if they want to use it or not.
cdcastillo
Sep 17, 2009, 05:46 PM
Shouldn't the original article say that the developer is optimizing his/her app for SL, and not the other way around?
Just curious.
:confused:
dasmb
Sep 17, 2009, 07:02 PM
You (and most other people) need to realize that there is a relatively small set of computations that can be accelerated by this kind of technology. Of course certain types of video, image, and sound processing will work, but your run-of-the-mill Mac app isn't going to be able to take advantage of GCD or OCL.
This is patently untrue. Many, many common operations could be easily multithreaded if the cost -- in terms of development technique and setup/teardown of new threads -- was not so prohibitive. That's the problem GCD aims to solve.
Consider a typical function that performs some sort of IO, does a set of 5 or more independent operations (such as applying filters or validation), and then records the result of those operations. Ordinarily, we perform these three stages in serial, and once performance becomes an issue, we go back and re-implement one of the stages (usually the IO) in such a way that it populates a synchronized "queue" of results, which can be handled by another worker thread. Rarely do developers go more than three threads deep, because it's difficult to think about three things happening in parallel. GCD purports to change that by allowing developers to state parallelizable blocks in a way that's just as natural as a for loop. Assuming you had enough CPUs, you could launch all 5 independent operations at the same time on separate threads, and coordinate the result when they all completed. Of course, if you only have one CPU -- what is this, Russia? -- the code executes the same as it did before.
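For illustration, the "launch the independent operations together, then coordinate the result" shape described above might look roughly like the following. This is a minimal Python sketch using a thread pool, not GCD itself; the input data and the five operations (`total`, `smallest`, and so on) are hypothetical stand-ins, and in CPython threads mainly help with IO-bound work, so the sketch shows the structure rather than a guaranteed speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def read_input():
    # Stage 1: stand-in for the IO step (hypothetical data).
    return list(range(10))


# Stage 2: five independent operations on the same input (all hypothetical).
def total(xs):
    return sum(xs)


def smallest(xs):
    return min(xs)


def largest(xs):
    return max(xs)


def count_even(xs):
    return sum(1 for x in xs if x % 2 == 0)


def all_non_negative(xs):
    return all(x >= 0 for x in xs)


OPERATIONS = [total, smallest, largest, count_even, all_non_negative]


def process():
    data = read_input()
    # Fan out: submit every independent operation to the pool at once.
    with ThreadPoolExecutor(max_workers=len(OPERATIONS)) as pool:
        futures = [pool.submit(op, data) for op in OPERATIONS]
        # Stage 3 (join): wait for each result and record it by name.
        return {op.__name__: f.result() for op, f in zip(OPERATIONS, futures)}


RESULTS = process()
```

None of the operations reads another's output, which is why they can be submitted together and only need to meet again at the final join.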
Another example is event multicasting (the issuance of requests to handle an event to more than one registered recipient). Typically, multicasting is also performed in serial. However, it's rare that multicast clients are dependent on each other's results, in part because the order of dispatch is often nondeterministic, but mostly because the whole idea of event multicasting is to give the illusion to the user that many things are happening at the same time. It's quite possible that clicking a button should result in several UI mutations (such as the button being greyed, the cursor changing state) as well as one or more independent background operations. No developer is going to manage a thread stack to perform such trivial operations, but every developer will need to write a block of code to manage them. If the act of turning this block over to GCD is as simple as writing the block a little differently, and brokering a call to GCD, the barrier to multithreading is eliminated.
What about logging? Every application logs something somewhere, and most of them do it in serial too. I recently wrote a log adapter using plain old non-GCD threads and it boosted our app's performance by about 20%.
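The adapter itself isn't shown here, so the following is only a guess at the general shape of such a thing: a minimal Python sketch in which callers drop messages on a queue and a single background thread does the slow write. `AsyncLogger` and its `sink` parameter are hypothetical names invented for this example.

```python
import queue
import threading


class AsyncLogger:
    """Hypothetical log adapter: callers enqueue, one worker thread writes."""

    def __init__(self, sink):
        self._sink = sink            # any callable that takes one message
        self._queue = queue.Queue()  # thread-safe FIFO
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log(self, message):
        # Cheap for the caller: no file or network IO on this thread.
        self._queue.put(message)

    def _drain(self):
        while True:
            message = self._queue.get()
            if message is None:      # sentinel: stop the worker
                break
            self._sink(message)

    def close(self):
        # Everything queued before the sentinel is written before we return.
        self._queue.put(None)
        self._worker.join()


lines = []
logger = AsyncLogger(lines.append)
for i in range(3):
    logger.log(f"event {i}")
logger.close()
```

Because there is a single worker and a FIFO queue, messages are written in the order they were queued.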
As for OCL -- I've read papers claiming benefit from using the GPU to handle such basic tasks as string operations and node traversals. Any task where the number of operations exceeds the size of the data they're operating on can be pushed onto the GPU for a performance bump.
Most of the time, multithreading and GPU computing are ignored in an application's development because developers make the same assumption you do, that the average operation isn't likely to benefit from them. This is done because performing this kind of optimization when it isn't needed is costly and error prone -- "Do not preoptimize" is the motto of any developer whose tasks are more complex than Hello World. The beauty of OCL and GCD is that they allow developers to make relatively minor changes to the way they write code that vastly improve performance before even the first pass of optimization occurs.
Amdahl
Sep 17, 2009, 07:45 PM
This is patently untrue. Many, many common operations could be easily multithreaded if the cost -- in terms of development technique and setup/teardown of new threads -- was not so prohibitive. That's the problem GCD aims to solve.
That particular problem was already solved... OpenMP. It has been in XCode with the GCC 4.2 compiler. GCD might be based on OpenMP, and that is why Apple is being so 'generous' and releasing their modifications as GCD.
The real problem (that GCD & OpenMP don't solve) is that these little chunks of so-called parallel code are either embarrassingly parallel, and thus already being threaded, or they are not, and the reason is that they are not that parallel. In fact, most operations DO depend on the data in other elements or in other steps in a series. Those that don't, often still need some kind of locking or synchronization to be done before data can be read or written back to the application's main data structures. The best case is that you can clone the data structure, do the work, and then substitute the new data structure for the old in one locking/store. Your overhead in that case is a read lock, the memory copy, and a write lock. That's the best case. Anything else is going to be lock intensive, unless you just want to freeze access to that data for all other threads for the duration of the process. That would mean your GUI can't display the data until it comes out of the lock.
Locking, synchronization, overhead, serial code. Amdahl's Law.
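For reference, Amdahl's Law puts a number on that argument: if a fraction p of the work can run in parallel on n cores, the best possible speedup is 1 / ((1 - p) + p / n). A small Python sketch follows; the 75% parallel fraction in the loop is an arbitrary figure chosen for illustration, not a measurement from anyone in this thread.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of the work can be parallelized."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workload: 75% parallelizable, 25% serial (locks, copies, setup).
for cores in (2, 4, 8, 16):
    print(cores, "cores:", round(amdahl_speedup(0.75, cores), 2))
```

With a 25% serial portion the speedup can never reach 4x no matter how many cores are added, which is why the locking and copying overhead matters so much.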
AidenShaw
Sep 17, 2009, 08:13 PM
Those that don't, often still need some kind of locking or synchronization to be done before data can be read or written back to the application's main data structures.
The best case is that you can clone the data structure, do the work, and then substitute the new data structure for the old in one locking/store. Your overhead in that case is a read lock, the memory copy, and a write lock.
If you can simply overwrite the main structure, why was a lock needed in the first place?
Perhaps a better example is where the main structure needs to be updated, not overwritten.
One good example of this is when the application has global counters (for performance data or other reasons).
Instead of locking/update/unlock every time something needs to be counted, one can have thread-local counters that are updated without synchronization. When the thread ends, the lock/update/unlock can be done once to the global counter.
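A minimal Python sketch of that counter pattern, assuming four worker threads that each count multiples of 3 in their own slice of the input (the workload and the names `global_count` and `worker` are made up for illustration):

```python
import threading

global_count = 0
global_lock = threading.Lock()


def worker(items):
    global global_count
    local_count = 0               # private to this thread: no lock needed
    for item in items:
        if item % 3 == 0:
            local_count += 1
    # One lock/update/unlock per thread instead of one per counted item.
    with global_lock:
        global_count += local_count


chunks = [range(0, 250), range(250, 500), range(500, 750), range(750, 1000)]
threads = [threading.Thread(target=worker, args=(chunk,)) for chunk in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The lock is taken four times in total rather than once per counted item, so the serialized portion stays small.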
ungraphic
Sep 17, 2009, 08:15 PM
Why doesn't OpenCL support the ATI HD3870? It's got the hardware, so where is the software from Apple? I remember Steve Jobs once said 'but because we [Apple] believe in choice....'
Well, consumers believe in choice more so than you, Steve, and some of us paid good money for an OpenCL capable card.
djgamble
Sep 17, 2009, 08:31 PM
These cards were sold to you for rendering video... they just aren't capable of doing the calculations. It was normal for the engine of a graphics card to not support double precision floating point, or bastardize IEEE754 floating point numbers; they were made to render graphics fast. If you wanted to be able to do math calculations on your card, then you should have made sure to get a CUDA capable graphics card.
There isn't anything anyone can do... as far as I see, there are valid reasons for every unsupported graphics chipset; which is most of them.
They are capable... I can tell you that much.
What do you think rendering video involves? MATHS!
Your card was designed for rendering video... but Apple decided to make a driver that allowed it to work like a backup CPU. They were just laaaazy and have a really bad relationship with ATI nowadays.
2002cbr600f4i
Sep 17, 2009, 08:33 PM
Why doesn't OpenCL support the ATI HD3870? It's got the hardware, so where is the software from Apple? I remember Steve Jobs once said 'but because we [Apple] believe in choice....'
Well, consumers believe in choice more so than you, Steve, and some of us paid good money for an OpenCL capable card.
I'm sorry, but I call BS here. I'm betting you bought that card LONG before OpenCL was announced! And if you didn't, well, why did you buy that card without knowing if it would be supported???
After G
Sep 17, 2009, 08:46 PM
I guess you're making the mistake in thinking they dropped PPC to save time. No, they dropped PPC to screw customers. And that's what they are doing to you, too. This won't stop until customers stop excusing Apple for this kind of behavior. Of course, they dropped 'Computer' from their name because they make toys now.
That makes you a poor customer. :D It used to be you bought a Mac, you got long life out of it. Now, you buy a Mac. And then you buy another one within two years, or you are a worthless Apple person.
Next time, don't give Apple your money unless they promise you a certain number of years of support.
Umm, my (early 2006) Mini runs Snow Leopard just fine ... it's over 3 years old now. Apple doesn't support the hardware (out of warranty, or even Applecare) but the software is current, so I don't see how I wouldn't be supported that way. With the current timetable on the OS, I can see running for another 2-3 years before the next iteration of OS X requires a dedicated GPU. That's 6 years on my computer - far more than a run-of-the-mill PC.
My eMac bought in 2002 was able to run everything up to Tiger, so easily 5 years of software support. (Tiger was succeeded by Leopard in 2007.) It's not a promise, but with technology moving as fast as it does, 5 years is an eternity.
djgamble
Sep 17, 2009, 08:47 PM
I'm sorry, but I call BS here. I'm betting you bought that card LONG before OpenCL was announced! And if you didn't, well, why did you buy that card without knowing if it would be supported???
OpenCL was announced some years ago, it's only just hit a stable release! It was DEVELOPED using older cards...
Apple's just been MEGA lazy with the drivers!!!
We're talking 3.5 years of computers... they've chosen not 2 support 2 VERY POPULAR ATI cards (which basically cover everybody who doesn't have the latest round of MBP's or Mac Pro's.)
It's just a grudge match against ATI... Apple ALWAYS does this...
OS X OpenGL - didn't support the older ATI cards (OS 9 OpenGL did though? Apple got sued big time for that one because they'd sold a bunch of iMacs to a HUGE law firm... claiming they were OS X capable as they had G3's)
Quartz extreme - didn't support the Rage Pro...
Quartz extreme (when updated) - didn't support the GeForce 2
OpenCL - doesn't support computers more than 2 years old (although it was drafted out and announced before then)
---
It's Apple's tactic. Most Macs have built-in graphics cards and you CAN'T get these new features without buying a whole new computer.
Windows is different... you can generally upgrade your graphics card even if you have a cheap computer. And... Windows... tends to support pretty well every card (out of a MUCH larger pool than Apple's offering)... Apple's restricted OS X to the last 3 years of computers produced and STILL can't support every computer. It's a farce!
ungraphic
Sep 17, 2009, 10:29 PM
OpenCL was announced some years ago, it's only just hit a stable release! It was DEVELOPED using older cards...
Apple's just been MEGA lazy with the drivers!!!
We're talking 3.5 years of computers... they've chosen not 2 support 2 VERY POPULAR ATI cards (which basically cover everybody who doesn't have the latest round of MBP's or Mac Pro's.)
It's just a grudge match against ATI... Apple ALWAYS does this...
OS X OpenGL - didn't support the older ATI cards (OS 9 OpenGL did though? Apple got sued big time for that one because they'd sold a bunch of iMacs to a HUGE law firm... claiming they were OS X capable as they had G3's)
Quartz extreme - didn't support the Rage Pro...
Quartz extreme (when updated) - didn't support the GeForce 2
OpenCL - doesn't support computers more than 2 years old (although it was drafted out and announced before then)
---
It's Apple's tactic. Most Macs have built-in graphics cards and you CAN'T get these new features without buying a whole new computer.
Windows is different... you can generally upgrade your graphics card even if you have a cheap computer. And... Windows... tends to support pretty well every card (out of a MUCH larger pool than Apple's offering)... Apple's restricted OS X to the last 3 years of computers produced and STILL can't support every computer. It's a farce!
Thank you.
To be on point, I'm really waiting for proper support of OpenCL for my 3870. Had I known Apple wouldn't care, I would have bought the Geforce 8800.
Amdahl
Sep 18, 2009, 12:03 AM
If you can simply overwrite the main structure, why was a lock needed in the first place?
If no other thread in your code can initiate changes (or reads) to the structure, then you don't. But if there is one, then you need a read+write lock before the copy, to make sure the structure is in a consistent state. Release the read lock once copied, but leave the write. You need to read lock again before you make the substitution. You can drop some of those locks, and instead simply prohibit all access to the structure. But if you are doing that, the copy isn't necessary in the first place.
Perhaps a better example is where the main structure needs to be updated, not overwritten.
Yes, but I'm trying to look at a best-case practical example of GCD. The point being, there aren't very many that are 'stupid easy' enough to get excited about.
Instead of locking/update/unlock every time something needs to be counted, one can have thread-local counters that are updated without synchronization. When the thread ends, the lock/update/unlock can be done once to the global counter.
Yep, excellent example of avoiding serialization.
My eMac bought in 2002 was able to run everything up to Tiger, so easily 5 years of software support. (Tiger was succeeded by Leopard in 2007.) It's not a promise, but with technology moving as fast as it does, 5 years is an eternity.
My primary reference is to the software itself. Apple doesn't believe in providing security support. They expect you to buy bugfixes by continuously buying OSX versions, until the time they decide your hardware can't run newer versions. Panther lost support when 10.5 shipped, and Tiger has lost support now that 10.6 has shipped. Tiger was being sold as new less than 24 months ago. That's pretty embarrassing.
To be on point, I'm really waiting for proper support of OpenCL for my 3870. Had I known Apple wouldn't care, I would have bought the Geforce 8800.
Exactly. This is the message Apple is broadcasting: Don't buy our products expecting anything better than what you see today. The future isn't for you, unless you're standing at the cash register when it arrives.
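A rough Python sketch of the clone / work / substitute idea being argued about here. It is deliberately simplified and makes assumptions the posts do not: one plain mutex stands in for the separate read and write locks, and only one thread ever rebuilds the structure, so nothing guards against a second writer sneaking in between the copy and the substitution. `SharedTable` and `rebuild_sorted` are hypothetical names.

```python
import threading


class SharedTable:
    """Hypothetical shared structure guarded by one lock."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._lock = threading.Lock()

    def snapshot(self):
        # Short lock: just long enough to copy a consistent state.
        with self._lock:
            return list(self._rows)

    def replace(self, new_rows):
        # Short lock: swap in the finished result with one store.
        with self._lock:
            self._rows = new_rows


def rebuild_sorted(table):
    working_copy = table.snapshot()  # lock held only for the copy
    working_copy.sort()              # the expensive work runs unlocked
    table.replace(working_copy)      # lock held only for the swap


table = SharedTable([3, 1, 2])
background = threading.Thread(target=rebuild_sorted, args=(table,))
background.start()
background.join()
```

Readers such as a GUI thread can keep calling `snapshot()` while the sort runs, since the lock is only held for the copy and the swap; the cost is the extra memory copy.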
After G
Sep 18, 2009, 12:24 AM
My primary reference is to the software itself. Apple doesn't believe in providing security support. They expect you to buy bugfixes by continuously buying OSX versions, until the time they decide your hardware can't run newer versions. Panther lost support when 10.5 shipped, and Tiger has lost support now that 10.6 has shipped. Tiger was being sold as new less than 24 months ago. That's pretty embarrassing.
I see nothing embarrassing about a "current version minus one" support scheme.
To give an example, Ubuntu LTS versions are only supported with security updates for 3 years from release. Non LTS versions only get 18 months of support.
Tiger came out April 2005, and was succeeded by Leopard in October 2007. It continued to receive security updates until August 2009, when Snow Leopard came out. If all updates ended there, that means Tiger was supported for 4 years and 4 months, far longer than even an LTS release of Ubuntu.
Actually, Apple just released a security update (http://support.apple.com/kb/HT3865) for Tiger this month. It's still supported :)
Note that support is not measured from when you bought the product, but when the product was released. In fact Microsoft has ended its own support for XP, and only extended (read:business) support remains to 2014. Does the fact that netbooks are still being sold with XP mean that they're out of support? No. The OEM supports it, and only for a year at that. I'm sure you could pay someone to support OS X if you wanted to stay on the same version badly enough when it becomes unsupported.
After G
Sep 18, 2009, 12:45 AM
Thank you.
To be on point, I'm really waiting for proper support of OpenCL for my 3870. Had I known Apple wouldn't care, I would have bought the Geforce 8800.
Hopefully they will. Just because it's out for NVIDIA first doesn't mean ATI won't support it. The delay might be because ATI was working on its own competitor to CUDA before OpenCL.
OpenCL was announced some years ago, it's only just hit a stable release! It was DEVELOPED using older cards...
Apple's just been MEGA lazy with the drivers!!!
We're talking 3.5 years of computers... they've chosen not 2 support 2 VERY POPULAR ATI cards (which basically cover everybody who doesn't have the latest round of MBP's or Mac Pro's.)
OpenCL - doesn't support computers more than 2 years old (although it was drafted out and announced before then)
I can agree with the lazy driver part. You're lucky at least you have an ATI card which means the possibility of support. I have integrated Intel graphics on my 2006 mini, which probably means no support ... oh wait, OpenCL runs on CPUs too! Yay, I'm supported!
Exactly. This is the message Apple is broadcasting: Don't buy our products expecting anything better than what you see today. The future isn't for you, unless you're standing at the cash register when it arrives.
This is not the message Apple has.
You don't expect this sort of thing from other companies; for example car companies don't update any of the electronics on your car barring a life-threatening situation in which they are forced to do a recall.
It's the same type of ridiculousness that copyright supporters try to push - the artists create once and we pay forever. With computers, we are expecting the same thing - we pay once, and expect the programmers to work for us forever. Not going to happen.
creon
Sep 18, 2009, 01:05 AM
About time!
Bring on the speed!!
djgamble
Sep 18, 2009, 02:01 AM
This is not the message Apple has.
You don't expect this sort of thing from other companies; for example car companies don't update any of the electronics on your car barring a life-threatening situation in which they are forced to do a recall.
It's the same type of ridiculousness that copyright supporters try to push - the artists create once and we pay forever. With computers, we are expecting the same thing - we pay once, and expect the programmers to work for us forever. Not going to happen.
I think it is their message, and it's VERY smart marketing.
Their message is... expect EXACTLY the same hardware that you purchased, we're not going to make new drivers that enhance old hardware.
Now... yes, one would be naive to expect Apple to turn their old hardware into something blazing fast for free. What I'm talking about here is... my computer came with Leopard! Since then Apple's made a TINY update to the graphics drivers... now some of the CPU load can be pushed over to the GPU (they could technically do this for ANY GPU that's sitting there idle) but... they've chosen not to.
Now... I've actually paid for this update! 2/3 of the BIGGEST selling points aren't even supported (OpenCL and 64-bit kernel.)
Meh I'm just a little let down... I have the right to be let down, just as I do with other companies. I'm a consumer!
Amdahl
Sep 18, 2009, 03:03 AM
I see nothing embarrassing about a "current version minus one" support scheme.
To give an example, Ubuntu LTS versions are only supported with security updates for 3 years from release. Non LTS versions only get 18 months of support.
So you're comparing Apple to a free product so you can have a favorable comparison? Why don't you compare to Microsoft instead?
Tiger came out April 2005, and was succeeded by Leopard in October 2007. It continued to receive security updates until August 2009, when Snow Leopard came out. If all updates ended there, that means Tiger was supported for 4 years and 4 months, far longer than even an LTS release of Ubuntu.
It was sold less than 24 months ago as new, and now it is unsupported. Don't talk about someone who bought it 4 years ago, and don't compare it to a free product.
Actually, Apple just released a security update (http://support.apple.com/kb/HT3865) for Tiger this month. It's still supported :)
That update was in the can before SL shipped, just as Tiger got 10.4.11 two weeks after Leopard shipped. Tiger did not get the Java update, even though JDK 1.4.2 was updated for 10.5. Support has been discontinued for Tiger, in line with Apple practices.
Note that support is not measured from when you bought the product, but when the product was released. In fact Microsoft has ended its own support for XP, and only extended (read:business) support remains to 2014. Does the fact that netbooks are still being sold with XP mean that they're out of support? No. The OEM supports it, and only for a year at that. I'm sure you could pay someone to support OS X if you wanted to stay on the same version badly enough when it becomes unsupported.
Support is measured from when a customer buys a supported product from the vendor to the time they stop getting support. PERIOD. Microsoft is still providing security updates to XP for ALL customers until 2014. Windows 2000 is getting security updates for ALL customers until 2010. Microsoft does it right, better than anyone I can think of.
It's the same type of ridiculousness that copyright supporters try to push - the artists create once and we pay forever. With computers, we are expecting the same thing - we pay once, and expect the programmers to work for us forever. Not going to happen.
How do you explain Microsoft? 2014 isn't forever. It is called standing behind your product. It's about selling computers, software, and systems that the customer can rely on. Apple is about selling toys.
After G
Sep 18, 2009, 03:44 AM
So you're comparing Apple to a free product so you can have a favorable comparison? Why don't you compare to Microsoft instead?
Free product, not free support. You're also making the mistake of assuming that free is inferior. I'm comparing them because they're both similar (Unix-based) and updated a lot more than Windows. Actually I could argue that Ubuntu is better than either Windows or Mac OS because updates are free, so theoretically you are supported in perpetuity just for the cost of time downloading the ISO and upgrading.
That update was in the can before SL shipped, just as Tiger got 10.4.11 two weeks after Leopard shipped. Tiger did not get the Java update, even though JDK 1.4.2 was updated for 10.5. Support has been discontinued for Tiger, in line with Apple practices.
Okay, but my 4 years, 4 months still stands. Which, as you will see if you read the rest of my post, is quite good.
Support is measured from when a customer buys a supported product from the vendor to the time they stop getting support. PERIOD. Microsoft is still providing security updates to XP for ALL customers until 2014. Windows 2000 is getting security updates for ALL customers until 2010. Microsoft does it right, better than anyone I can think of.
Don't lie about the all customers thing. When's the last time anyone could just call MS up and ask about their product? If you had a prebuilt box, they'd say, "Go to HP, Dell, etc, they are the ones who provide support." And OEMs don't do much longer than three years with extended warranty. Unless you bought retail, which most people don't, the offer of Microsoft support doesn't apply to you. If you bought an OEM disk because it was cheaper, the offer of Microsoft support doesn't apply to you - because you are the one supporting your own box. You're referring to business support, which of course the business paid for. You can get support for anything if you pay enough.
How do you explain Microsoft? 2014 isn't forever. It is called standing behind your product. It's about selling computers, software, and systems that the customer can rely on. Apple is about selling toys.
Alright, let's compare.
Mainstream support (http://support.microsoft.com/gp/lifepolicy) (read:Consumer, Hardware, and Multimedia products) is 5 years for product + 2 years for successive service packs. No extended support. So that 2014 you state wouldn't even apply to me. Granted, it wouldn't apply anyway since I bought all my PCs from an OEM, who is apparently supposed to do the support and not Microsoft.
Based on this (http://support.microsoft.com/lifecycle/?LN=en-us&C2=1173) XP mainstream support is over for every single version of Windows XP except SP3. Each service pack gives you 2 years and then support ends. Less than the 3 years of Ubuntu, and less than 4 years with Tiger. If I couldn't upgrade to the next service pack (which many can't for reasons of compatibility), I'm SOL.
It was sold less than 24 months ago as new, and now it is unsupported. Don't talk about someone who bought it 4 years ago, and don't compare it to a free product.
Don't mix up who supports the hardware purchase. For consumers the support is limited to the OEM - 1 year for basic warranty. 3 years for extended. Anything else, you're either a business with a support contract or you support yourself. And I can compare what I like. Don't dismiss my comparison because of your own internal prejudices. And Ubuntu support is 3 years from date of release - so in your eyes that would make it even worse if someone downloaded the LTS ISO 2 years and 11 months down the line.
gnasher729
Sep 18, 2009, 04:22 AM
OpenCL was announced some years ago, it's only just hit a stable release! It was DEVELOPED using older cards...
Apple's just been MEGA lazy with the drivers!!!
We're talking 3.5 years of computers... they've chosen not 2 support 2 VERY POPULAR ATI cards (which basically cover everybody who doesn't have the latest round of MBP's or Mac Pro's.)
Do these VERY POPULAR ATI cards support IEEE 754 floating point arithmetic? If not, nobody in the world would be able to write OpenCL drivers for them that are worth a penny.
thewinelake
Sep 18, 2009, 04:55 AM
SL really doesn't come without GCD. In fact libdispatch and the kernel work together very tightly. From what I can see the major reason to get SL out the door was to get GCD out in the wild, GCD is the whole point of the SL release.
I think you may have misunderstood what I was trying to say - what I meant was to compile the app with and without GCD-savvy thread management and then run it on SL.
Ditto with OpenCL
holmesf
Sep 18, 2009, 04:59 AM
OpenCL was announced some years ago, it's only just hit a stable release! It was DEVELOPED using older cards...
Apple's just been MEGA lazy with the drivers!!!
We're talking 3.5 years of computers... they've chosen not 2 support 2 VERY POPULAR ATI cards (which basically cover everybody who doesn't have the latest round of MBP's or Mac Pro's.)
It's just a grudge match against ATI... Apple ALWAYS does this...
OS X OpenGL - didn't support the older ATI cards (OS 9 OpenGL did though? Apple got sued big time for that one because they'd sold a bunch of iMacs to a HUGE law firm... claiming they were OS X capable as they had G3's)
Quartz extreme - didn't support the Rage Pro...
Quartz extreme (when updated) - didn't support the GeForce 2
OpenCL - doesn't support computers more than 2 years old (although it was drafted out and announced before then)
---
It's Apple's tactic. Most Macs have built-in graphics cards and you CAN'T get these new features without buying a whole new computer.
Windows is different... you can generally upgrade your graphics card even if you have a cheap computer. And... Windows... tends to support pretty well every card (out of a MUCH larger pool than Apple's offering)... Apple's restricted OS X to the last 3 years of computers produced and STILL can't support every computer. It's a farce!
No it's not a farce. The hardware capabilities just haven't been there that long. This is bleeding edge stuff for a reason.
The OpenCL specification didn't even exist until last June. It was not announced "some years ago"
OpenCL is based on NVidia's CUDA architecture, and underneath the hood NVidia GPUs use CUDA for their OpenCL support. So the NVidia GPUs that support OpenCL are the same ones that support CUDA. Broader compatibility with NVidia GPUs is not possible. As for why older (desktop) ATI cards are not supported, read this straight from the horse's (AMD's) mouth: http://netkas.org/?p=182. And Intel integrated graphics ... don't get me started. From the looks of it Intel integrated graphics won't support any form of GPGPU until 2010.
So Apple has adequately supported the graphics hardware that could feasibly support OpenCL. Machines that don't have GPUs supporting OpenCL will still benefit because OpenCL can also run in parallel on the CPU. That includes every single Intel Mac!
Moreover, OpenCL was a necessary feature to add to Snow Leopard because Microsoft is working on its own competing GPGPU solution in DirectX.
Finally, in the long run, OpenCL will bring great benefits to the whole industry, because finally there exists an open GPGPU standard supported across multiple vendors. Developers no longer have to use one set of tools for ATI cards and another for NVidia cards.
It's absolutely crazy to think that OpenCL support in Snow Leopard constitutes some kind of money grab. It's beneficial to the industry, it benefits the performance of every single Mac that you can legally install Snow Leopard on, and it was necessary for Apple to remain competitive with Microsoft.
djgamble
Sep 18, 2009, 05:58 AM
No it's not a farce. The hardware capabilities just haven't been there that long. This is bleeding edge stuff for a reason.
The OpenCL specification didn't even exist until last June. It was not announced "some years ago"
OpenCL is based on NVidia's CUDA architecture, and underneath the hood NVidia GPUs use CUDA for their OpenCL support. So the NVidia GPUs that support OpenCL are the same ones that support CUDA. Broader compatibility with NVidia GPUs is not possible. As for why older (desktop) ATI cards are not supported, read this straight from the horse's (AMD's) mouth: http://netkas.org/?p=182. And Intel integrated graphics ... don't get me started. From the looks of it Intel integrated graphics won't support any form of GPGPU until 2010.
So Apple has adequately supported the graphics hardware that could feasibly support OpenCL. Machines that don't have GPUs supporting OpenCL will still benefit because OpenCL can also run in parallel on the CPU. That includes every single Intel Mac!
Moreover, OpenCL was a necessary feature to add to Snow Leopard because Microsoft is working on its own competing GPGPU solution in DirectX.
Finally, in the long run, OpenCL will bring great benefits to the whole industry, because finally there exists an open GPGPU standard supported across multiple vendors. Developers no longer have to use one set of tools for ATI cards and another for NVidia cards.
It's absolutely crazy to think that OpenCL support in Snow Leopard constitutes some kind of money grab. It's beneficial to the industry, it benefits the performance of every single Mac that you can legally install Snow Leopard on, and it was necessary for Apple to remain competitive with Microsoft.
If you're going to post a link then make sure it backs up your claims:
Looks like they have ability to run opencl on old radeon cards but it isn’t enabled yet.
It can be just two reasons why – it’s not finished/ready yet and will be finished later, or they are not going to enable it and pushing users for upgrade.
AMEN!!! They're capable! The debate is over why Apple hasn't enabled it yet?!?
Why? Because they don't think it's worth it... Apple's made its position quite clear on upgrade paths. With regard to PPC users they made a great excuse... something along the lines of "these guys aren't at the bleeding edge and don't generally upgrade stuff anyway"... sure, they're living in the old days.
BUT this should mean that ALL Intel users get respect because ALL the PPC users are being cut off. Having OpenCL on an older Mac would bring new life into it! On a Mac Pro... you'll hardly notice it 'coz it's lightning fast already...
ikir
Sep 18, 2009, 08:02 AM
Now if only Snow Leopard could actually function properly...
It works like a dream on my 2 Macs and all my friends' MacBooks... so what is your problem? Have you tried a fresh installation?:o
stylewriter
Sep 18, 2009, 08:07 AM
They are capable... I can tell you that much.
What do you think rendering video involves? MATHS!
Your card was designed for rendering video... but Apple decided to make a driver that allowed it to work like a backup CPU. They were just laaaazy and have a really bad relationship with ATI nowadays.
AMEN!!! They're capable! The debate is over why Apple hasn't enabled it yet?!?
You might want to look at ATI's/AMD's and NVidia's supported GPUs for their own GPGPU libraries.
http://en.wikipedia.org/wiki/AMD_FireStream#AMD_stream_processing_lineup
http://en.wikipedia.org/wiki/CUDA#Supported_GPUs
djgamble
Sep 18, 2009, 08:49 AM
You might want to look at ATI's/AMD's and NVidia's supported GPUs for their own GPGPU libraries.
http://en.wikipedia.org/wiki/AMD_FireStream#AMD_stream_processing_lineup
http://en.wikipedia.org/wiki/CUDA#Supported_GPUs
You are so daft!
Supported means they've written a driver!!!!!!!
Unsupported means no driver exists at this stage...
---
It's like this random Japan-only Canon printer I have at work... there's no Mac drivers! But... if I do a bit of hacking to make it think it's another printer then it works. Just because it's "unsupported" doesn't mean Macs aren't capable of printing stuff out using it.
---
... Fable II is an Xbox game... MacOS and Windows are currently UNSUPPORTED... but I'm pretty sure that many Macs/PC's are capable of running it if they make a port.
---
My video camera is UNSUPPORTED on MacOS... but if I plug it into my Mac I can copy the video files over as if it's a hard disk... if somebody made a driver... well then iMovie could just import it!
---
I have a jailbroken iPod Touch. A bunch of GPS apps on the app store are UNSUPPORTED. I tricked iTunes into thinking my iPod Touch is an iPhone, so the apps install, and they work fine with my 3rd party GPS unit! Actually better than the built-in GPS given with the iPhone 3GS!!!!
---
UNSUPPORTED means nobody's made a driver. Where there's a will there's a way... it's just Apple lacks will because these particular GPU's have been discontinued. That said... there's only 2 or 3 possible GPU's you can have on an Intel Mac that aren't supported... so it's damn lazy of them!
diamond.g
Sep 18, 2009, 08:50 AM
You might want to look at ATI's/AMD's and NVidia's supported GPUs for their own GPGPU libraries.
http://en.wikipedia.org/wiki/AMD_FireStream#AMD_stream_processing_lineup
http://en.wikipedia.org/wiki/CUDA#Supported_GPUs
Interesting the X1900XTX is listed, but none of the 2xxxHD is. I wonder why that is.
AidenShaw
Sep 18, 2009, 10:27 AM
In fact Microsoft has ended its own support for XP, and only extended (read:business) support remains to 2014. Does the fact that netbooks are still being sold with XP mean that they're out of support? No. The OEM supports it, and only for a year at that.
Actually, XP SP3 is supported until April 2010 at least. Support has been dropped for older service packs.
Also extended support gives all users security patches, not just businesses. Companies that pay for support will get other fixes.
Since you claim Tiger is still supported because it got a security patch, then by that standard XP is supported to 2014!
2002cbr600f4i
Sep 18, 2009, 10:45 AM
Interesting the X1900XTX is listed, but none of the 2xxxHD is. I wonder why that is.
Look again...
The tables are listing ATI STREAM Processing capable cards, NOT OpenCL ones... Yes, Stream is/was their proprietary answer to CUDA, but Stream != OpenCL... I'm sure there are things that are quite different between the two and those differences are probably why the 1900 isn't supported in OpenCL.
You won't find the 2xxx series supported by OpenCL (probably Ever) because they can't do double precision floating point, which is required for OpenCL use, among some other limitations related to memory interface and such.
Face it gang, right now even the 4xxx series support for OpenCL under OSX is pretty crappy. 1/2 the OpenCL example programs from the Apple Developer web pages I've tried don't run at all on my 2009 Mac Pro w/4870, or if they do, they run slower on the card than they do on my 2.93 Ghz Quad core CPU.
Remember that OpenCL is based on CUDA, which is NVidia's baby. I'm sure there's things about OpenCL that just favor NVidia's architecture a lot more than ATI's because of the different ways they do things. It's probably also a lot more work to build ATI drivers right now for the same reasons.
diamond.g
Sep 18, 2009, 11:38 AM
Look again...
The tables are listing ATI STREAM Processing capable cards, NOT OpenCL ones... Yes, Stream is/was their proprietary answer to CUDA, but Stream != OpenCL... I'm sure there are things that are quite different between the two and those differences are probably why the 1900 isn't supported in OpenCL.
You won't find the 2xxx series supported by OpenCL (probably Ever) because they can't do double precision floating point, which is required for OpenCL use, among some other limitations related to memory interface and such.
Face it gang, right now even the 4xxx series support for OpenCL under OSX is pretty crappy. 1/2 the OpenCL example programs from the Apple Developer web pages I've tried don't run at all on my 2009 Mac Pro w/4870, or if they do, they run slower on the card than they do on my 2.93 Ghz Quad core CPU.
Remember that OpenCL is based on CUDA, which is NVidia's baby. I'm sure there's things about OpenCL that just favor NVidia's architecture a lot more than ATI's because of the different ways they do things. It's probably also a lot more work to build ATI drivers right now for the same reasons.
It appears that AMD doesn't have the GPU OpenCL SDK out yet. Only the CPU one. So optimization hasn't happened yet.
jkdsteve
Sep 18, 2009, 12:07 PM
Anyone else having problems getting OpenCL to work on a 2006MacPro and ATI 4870? :(
stylewriter
Sep 18, 2009, 12:12 PM
The tables are listing ATI STREAM Processing capable cards, NOT OpenCL ones... Yes, Stream is/was their proprietary answer to CUDA, but Stream != OpenCL... I'm sure there are things that are quite different between the two and those differences are probably why the 1900 isn't supported in OpenCL.
Stream is an implementation of OpenCL. ATI's proprietary 'Close to Metal' crap fell flat, so now they have Stream, which implements OpenCL and some DirectX type GPGPU stuff. Your point still stands though, the list is too inclusive(it includes their old pre-OpenCL technology).
You won't find the 2xxx series supported by OpenCL (probably Ever) because they can't do double precision floating point, which is required for OpenCL use, among some other limitations related to memory interface and such.
The R700 (a.k.a. Radeon 4XXX) is pretty much the R600 (a.k.a. Radeon 2XXX and 3XXX) with double-precision support. For graphics, double precision is a waste, but for science or real work it can be very handy. The R700 is still primarily single-precision, with some double precision stuff bolted on. http://developer.amd.com/gpu_assets/R700-Family_Instruction_Set_Architecture.pdf
BUT... it does seem that Double-precision is an optional extension to OpenCL.
http://www.khronos.org/registry/cl/specs/opencl-1.0.43.pdf
The Nvidia 9400M doesn't support double-precision math.
The R600 has support for single-precision IEEE754 floats on 4 of its 5 shader units.
http://www.beyond3d.com/content/reviews/16/8
stylewriter
Sep 18, 2009, 12:28 PM
They are capable... I can tell you that much.
What do you think rendering video involves? MATHS!
Your card was designed for rendering video... but Apple decided to make a driver that allowed it to work like a backup CPU. They were just laaaazy and have a really bad relationship with ATI nowadays.
Which one is capable? Give a specific unsupported card that is capable.
2002cbr600f4i
Sep 18, 2009, 01:02 PM
Anyone else having problems getting OpenCL to work on a 2006MacPro and ATI 4870? :(
It's the 4870 in general... Not just on 2006 Mac Pro's... Seems like the 4870's OpenCL driver just generally blows right now... I'm hoping it improves over point releases.
diamond.g
Sep 18, 2009, 01:03 PM
It's the 4870 in general... Not just on 2006 Mac Pro's... Seems like the 4870's OpenCL driver just generally blows right now... I'm hoping it improves over point releases.
Is there any reason we have to wait for point releases to get GPU driver updates if ATI and nVidia are the ones writing the drivers?
2002cbr600f4i
Sep 18, 2009, 01:38 PM
Is there any reason we have to wait for point releases to get GPU driver updates if ATI and nVidia are the ones writing the drivers?
No, but then I haven't exactly seen a lot of driver updates for Mac show up outside of the standard OSX System Updates, so I don't know that it's actually NVidia or ATI writing the drivers or if it's Apple doing it. Sure, there's no reason either would have to wait for 10.6.x point releases to drop new drivers, I just figured that would be a logical time to do so to make sure that "If you want to run this, you need to be on 10.6.x or higher to ensure support."
2002cbr600f4i
Sep 18, 2009, 02:28 PM
Given that it's been pouring here for the last 4 days, and they're calling for more rain all weekend (anyone got an Ark they can loan me???) I might go ahead and take a crack at writing some sort of program to test the various configurations discussed earlier in the thread.
Some possible ideas I had:
1) a process that takes 2 very huge (100000x100000 each) arrays of numbers and multiplies them together
2) something that takes a huge list of random text names in mixed case and converts them into uppercase.
3) something that does some kind of sort of a large amount of data (maybe the outputs of 1 and 2?) This is assuming I can find and implement a sort algorithm that can be done both linearly and threaded.
I might even try to make all 3 be run to REALLY stress things.
Note, that I plan to pre-generate all the test data so the exact same "random" data can be run over and over, taking variations in the data out of the "benchmarking" equation, regardless of user running it and computer.
So, my plan would be:
* Implement each of these things in a linear processing fashion.
* Implement each of these things to be done using blocks + GCD
* Implement each of these to be done by OpenCL
(Once I get the 1st or 2nd one implemented, I'll toss up the source code someplace and ask somebody who's more familiar with OSX development to write the normal PThreads type implementation since I don't have a clue how to do that.)
I'll keep you all updated to my progress over the weekend. I haven't done much Objective C - (in fact it's been nearly a year since I've touched it) but I think I'll be able to figure out how to do at least some of this this weekend.
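A minimal sketch of how the plan above could be structured, in Python rather than Objective-C purely for brevity. The names (`make_names`, `upper_manual`, `run_chunked`, `timed`) and the seed value are made up for illustration. The thread pool only stands in for the "split the data into subsets" idea; it is not GCD, and Python threads will not show a real speedup on CPU-bound work. The point is just the shape: fixed-seed data, one serial path, one chunked path, the same timer around both.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

LETTERS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def make_names(count, seed=42):
    """Pre-generate mixed-case names; a fixed seed gives identical data on every run."""
    rng = random.Random(seed)
    return [
        "".join(rng.choice(LETTERS) for _ in range(rng.randint(3, 12)))
        for _ in range(count)
    ]


def upper_manual(name):
    """Uppercase one ASCII name with a plain loop, no library uppercasing."""
    out = []
    for ch in name:
        code = ord(ch)
        if 97 <= code <= 122:  # 'a'..'z'
            code -= 32         # distance between 'a' and 'A' in ASCII
        out.append(chr(code))
    return "".join(out)


def run_serial(names):
    """Linear version: one item after another."""
    return [upper_manual(n) for n in names]


def run_chunked(names, workers=4):
    """Split the list into contiguous chunks and hand each chunk to a worker."""
    size = max(1, (len(names) + workers - 1) // workers)
    chunks = [names[i:i + size] for i in range(0, len(names), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(run_serial, chunks)  # map keeps the chunks in order
    return [n for part in parts for n in part]


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

Checking that `run_serial` and `run_chunked` return the same list before comparing their times keeps the benchmark honest: a fast wrong answer is not a speedup.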
Amdahl
Sep 18, 2009, 03:47 PM
You're doing a great job of trying to blur the issue, with regard to Linux, Microsoft, OEMs, Dell, HP, and stuff that has nothing to do with Apple.
So that 2014 you state wouldn't even apply to me. Given, it wouldn't apply since I bought all my PCs from an OEM, who is apparently supposed to do the support and not Microsoft.
Which is not true. EVERYBODY is getting security updates until 2014. You are reading too much into the meaning of 'mainstream support.' That doesn't mean they cut off security updates.
Apple is embarrassing in their product support.
stylewriter
Sep 18, 2009, 04:18 PM
EVERYBODY is getting security updates until 2014. You are reading too much into the meaning of 'mainstream support.' That doesn't mean they cut off security updates.
Apple is embarrassing in their product support.
Apple doesn't support legacy versions for very long, but XP has been getting non-patch statuses for less-than-critical known bugs.
For example: http://www.computerworld.com/s/article/9138007/Microsoft_No_TCP_IP_patches_for_you_XP
The bug lets you use all available memory for as long as you can keep sending packets to the computer. In effect, it lets you remotely crash an XP machine.
holmesf
Sep 18, 2009, 05:32 PM
If you're going to post a link then make sure it backs up your claims:
AMEN!!! They're capable! The debate is over why Apple hasn't enabled it yet?!?
Why? Because they don't think it's worth it... Apple's made its position quite clear on upgrade paths. With regard to PPC users they made a great excuse... something along the lines of "these guys aren't at the bleeding edge and don't generally upgrade stuff anyway"... sure, they're living in the old days.
BUT this should mean that ALL Intel users get respect because ALL the PPC users are being cut off. Having OpenCL on an older Mac would bring new life into it! On a Mac Pro... you'll hardly notice it 'coz it's lightning fast already...
You read the bottom but missed the points outlined by AMD. AMD is pointing out that the 3000 and 4000 HD series GPUs handle GPGPU very differently. The 3000 HD series has no compute support, so it targets the graphics pipeline by compiling to vertex and pixel shaders. Meanwhile the 4000 HD series does have compute support. You can imagine that this makes it technically difficult to support both the 3000 and 4000 series GPUs in OpenCL, if it's even technically possible. Just because there are some strings in a driver doesn't mean it's as simple as flipping a switch.
The computer industry moves very fast. And the GPU industry moves even faster. This is just the nature of the business -- it's not some kind of conspiracy.
Amdahl
Sep 18, 2009, 06:40 PM
Apple doesn't support legacy versions for very long, but XP has been getting non-patch statuses for less-than-critical known bugs.
For example: http://www.computerworld.com/s/article/9138007/Microsoft_No_TCP_IP_patches_for_you_XP
The bug lets you use all available memory for as long as you can keep sending packets to the computer. In effect, it lets you remotely crash an XP machine.
Yes, if you have the Firewall off. That's their excuse. Many people believe they need to fix this bug on XP and Windows 2000, no matter what their excuse. And I agree.
But thank you for pointing out that Microsoft is so thorough in their security updates that when they try to weasel out of one, the whole world takes note.
djgamble
Sep 18, 2009, 07:50 PM
You read the bottom but missed the points outlined by AMD. AMD is pointing out that the 3000 and 4000 HD series GPUs handle GPGPU very differently. The 3000 HD series has no compute support, so it targets the graphics pipeline by compiling to vertex and pixel shaders. Meanwhile the 4000 HD series does have compute support. You can imagine that this makes it technically difficult to support both the 3000 and 4000 series GPUs in OpenCL, if it's even technically possible. Just because there are some strings in a driver doesn't mean it's as simple as flipping a switch.
The computer industry moves very fast. And the GPU industry moves even faster. This is just the nature of the business -- it's not some kind of conspiracy.
No, I read the part you refer to saying there's no drivers!!!
I then read the bottom where they said despite the lack of drivers, there's no technical reason why MOST older cards can't support OpenCL.
Your article backs up my points 100%... thanks for the reference.
stylewriter
Sep 19, 2009, 12:09 AM
Yes, if you have the Firewall off. That's their excuse. Many people believe they need to fix this bug on XP and Windows 2000, no matter what their excuse. And I agree.
But thank you for pointing out that Microsoft is so thorough in their security updates that when they try to weasel out of one, the whole world takes note.
Actually, the weaseling is that XP doesn't have any listening services enabled by default. Any service that listens for connections automatically opens a hole in the built-in firewall. If you have a firewall protecting your local network, then you can limit the attack to only computers on your local network. If XP is directly connected to the internet, then anyone can remotely crash the computer if you have file/printer-sharing or remote desktop enabled. The only reason this is less-than-critical is that you can only crash XP computers, not remotely control them.
holmesf
Sep 19, 2009, 04:39 AM
No, I read the part you refer to saying there's no drivers!!!
I then read the bottom where they said despite the lack of drivers, there's no technical reason why MOST older cards can't support OpenCL.
Your article backs up my points 100%... thanks for the reference.
:rolleyes:
Most older cards? I guess there's no technical reason my abacus can't run OpenCL either.
Sharangad
Sep 19, 2009, 12:39 PM
Apple's just been MEGA lazy with the drivers!!!
We're talking 3.5 years of computers... they've chosen not to support 2 VERY POPULAR ATI cards (which basically cover everybody who doesn't have the latest round of MBP's or Mac Pro's.)
It's just a grudge match against ATI... Apple ALWAYS does this...
OS X OpenGL - didn't support the older ATI cards (OS 9 OpenGL did though? Apple got sued big time for that one because they'd sold a bunch of iMacs to a HUGE law firm... claiming they were OS X capable as they had G3's)
Quartz extreme - didn't support the Rage Pro...
Quartz extreme (when updated) - didn't support the GeForce 2
OpenCL - doesn't support computers more than 2 years old (although it was drafted out and announced before then)
Unfortunately your 3870 doesn't support shared memory, which is required for OpenCL. All nvidia DirectX10 cards support this, but of the Ati cards only the HD4000 series does. Your 3870 will never support OpenCL.
As for your older cards, the Rage Pro was a DX5 generation card, which was probably why it was dropped.
Quartz Extreme required DirectX 9 class (programmable) hardware; the geforce 2 was a DX7 card and didn't have the programmable shaders (cores) needed to run Quartz Extreme.
Now the 3870 doesn't have shared memory despite being a DX10 card (as DX10 didn't require it). However, all of nvidia's DX10 cards support it, as they were designed for CUDA, and OpenCL is just a non-proprietary version of CUDA.
If anything you should blame Ati for building such a lame card.
holmesf
Sep 20, 2009, 01:55 AM
Given that it's been pouring here for the last 4 days, and they're calling for more rain all weekend (anyone got an Ark they can loan me???) I might go ahead and take a crack at writing some sort of program to test the various configurations discussed earlier in the thread.
Some possible ideas I had:
1) a process that takes 2 very huge (100000x100000 each) arrays of numbers and multiplies them together
2) something that takes a huge list of random text names in mixed case and converts them into uppercase.
3) something that does some kind of sort of a large amount of data (maybe the outputs of 1 and 2?) This is assuming I can find and implement a sort algorithm that can be done both linearly and threaded.
I might even try to make all 3 be run to REALLY stress things.
Note, that I plan to pre-generate all the test data so the exact same "random" data can be run over and over, taking variations in the data out of the "benchmarking" equation, regardless of user running it and computer.
So, my plan would be:
* Implement each of these things in a linear processing fashion.
* Implement each of these things to be done using blocks + GCD
* Implement each of these to be done by OpenCL
(Once I get the 1st or 2nd one implemented, I'll toss up the source code someplace and ask somebody who's more familiar with OSX development to write the normal PThreads type implementation since I don't have a clue how to do that.)
I'll keep you all updated to my progress over the weekend. I haven't done much Objective C - (in fact it's been nearly a year since I've touched it) but I think I'll be able to figure out how to do at least some of this this weekend.
I'd be very interested in the results! A few notes though to do with OpenCL on your proposed tasks:
1. OpenCL (when running on the GPU) is bound by the bandwidth between GPU and CPU. This tends to be around 5GB per second -- but the GPU can do many times more floating point operations per second than this (about 200 times more!). So when you choose a benchmark, you should choose one where the number of operations per piece of data that you transfer is very high. Preferably don't choose an O(n) type operation.
2. OpenCL (and GPUs!) aren't designed for text. Also making something uppercase is an O(n) operation (where n is length of the string) so you're bandwidth bound (see point 1). This makes it not the best OpenCL benchmark.
3. It's very difficult (though possible) to write an efficient sorting algorithm on OpenCL. In fact it's so hard to do well that this is the sort of stuff people publish papers on in the GPGPU industry!
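To put rough numbers on point 1, here is a back-of-envelope sketch in Python. The 5 GB/s figure is the one quoted above; the function names, the 4-byte float size and the "count one operation per character" model are assumptions for illustration, not measurements from any real card.

```python
def matmul_cost(n, bytes_per_float=4):
    """Bytes moved and float operations for multiplying two n x n matrices."""
    bytes_moved = 3 * n * n * bytes_per_float  # two inputs to the GPU, one result back
    flops = 2 * n ** 3                         # roughly one multiply and one add per inner step
    return bytes_moved, flops


def uppercase_cost(num_chars):
    """Bytes moved and operations for uppercasing num_chars one-byte characters."""
    bytes_moved = 2 * num_chars  # text to the GPU, result back
    ops = num_chars              # at most one subtraction per character
    return bytes_moved, ops


def ops_per_byte(cost):
    """Work done per byte transferred; higher means less bandwidth-bound."""
    bytes_moved, ops = cost
    return ops / bytes_moved


def transfer_seconds(bytes_moved, bandwidth_bytes_per_s=5e9):
    """Time spent on the bus alone at the assumed bandwidth."""
    return bytes_moved / bandwidth_bytes_per_s


if __name__ == "__main__":
    print("uppercase ops/byte:", ops_per_byte(uppercase_cost(1_000_000)))
    print("1000x1000 matmul ops/byte:", ops_per_byte(matmul_cost(1000)))
    print("100000x100000 matmul bytes:", matmul_cost(100_000)[0])
```

Under these assumptions uppercasing does half an operation per byte moved no matter how much text there is, while matrix multiplication does more work per byte as n grows. The same arithmetic also shows that a single 100000x100000 matrix of 4-byte floats is 40 GB, so the first proposed test would need much smaller matrices just to fit in memory.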
2002cbr600f4i
Sep 20, 2009, 01:34 PM
I'd be very interested in the results! A few notes though to do with OpenCL on your proposed tasks:
1. OpenCL (when running on the GPU) is bound by the bandwidth between GPU and CPU. This tends to be around 5GB per second -- but the GPU can do many times more floating point operations per second than this (about 200 times more!). So when you choose a benchmark, you should choose one where the number of operations per piece of data that you transfer is very high. Preferably don't choose an O(n) type operation.
2. OpenCL (and GPUs!) aren't designed for text. Also making something uppercase is an O(n) operation (where n is length of the string) so you're bandwidth bound (see point 1). This makes it not the best OpenCL benchmark.
3. It's very difficult (though possible) to write an efficient sorting algorithm on OpenCL. In fact it's so hard to do well that this is the sort of stuff people publish papers on in the GPGPU industry!
Well, I have my little (java) data generator done and working. That's as far as I've gotten...
It's been nearly a year since I've touched Objective-C and it's amazing how much I've forgotten, so I don't know that I'll get anywhere significant on this today.
That being said, my plan was to not use ANY of the library calls for things like converting a string to uppercase, or doing the matrix multiplication. Everything was going to be done using loops, or breaking the data set up and assigning subsets of the data to different threads/cores.
Actually, uppercasing a word IS mathematical in that you have to look at the ASCII code of each character in the string (so yeah, that's O(n) per string, O(n^2) across the set), see if it's in the lowercase set, and if so, subtract the offset value to get the uppercase one. The fact that it's an O(n) operation, but that no 2 items in the array are data related to each other SHOULD allow a threaded process the ability to do this quicker than O(n^2) as subsets of the full set can be processed by each thread.
The matrix multiplication I'll need to check the math on, as I don't remember the rules for performing that. My hope was that I could find some operation where I could ship off subsets of the data (like a whole row) to each thread/core, but I'm not sure multiplication on arrays is independent like that.
As far as the sorting - strange, there's several good multithreaded sorting algorithms (variations on Quicksort for instance). I would think one of those would work in OpenCL code, but maybe I'm wrong.
The key to this test is to do things as similarly as possible between the various methods, so you're not using anything super optimized for any of them. We're just trying to demonstrate that linear code gets a speed up from normal Leopard threads, then how that same code behaves using GCD, and then how it behaves again using OpenCL.
Anyhow, my grad school classes start back up on Tuesday, so I don't know that I'm going to even get the single threaded version done before then. After Tuesday I'm slammed. If there's somebody reading this who is a much better Obj-C OSX programmer than me who wants to take a crack at this, let me know and I'll explain what I was looking into doing and how...
gattis
Sep 20, 2009, 04:51 PM
I tried to optimize some of the stuff I am developing using Snow Leopard and OpenCL. The limited bandwidth over the PCI Express bus actually makes simple matrix multiplications many times slower than a multicore CPU on most cards. Outside of graphics applications, I don't really see the potential for GPGPU until we see a unified memory architecture where GPUs can access the RAM as fast as the CPU. And graphics applications can already use the GPU using older libraries such as OpenGL which do common operations for you. I don't see where OpenCL adds any benefit just yet, but it's good that Apple is getting there early on the software front. We just have to wait for the hardware to catch up now.
Anyway, here's my writeup on my experimentation with Snow Leopard's OpenCL:
http://mattgattis.com/blog/2009/09/19/opencl-impressions/
holmesf
Sep 20, 2009, 05:27 PM
Well, I have my little (java) data generator done and working. That's as far as I've gotten...
It's been nearly a year since I've touched Objective-C and it's amazing how much I've forgotten, so I don't know that I'll get anywhere significant on this today.
That being said, my plan was to not use ANY of the library calls for things like converting a string to uppercase, or doing the matrix multiplication. Everything was going to be done using loops, or breaking the data set up and assigning subsets of the data to different threads/cores.
Actually, uppercasing a word IS mathematical in that you have to look at the ASCII code of each character in the string (so yeah, that's O(n) per string, O(n^2) across the set), see if it's in the lowercase set, and if so, subtract the offset value to get the uppercase one. The fact that it's an O(n) operation, but that no 2 items in the array are data related to each other SHOULD allow a threaded process the ability to do this quicker than O(n^2) as subsets of the full set can be processed by each thread.
I agree that a threaded process can do it faster than an unthreaded process, but on the GPU you've got the problem that since you're doing such little work per character (just a possible add) it's not worth transferring the characters to the GPU. I also don't think the GPU would handle the conditional check that you have to do on each character very well (where you check if it's lower case before adding) ... the reason is that for good performance on the GPU you have to arrange the data so that related threads (called a warp) all take the same path through each conditional expression ... maybe there's a way around this, but otherwise each warp (usually set of 32 threads) ends up running in serial instead of parallel ... ouch a 32x performance decrease!
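One possible "way around this" is to turn the conditional into arithmetic, so every thread runs the same instructions whether or not its character is lowercase. The sketch below shows the idea in plain Python; the function names are invented for illustration, it runs on the CPU, and it says nothing about how a particular GPU compiler would actually treat the branching version (some may already turn a branch this small into a select on their own).

```python
def upper_branching(data: bytes) -> bytes:
    """Uppercase ASCII bytes with an explicit if/else per character."""
    out = bytearray(len(data))
    for i, c in enumerate(data):
        if 0x61 <= c <= 0x7A:   # 'a'..'z'
            out[i] = c - 0x20   # shift down to 'A'..'Z'
        else:
            out[i] = c
    return bytes(out)


def upper_branch_free(data: bytes) -> bytes:
    """Same result, but the test becomes a 0/1 multiplier instead of a branch."""
    out = bytearray(len(data))
    for i, c in enumerate(data):
        is_lower = (c >= 0x61) & (c <= 0x7A)  # True counts as 1, False as 0
        out[i] = c - 0x20 * is_lower          # subtracts 32 only for lowercase
    return bytes(out)


if __name__ == "__main__":
    sample = b"Hello, World 123"
    print(upper_branching(sample))
    print(upper_branch_free(sample))
```

Both versions do the same tiny amount of work per character, so this only addresses the divergence worry; the transfer cost to and from the card is unchanged.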
The matrix multiplication I'll need to check the math on, as I don't remember the rules for performing that. My hope was that I could find some operation where I could ship off subsets of the data (like a whole row) to each thread/core, but I'm not sure multiplication on arrays is independent like that.
Matrix multiplication is actually a very good candidate for GPGPU processing. The reason is that there is a high ratio of work done per floating point number transferred. Dividing up that work is still tricky though.
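On the independence question: each row of the result depends only on the matching row of the first matrix plus the whole second matrix, so rows can be handed out to workers without any of them waiting on another. A small Python sketch of that split follows; the function names are made up, and the thread pool only illustrates the division of work (Python threads will not make this faster, and a real GPU version would tile the work much more carefully).

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_row(row, b):
    """One output row: dot product of `row` with every column of b."""
    inner = len(b)
    cols = len(b[0])
    return [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]


def matmul_serial(a, b):
    """Plain triple-loop multiply, one row after another."""
    return [multiply_row(row, b) for row in a]


def matmul_by_rows(a, b, workers=4):
    """Same multiply, but each row of `a` is an independent task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: multiply_row(row, b), a))


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(matmul_serial(a, b))
    print(matmul_by_rows(a, b))
```

Every worker reads all of the second matrix, which is part of why dividing the work efficiently is the tricky bit mentioned above.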
As far as the sorting - strange, there's several good multithreaded sorting algorithms (variations on Quicksort for instance). I would think one of those would work in OpenCL code, but maybe I'm wrong.
The key to this test is to do things as similarly as possible between the various methods, so you're not using anything super optimized for any of them. We're just trying to demonstrate that linear code gets a speed up from normal Leopard threads, then how that same code behaves using GCD, and then how it behaves again using OpenCL.
Ok.
Unfortunately with OpenCL sometimes it's super-optimized or there's just no point. GPUs are strange beasts -- every step of the algorithm you use, down to its details, needs to take their architecture into account. If you screw up your memory access pattern on a GeForce 8800, for example, your memory bandwidth can suddenly be cut to a sixteenth. Sorting is a problem where it's especially difficult to get this right.
Anyway, I said that people write papers on this sort of stuff, so here's one if you have time:
http://mgarland.org/files/papers/gpusort-ipdps09.pdf
Anyhow, my grad school classes start back up on Tuesday, so I don't know that I'm going to even get the single threaded version done before then. After Tuesday I'm slammed. If there's somebody reading this who is a much better Obj-C OSX programmer than me who wants to take a crack at this, let me know and I'll explain what I was looking into doing and how...
Cool, I'm starting my master's program in CS on Thursday!
bmb012
Sep 20, 2009, 08:22 PM
Strange how all of this is still so up in the air, you'd think Apple would have some sort of proof of concept 'killer-app,' or at least some samples ready to show the benefits of all this.
What about physics calculations? Aren't those PPUs basically just GPUs that you can run physics code on? Any idea if PhysX will support openCL?
holmesf
Sep 20, 2009, 08:41 PM
Strange how all of this is still so up in the air, you'd think Apple would have some sort of proof of concept 'killer-app,' or at least some samples ready to show the benefits of all this.
What about physics calculations? Aren't those PPUs basically just GPUs that you can run physics code on? Any idea if PhysX will support openCL?
Since PhysX can run using CUDA on those GeForce cards that support CUDA, and since OpenCL is based on CUDA, my guess would be yes, PhysX could be written to use OpenCL.
As for killer apps ... Apple hasn't bundled anything with the OS but if you look in the Snow Leopard XCode examples you'll find some samples that are pretty good demonstrators. The n-body problem example even has a little speed gauge that lets you compare the performance of the app running on GPU, CPU, and GPU+CPU.
SPUY767
Sep 21, 2009, 10:56 AM
Yeah developers, start taking advantage of this ****.
The sad part is, it borders on trivial to add GCD and OCL functionality to applications if they've been programmed with XCode and follow Apple's API guidelines religiously. Granted, some houses like Adobe are just gluttons for punishment and would rather use an inferior IDE and API.
holmesf
Sep 21, 2009, 07:59 PM
The sad part is, it borders on trivial to add GCD and OCL functionality to applications if they've been programmed with XCode and follow Apple's API guidelines religiously. Granted, some houses like Adobe are just gluttons for punishment and would rather use an inferior IDE and API.
Adding OpenCL support is far from trivial.
GCD is easier, but if you've got a gigantic codebase like Adobe making the slightest change in the internals is probably a gargantuan undertaking!
2002cbr600f4i
Sep 21, 2009, 10:28 PM
Adding OpenCL support is far from trivial.
GCD is easier, but if you've got a gigantic codebase like Adobe making the slightest change in the internals is probably a gargantuan undertaking!
Not to mention Adobe has to do the whole Carbon-->Cocoa conversion as well..
pmjoe
Sep 23, 2009, 08:11 AM
So, is the conclusion to this thread that Apple should develop a "physics application" to demonstrate the benefits of GCD and OpenCL to the masses?
I'm sure that'll be a hot download. ;)
I wonder what that says for those of us who said pages ago this stuff isn't that useful for most mainstream applications?
holmesf
Sep 23, 2009, 04:37 PM
So, is the conclusion to this thread that Apple should develop a "physics application" to demonstrate the benefits of GCD and OpenCL to the masses?
I'm sure that'll be a hot download. ;)
I wonder what that says for those of us who said pages ago this stuff isn't that useful for most mainstream applications?
You mean not everybody spends their day ripping Blu-ray movies, doing medical tomography, and running fluid simulations? I feel so lonely.
NT1440
Sep 23, 2009, 04:39 PM
And Flash still slows the computer down and makes the fans spin like crazy.
Why does Flash still suck?
Seriously, even a cheap Winblows netbook can run Flash better than a top-of-the-line maxed-out Mac Pro.
BLAME ADOBE
holmesf
Sep 23, 2009, 11:02 PM
BLAME ADOBE
Indeed ... Flash on the Mac has always sucked.
I remember back in the year 2000 the shame of having my PowerMac G3 pale in comparison to an eMachines piece of junk at running Flash. And to add insult to injury, you had to use that damn hockey puck mouse :mad:
medip
Sep 28, 2009, 09:04 AM
I am raging so hard. I bought the new 15-inch MacBook Pro for $3000 and installed Parallels Tools on it. It was working fine for the first few weeks, but now it freezes in both Windows and Mac OS! I can't believe I spent that sort of money on this piece of trash. I need help. I do play games that use a lot of RAM and stuff, but not enough to completely crap my comp out. Any tips would be helpful. I already tried changing the settings for my virtual machine, but it made Windows slower and crappier, and when I changed them back it was still the same. I also reinstalled the Mac drivers for Windows. Please, someone help me. :mad:
diamond.g
Sep 28, 2009, 09:07 AM
I am raging so hard. I bought the new 15-inch MacBook Pro for $3000 and installed Parallels Tools on it. It was working fine for the first few weeks, but now it freezes in both Windows and Mac OS! I can't believe I spent that sort of money on this piece of trash. I need help. I do play games that use a lot of RAM and stuff, but not enough to completely crap my comp out. Any tips would be helpful. I already tried changing the settings for my virtual machine, but it made Windows slower and crappier, and when I changed them back it was still the same. I also reinstalled the Mac drivers for Windows. Please, someone help me. :mad:
Simple fix, uninstall Windows ;)
CQd44
Sep 28, 2009, 12:40 PM
I am raging so hard. I bought the new 15-inch MacBook Pro for $3000 and installed Parallels Tools on it. It was working fine for the first few weeks, but now it freezes in both Windows and Mac OS! I can't believe I spent that sort of money on this piece of trash. I need help. I do play games that use a lot of RAM and stuff, but not enough to completely crap my comp out. Any tips would be helpful. I already tried changing the settings for my virtual machine, but it made Windows slower and crappier, and when I changed them back it was still the same. I also reinstalled the Mac drivers for Windows. Please, someone help me. :mad:
I suppose you should try Boot Camp.
prostuff1
Sep 28, 2009, 01:36 PM
I am raging so hard. I bought the new 15-inch MacBook Pro for $3000 and installed Parallels Tools on it. It was working fine for the first few weeks, but now it freezes in both Windows and Mac OS! I can't believe I spent that sort of money on this piece of trash. I need help. I do play games that use a lot of RAM and stuff, but not enough to completely crap my comp out. Any tips would be helpful. I already tried changing the settings for my virtual machine, but it made Windows slower and crappier, and when I changed them back it was still the same. I also reinstalled the Mac drivers for Windows. Please, someone help me. :mad:
You seriously expected to be able to run games at a decent pace from within Parallels?
If you want to do any gaming, I suggest you set up your computer so that you can boot Windows natively. Install and set up Boot Camp; that way you can play your games to your heart's content and get the performance you need.