Wrong. All applications written for SL will take advantage of GCD once the third-party dev steps up and implements it. Seeing OpenOffice and MS Office rewritten to leverage GCD, benefiting drastically from blocks by putting unused CPU cores to work, would bring random comments of "everything is so fast and snappy when I make a change to my document, to my 10,000 x 100 spreadsheet, to my ability to connect to database sources and scale up," etc.

iLife will take full advantage of both.

OpenCL is presently being taken advantage of at the low level.

Any application [including all of iWork] can leverage OpenCL to offload its number crunching, aid Quartz in various ways, and streamline processes for WebKit, thus giving everyone an improved experience. Built-in SVG and WebGL for Opera, Firefox, Safari, Chrome, etc., will benefit on OS X.

The impact for an application like Keynote will be more visible the more Keynote expands into OpenGL presentations that include interactive fly-throughs and much more.

Any application that takes in external data sources and requires numerical analysis, pattern matching, etc., will take advantage of it.

It's all dependent upon the developer's time, vision, and goals for their applications.

All games take advantage of it because their environments instantly demand enough to sap the life out of the CPU(s).

Any Graphics Editor, Flash editor, SVG editor, multimedia suite, Audio/Video application can immediately leverage both but will require re-architecting portions of the code to make it happen.

It's not unreasonable to expect 6-9 months before major vendors bring out new versions leveraging both, with considerable performance improvements while reducing the overhead needed to reach those aims.

You are so far off with this post I'm not even going to try to correct it, although I probably did underestimate the usefulness of GCD in standard apps. Your understanding of OpenCL is waaaay off.
 
But of course developers will need to rework their apps, and I doubt that a lot of them are going to do that until they release another version.

For those that really see a competitive advantage though, I would expect the upgrade to be done pretty rapidly.

Even a 2-core system wasn't being leveraged efficiently before SL's arrival on the market for Macs.

If that's true then OS X must have been really poorly coded. (And I don't think it was.)
 
Right now, developers create threads when they're essential, not when they're potentially useful. This changes all that, and yes, the improvements will filter into all applications, even if it just means better GUIs (say goodbye to the beachball). Developers generally don't care enough to see the performance implications of threading, and a lot of time is wasted waiting for concurrent processes to catch up. With smaller and more mobile threads, this will change.
 
Right now, developers create threads when they're essential, not when they're potentially useful.
GCD will not help developers know any better about when threads might be "potentially useful".
and yes, the improvements will filter into all applications, even if it just means better GUIs
GUI libraries on pretty much every modern platform already have built-in multi-threading.
 
OpenCL and h.264 *ENCODING*

I've never seen an answer to this type of question, not even a ballpark figure....

Has anyone read what kind of speed up one might expect for h.264 video ENcoding using OpenCL on a MacPro that was maxed out with the standard-class video cards Apple offers?

Just a broad speedup. 10x? 100x?

Or is h.264 encoding not parallelizable enough to actually see much of a boost?
 
GCD will not help developers know any better about when threads might be "potentially useful".
If it makes it easier for them to code a few more sections of their apps multithreaded, programmers and companies will be a lot more likely to actually do so. There is always a cost-benefit analysis. This hopefully reduces the cost.
 
I'm not certain the next iLife apps will feature OpenCL; Apple likes to clearly differentiate between consumer and pro-level products, so OpenCL functionality could be a key differentiator.

I've not coded with OpenCL yet, but I can't imagine using it in any of the Pro apps would be a quick job.
 
GCD will not help developers know any better about when threads might be "potentially useful".

GUI libraries on pretty much every modern platform already have built-in multi-threading.

Yes, and no....

Yes, they INTERNALLY do things like drawing in their own internally created threads (i.e., when I tell the system to "draw a window," that call may internally use threads to do the work of drawing), but it's usually up to the developer to manually create threads to go off and do any long-duration data processing work.

Excellent example... Let's say I have a UI with a button. When that button is pressed, I kick off some long process (say, sorting 10M numbers) that is going to take, say, 10 seconds to run... The windowing toolkit might use its own thread internally to draw the state changes of the button, but unless I write my sorting routine to execute in a thread, my program (and that program's UI) is effectively blocked from doing anything else, including responding to clicks, until that sorting process is done.

WELL-written programs anticipate this long delay, so they do the processing in a thread, and that thread then reports back to the application/UI's main thread to say "OK, I'm done." In the meantime, the application is responsive (or maybe it's showing a progress bar or something, but it's not totally hung/beachballing).

This exact paradigm and issue exists in Windows, it exists in OS X, it exists in Qt, it exists in Linux's window toolkits, it exists in Java... The ONLY fix is to write all your data processing code in threads... And in many of those systems that is a royal PAIN to do! So most developers don't bother unless it's something where they know the code being called will take a long time to run.

GCD changes all that... With the changes to Objective-C (blocks) it's STUPID easy to set up the long-running data processing code to execute in a thread. The whole point is that GCD takes care of figuring out how many threads to build, how to manage them, how to dynamically balance them across the processing resources available, etc. The whole point is to lower the barriers to using threaded code to the point that developers will just automatically do it since there's no real thought required to do it. And by doing so, all applications on the system become more responsive and snappier (note, I didn't say *FASTER*... they might not process data any quicker, but they'll stay more responsive to user input).
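To make that concrete, the "button press kicks off a 10-second sort" example above would look something like this with GCD (a rough, untested sketch; the sortTenMillionNumbers and displayResults: method names are made up for illustration):

- (IBAction)sortPressed:(id)sender {
    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
        // The slow part runs on a GCD-managed background thread...
        NSArray *sorted = [self sortTenMillionNumbers];
        dispatch_async(dispatch_get_main_queue(), ^{
            // ...and only the UI update hops back to the main thread.
            [self displayResults:sorted];
        });
    });
    // This method returns immediately, so the run loop keeps handling
    // clicks and redraws. No beachball.
}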

How many of you have been doing a big encoding job in Handbrake that's eating the CPU so badly that you can hardly do ANYTHING on your 2-4 core Mac? The whole system feels sluggish. Why? Because Handbrake sets up the threads itself, and it's chewing every cycle it can get, damn the rest of the system! GCD takes care of that. It keeps apps from going through thread starvation. It distributes the work being requested as optimally as possible across all the cores you have, and if you're trying to do something in a second application and it needs a little CPU time to respond, GCD makes sure it gets it.

OpenCL - eh, it has the potential to open up the additional computing resources that we have on our machines, but OpenCL requires a good bit more work on a developer's part than GCD does, and it doesn't help in all instances. OpenCL will rock at large-dataset jobs where the processing of each piece of data is independent of the others. But that's usually a subset of what's typically done on a computer (often data and processes are interlinked and can't be run in parallel).
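For anyone curious what "independent data" means in practice, here's a bare-bones OpenCL sketch (untested, error checking stripped, and the "square" kernel is a toy I made up). Each work-item squares one array element without touching any other element, which is exactly the shape of problem OpenCL eats up. On OS X you include <OpenCL/opencl.h> and link the OpenCL framework:

#include <stdio.h>
#include <OpenCL/opencl.h>

static const char *kSource =
    "__kernel void square(__global const float *in, __global float *out) {\n"
    "    size_t i = get_global_id(0);\n"
    "    out[i] = in[i] * in[i];\n"   /* every element is independent */
    "}\n";

int main(void) {
    enum { N = 1024 };
    float in[N], out[N];
    for (int i = 0; i < N; i++) in[i] = (float)i;

    cl_platform_id plat;
    cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &kSource, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "square", NULL);

    cl_mem bufIn = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                  sizeof in, in, NULL);
    cl_mem bufOut = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof out, NULL, NULL);
    clSetKernelArg(k, 0, sizeof bufIn, &bufIn);
    clSetKernelArg(k, 1, sizeof bufOut, &bufOut);

    size_t global = N;   /* one work-item per element; the GPU runs them in parallel */
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, bufOut, CL_TRUE, 0, sizeof out, out, 0, NULL, NULL);

    printf("out[10] = %f\n", out[10]);   /* expect 100.0 */
    return 0;
}

Compare that to something inherently serial (each h.264 frame depending on the previous one, say) and you can see why not everything maps onto it.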
 
... as GCD will be keeping the number of threads at as optimal a level as possible, rather than machines losing time to threads that barely do anything.

I'll be polite, but this analysis is sorely lacking in understanding of threading mechanisms....

"Idle" threads don't "lose time". Looking at the half-dozen Windows systems on my desk (that includes virtual machines and remote desktops on servers in the raised floor lab) the range is about 600 to 2000 threads alive on each system, with the average around 1000. (Windows "Task Manager" tab for "Performance" lists the total threads running on the system. The IE8 task that I'm posting from has 108 threads. Perhaps an OSX user could report on the active thread count on OSX.)

There's little cost to idle threads in a well designed thread scheduler. Some kernel RAM to describe the thread - but not much more.

The "threading problem" is how to chop up the application algorithm into independent chunks that can run in parallel threads on the multiple CPUs found in most systems. It is not that "idle threads" are strangling the system.

"Grand Central" (and "ConcRT" (Concurrency RunTime) in Windows) are simply tools to make it easier for a programmer to identify those chunks and run them on multiple threads - that's why programs have to be rewritten to use the new threading models.

Note that GCD/ConcRT won't help much for programs that are already written to use multiple cores. Those have been written to identify and exploit parallelism using traditional threading models (although 10.6 has much better thread performance at the kernel primitive level, so these existing threaded apps may benefit from kernel improvements).


Yes, and no....

Very good post!
 
I'll be polite, but this analysis is sorely lacking in understanding of threading mechanisms....

"Idle" threads don't "lose time". Looking at the half-dozen Windows systems on my desk (that includes virtual machines and remote desktops on servers in the raised floor lab) the range is about 600 to 2000 threads alive on each system, with the average around 1000. (Windows "Task Manager" tab for "Performance" lists the total threads running on the system. The IE8 task that I'm posting from has 108 threads. Perhaps an OSX user could report on the active thread count on OSX.)

Well, I only have Safari, Mail, iTunes, and last.fm running, and here's what my Activity Monitor is reporting under CPU:

Threads: ~280

Processes: 56

Usage: >85% Idle / System 2-7% / User 3-10%

So nothing much going on at the moment
 
Also, forgot to mention: GCD works by managing a POOL of threads. It sets up the threads (which, from what I've read, are somewhat heavyweight objects in OS X, with a lot of overhead, but still less than a process). Blocks, on the other hand, basically say to GCD, "Hey, here's some code I want run; it can be run in parallel; please schedule it on one of your threads and let me know when you're done." There's VERY, VERY little overhead involved (think tens of bytes rather than several tens of kilobytes for a thread). That reduced overhead means you can have a LOT more of them created and in memory at once, and it's quicker to switch between them as well.

So, GCD may look at your system and say "I can support 200 threads with this hardware", but your active programs might have 10,000 blocks being passed around in GCD's scheduler to get time to run on those 200 threads.

Again, there's NO bookkeeping that the developer needs to do; the system just takes care of it for you.

I kind of like to think of it as a block-->thread scheduler, similar to the OS's process-->CPU scheduler.
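A toy illustration of that (untested sketch, plain C with blocks): fire off 10,000 blocks and let GCD multiplex them onto however many worker threads it decides the hardware merits. Try doing that with 10,000 real threads and watch your memory disappear.

#include <dispatch/dispatch.h>
#include <stdio.h>

int main(void) {
    dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_group_t group = dispatch_group_create();

    for (int i = 0; i < 10000; i++) {
        // 10,000 lightweight blocks queued; GCD schedules them onto its
        // (much smaller) pool of worker threads.
        dispatch_group_async(group, queue, ^{
            /* some small, independent chunk of work */
        });
    }
    dispatch_group_wait(group, DISPATCH_TIME_FOREVER);   // block until the queue drains
    dispatch_release(group);   // manual release, Snow Leopard era (pre-ARC)
    printf("all blocks done\n");
    return 0;
}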
 
How many of you have been doing a big encoding job in Handbrake that's eating the CPU so badly that you can hardly do ANYTHING on your 2-4 core Mac? The whole system feels sluggish. Why? Because Handbrake sets up the threads itself, and it's chewing every cycle it can get, damn the rest of the system! GCD takes care of that. It keeps apps from going through thread starvation. It distributes the work being requested as optimally as possible across all the cores you have, and if you're trying to do something in a second application and it needs a little CPU time to respond, GCD makes sure it gets it.
Not too often since I got the Q6600 ages ago. I've been gaming, recording TV, and transcoding in HandBrake without skipping a beat in-game in some instances. This was in Windows Vista, mind you.

Well, I only have Safari, Mail, iTunes, and last.fm running, and here's what my Activity Monitor is reporting under CPU:

Threads: ~280

Processes: 56

Usage: >85% Idle / System 2-7% / User 3-10%

So nothing much going on at the moment
Just offhand, I get about 260 threads and 60 processes on average under OS X. That's with a browser and iTunes open. Like you said, not much going on.
 
Excellent example... Let's say I have a UI with a button. When that button is pressed, I kick off some long process (say, sorting 10M numbers) that is going to take, say, 10 seconds to run... The windowing toolkit might use its own thread internally to draw the state changes of the button, but unless I write my sorting routine to execute in a thread, my program (and that program's UI) is effectively blocked from doing anything else, including responding to clicks, until that sorting process is done.

WELL-written programs anticipate this long delay, so they do the processing in a thread, and that thread then reports back to the application/UI's main thread to say "OK, I'm done." In the meantime, the application is responsive (or maybe it's showing a progress bar or something, but it's not totally hung/beachballing).
That has nothing to do with the GUI, and I can do a progress bar without threads. To be useful, your whole application model has to be smart enough to expect work to be done in a background thread and to handle the result of that work when it completes. Otherwise, using threads is pointless. That is not as trivial as you just described.
This exact paradigm and issue exists in Windows, it exists in OS X, it exists in Qt, it exists in Linux's window toolkits, it exists in Java... The ONLY fix is to write all your data processing code in threads... And in many of those systems that is a royal PAIN to do! So most developers don't bother unless it's something where they know the code being called will take a long time to run.

GCD changes all that... With the changes to Objective-C (blocks) it's STUPID easy to set up the long-running data processing code to execute in a thread. The whole point is that GCD takes care of figuring out how many threads to build, how to manage them, how to dynamically balance them across the processing resources available, etc.
Yawn! Threads are stupid easy to create in Java and so are thread pools. GCD cannot make developers more intelligent about when to use threads or how many to construct. Last I checked, I didn't think threads are that difficult in C or C++ either. Blocks constructs may help slightly with avoiding some concurrency issues, but what I've seen so far hasn't been that exciting.
The whole point is to lower the barriers to using threaded code to the point that developers will just automatically do it since there's no real thought required to do it.
No thought required to threads? THAT'S REALLY SCARY.

How many of you have been doing a big encoding job in Handbrake that's eating the CPU so badly that you can hardly do ANYTHING on your 2-4 core Mac? The whole system feels sluggish. Why? Because Handbrake sets up the threads itself, and it's chewing every cycle it can get, damn the rest of the system!
OS X must've had a really crappy kernel until last month.
 
OS X must've had a really crappy kernel until last month.
Don't worry too much about it. Every time a new version of OS X is announced, everyone seems to claim it is going to be our salvation.

Well something along those lines. I stopped caring after Spotlight and Time Machine.
 
Yawn! Threads are stupid easy to create in Java and so are thread pools. GCD cannot make developers more intelligent about when to use threads or how many to construct. Last I checked, I didn't think threads are that difficult in C or C++ either. Blocks constructs may help slightly with avoiding some concurrency issues, but what I've seen so far hasn't been that exciting.

Yes, it is easy (and yes, I am a Java developer, and I know it's easy to create threads). It's NOT easy, however, to manage them effectively in a large Swing application... hence why the SwingWorker system came out... When I read about blocks and GCD, I immediately thought "This is their version of SwingWorker!" and in a lot of ways it IS.

And as I said (and many others have as well), not every program's processing jobs can be broken up into multiple threads in order to run faster; but by using threads to handle the whole "go do this when I press this button, and do it off the UI thread so the UI can remain responsive" pattern, the whole system is snappier and better able to respond and adjust resources as needed.

Note that Java or C++ or any of the other language-specific threading systems don't really do anything to keep you from creating a crapload of threads that aren't really doing anything. Threads might not take CPU time if they're idle, but they DO take up memory, and they DO take up space in the thread management/task scheduler.


No thought required to threads? THAT'S REALLY SCARY.

You're misquoting me... It makes it STUPID EASY to dispatch processing work off to threads (like what I said about the processing work invoked by the button press). It does NOT make it "stupid easy" to redesign your program to do concurrent data processing... that still takes work. But blocks and GCD remove some of the boilerplate overhead that developers usually have to deal with to create and manage threads on their own.

The point is to lower the barriers to using these tools so more developers will put in the effort to use them, rather than think, "This is too much work; screw it, it's not like this chunk of code will take more than a second to run" (forgetting that that chunk of code might need to wait on a network connection or disk I/O or something like that, which could get blocked or stalled, making the whole app hang...).
 
Um...



I agree. People are acting like developers have never thought about threads before. GCD makes creating and managing threads easier (it also keeps a program from stepping all over the toes of another program when it comes to threads), but it will not suddenly make non-threadable programs threadable or make developers good at programming with threads and shared resources. To start, there has to be work that can run in parallel in the program. The standard GUI program of click, wait for response, click again doesn't have very much that can be usefully threaded.

The big win comes in tasks like encoding/decoding. In those cases developers have already been using threads for a very long time, but they have probably been conservative about the number of threads created (too few and you don't use the procs; too many and context switching kills you). Now they can request all they want and let GCD manage how many threads to create dynamically, based on the system the code is executing on.

I believe that is the point. Simple things that CAN benefit from threads are frequently NOT threaded, because you have to do quite a lot of thread management as soon as you start messing with them.

For instance, the simple act of doing "the same thing" (let's say something that takes 2 seconds) on each item in a collection typically gets done along the lines of...

for (int i = 0; i < [collection count]; i++) {
    [[collection itemAtIndex:i] doTwoSecondTask];   // items processed one at a time, serially
}

Most developers would leave it at that, but with blocks and GCD you can literally rewrite that as...

dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
dispatch_apply([collection count], queue, ^(size_t i) {
    [[collection itemAtIndex:i] doTwoSecondTask];   // iterations run concurrently
}); // I didn't test this, I'm cribbing from the ADC documentation.

Now you have the same number of lines of code and a slightly different syntax, but your 2-second tasks are being done in parallel, with GCD deciding how many can run at a time based on what your overall system is doing.

Remember, GCD's job is NOT to make everything run ultra-parallel. Its job is to make the overall SYSTEM continue to be responsive by being aware HOLISTICALLY of what is going on. Even if you build a thread pool yourself in your own app, which is somewhat time-consuming to plumb in, you are only ever aware of what YOUR APP is doing. GCD sees the bigger picture...

So while it might speed some things up, I think its big win is that it makes it EASY to thread things that otherwise would not be threaded without writing lots of thread pool management code. OK, so you could use NSOperation (sketch below), but that's not as holistic as GCD. There are lots and lots of "quick wins" you get with GCD, and that's what makes it applicable across the board.
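For what it's worth, here's roughly the same fan-out written against NSOperationQueue (a sketch, untested; itemAtIndex: and doTwoSecondTask are carried over from the made-up example above). It works, but as noted, the queue only knows about its own workload, not the whole system's:

NSOperationQueue *opQueue = [[NSOperationQueue alloc] init];
for (NSUInteger i = 0; i < [collection count]; i++) {
    id item = [collection itemAtIndex:i];
    // NSBlockOperation is new in 10.6 as well; each operation wraps one task.
    [opQueue addOperation:[NSBlockOperation blockOperationWithBlock:^{
        [item doTwoSecondTask];
    }]];
}
[opQueue waitUntilAllOperationsAreFinished];
[opQueue release];   // pre-ARC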

Just my 2c. :)
 
Also, forgot to mention: GCD works by managing a POOL of threads. It sets up the threads (which, from what I've read, are somewhat heavyweight objects in OS X, with a lot of overhead, but still less than a process).


Again, this analysis is sorely lacking in understanding of threading mechanisms....

A "thread" is a subset of a process. A thread has the same virtual memory environment as the process that contains it. You can't have a global pool of threads, you can only have process pools.

In Windows, a "process" is a description of virtual memory and system context. It cannot execute on its own. A "thread" is an execution context within a process - it contains the dynamic state. A process must have at least one thread - otherwise it would have no value.

About Processes and Threads

Each process provides the resources needed to execute a program. A process has a virtual address space, executable code, open handles to system objects, a security context, a unique process identifier, environment variables, a priority class, minimum and maximum working set sizes, and at least one thread of execution. Each process is started with a single thread, often called the primary thread, but can create additional threads from any of its threads.

A thread is the entity within a process that can be scheduled for execution. All threads of a process share its virtual address space and system resources. In addition, each thread maintains exception handlers, a scheduling priority, thread local storage, a unique thread identifier, and a set of structures the system will use to save the thread context until it is scheduled. The thread context includes the thread's set of machine registers, the kernel stack, a thread environment block, and a user stack in the address space of the thread's process. Threads can also have their own security context, which can be used for impersonating clients.

Microsoft Windows supports preemptive multitasking, which creates the effect of simultaneous execution of multiple threads from multiple processes. On a multiprocessor computer, the system can simultaneously execute as many threads as there are processors on the computer.

A job object allows groups of processes to be managed as a unit. Job objects are namable, securable, sharable objects that control attributes of the processes associated with them. Operations performed on the job object affect all processes associated with the job object.

User-mode scheduling (UMS) is a light-weight mechanism that applications can use to schedule their own threads. An application can switch between UMS threads in user mode without involving the system scheduler and regain control of the processor if a UMS thread blocks in the kernel. Each UMS thread has its own thread context instead of sharing the thread context of a single thread. The ability to switch between threads in user mode makes UMS more efficient than thread pools for short-duration work items that require few system calls.

A fiber is a unit of execution that must be manually scheduled by the application. Fibers run in the context of the threads that schedule them. Each thread can schedule multiple fibers. In general, fibers do not provide advantages over a well-designed multithreaded application. However, using fibers can make it easier to port applications that were designed to schedule their own threads.

For more information, see the following topics: link

The system has processes, processes have threads, and threads have fibers.


(This is a Windows-centric description, but a "process" is a hardware entity that is defined by the CPU. I've used a number of threading models, and they've all followed this general model (except, of course, old Linux systems that simulated threads using processes). Please, any OSX developers, point out where OSX varies from this model.)
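A quick way to see "threads share their process's address space" in code (untested sketch): two pthreads bumping the same global counter, something two separate processes could not do without explicitly setting up shared memory:

#include <pthread.h>
#include <stdio.h>

static int shared_counter = 0;   /* one copy, visible to every thread in the process */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *bump(void *arg) {
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);
        shared_counter++;   /* both threads touch the very same variable */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, bump, NULL);
    pthread_create(&b, NULL, bump, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("shared_counter = %d\n", shared_counter);   /* 200000 */
    return 0;
}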
 
GCD already doing good things for apps.

Many properly written apps already benefit from the SL/GCD combo. Aperture is a good example of an app that runs much better under SL without being expressly written to take advantage of GCD and OpenCL. In a nutshell, the work that went into GCD is having a big payoff for threaded apps.

Now this isn't to say Aperture or anything else has been optimized for SL, just that SL gives multithreaded apps more respectable behavior.

In any event, getting back on track, I have to agree that focusing on the core encoding routines is in order. This would have a huge payoff for everybody, especially for those encodings amenable to parallel programming.

As to those surprised by these numbers, I have to ask: where have you been? There is much that could be accelerated via these techniques. This is why I jumped at Snow Leopard the day it came out. Not because I expected an immediate payoff, but rather because once a standard way of doing things becomes available, it will be adopted on a wide scale. Well, I have to admit that there was a bit of expectation of an immediate payoff, but that has a lot to do with knowing how bad the Mac's threading model was with respect to things like Linux. The fact remains it is all downhill from here: as more and more libraries and programs get updated to the new tech, our Macs will just get faster.

I've seen many complaints about SL in the forums, but I must say I'm happy. Part of that is due to the very noticeable improvement to existing apps. That says a lot about the mechanisms on which GCD and OpenCL are implemented. It really makes one wonder how programs like Aperture will fare when accelerated purposefully.

Dave
 
Not OpenCL; in my Mac Pro's case, the ATI X1900 XT isn't supported.

You need to have a specific card for OpenCL to be used.

I am in the same boat. I'll have to buy an overpriced card (overpriced compared to the same cards for Windows) for my Mac Pro to take advantage of OpenCL. I am still pissed that NVIDIA and ATI still charge way more for the Mac versions of the same cards they make for Windows machines. I expect to pay about $50 for EFI capability, but the $100+ doesn't make much sense in my book.
 
That is just wrong.

You (and most other people) need to realize that there is a relatively small set of computations that can be accelerated by this kind of technology. Of course certain types of video, image, and sound processing will work, but your run-of-the-mill Mac app isn't going to be able to take advantage of GCD or OCL.

The ability to use these facilities effectively is up to the programmer and the algorithms he is working with. Many apps benefit from threading; GCD just provides a way to leverage a bunch of CPUs differently than normal threads or NSOperation do.

Dave
 
Again, this analysis is sorely lacking in understanding of threading mechanisms....

Believe me, I know ALL about the differences between processes, threads, and fibers (MS calls them Fibers; I forget what Apple is calling them). Hard to get a Computer Science degree from Georgia Tech without learning that ;) I'm just simplifying things for the layperson reader here...

Really, anyone who wants a better understanding of what GCD does and how it works should go read the Ars Technica Snow Leopard review, pages 11-15. He did a FANTASTIC job of describing and illustrating how GCD and OpenCL work, what they do, and why they're helpful (but not some magic bullet like some people seem to expect...).
 
Many properly written apps already benefit from the SL/GCD combo. Aperture is a good example of an app that runs much better under SL without being expressly written to take advantage of GCD and OpenCL. In a nutshell, the work that went into GCD is having a big payoff for threaded apps.

Now this isn't to say Aperture or anything else has been optimized for SL, just that SL gives multithreaded apps more respectable behavior.

Dave

In reality, the speedups you're experiencing are probably only because parts of the OS code have been rewritten to use GCD, and thus respond faster than they used to. I.e., the underlying system libraries might be better/more threaded than before. But Aperture itself hasn't been rewritten/modified to use GCD. Once it is, you'll probably see even more performance improvements.
 
Many properly written apps already benefit from the SL/GCD combo.

...that has a lot to do with knowing how bad the Mac's threading model was with respect to things like Linux.


Agree - but these apps aren't benefitting from Grand Central per se, they're helped by work that Apple had to do to fix some fundamental brain damage in the 10.5 and earlier threading code.

Any app using pthreads or other ways of managing threads will see an improvement - but it is not because they use GCD. It is because Apple had to improve the sorry state of threading in OSX for GCD.


In reality, the speedups you're experiencing are probably only because parts of the OS code have been rewritten to use GCD, and thus respond faster than they used to. I.e., the underlying system libraries might be better/more threaded than before. But Aperture itself hasn't been rewritten/modified to use GCD. Once it is, you'll probably see even more performance improvements.

Or, because parts of the OS haven't been rewritten to use GCD - but they use the improved thread primitives.

It would be interesting to see comparisons of pthread performance in 10.5 and 10.6. If pthreaded apps run better in 10.6, then it's not GCD, but the foundation work done for GCD.
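If anyone wants to try it, a crude microbenchmark along these lines (untested sketch), run on 10.5 and then on 10.6, would show whether the thread primitives themselves got faster, independent of GCD:

#include <pthread.h>
#include <stdio.h>
#include <sys/time.h>

static void *worker(void *arg) { return arg; }   /* trivial thread body */

int main(void) {
    enum { N = 10000 };
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    for (int i = 0; i < N; i++) {
        pthread_t t;
        pthread_create(&t, NULL, worker, NULL);   /* measure create/join cost only */
        pthread_join(t, NULL);
    }
    gettimeofday(&t1, NULL);
    double ms = (t1.tv_sec - t0.tv_sec) * 1000.0
              + (t1.tv_usec - t0.tv_usec) / 1000.0;
    printf("%d create/join pairs: %.1f ms\n", N, ms);
    return 0;
}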
 