> I've never seen an answer to this type of question, not even a ballpark figure... Has anyone read what kind of speedup one might expect for h.264 video ENcoding using OpenCL on a Mac Pro maxed out with the standard-class video cards Apple offers? Just a broad speedup: 10x? 100x? Or is h.264 encoding not parallelizable enough to actually see much of a boost?

CUDA alternatives I have tried on my MacBook Pro with 2.4GHz/8600M GT go from 20fps to 150fps for 720p. http://badaboomit.com/ is what I use. They claim 20x.
Let's be clear here: Grand Central Dispatch does not bring any performance improvements by itself. It is just a library that simplifies threading for developers who might not otherwise do multi-threaded programming. It's the multi-threading that brings the performance improvements.
> What's GCD like on dual-core Macs that can't use OpenCL? I object to Apple not making OpenCL drivers for these machines because I think they would benefit the MOST.
>
> Theoretical situation...
> 8-Core Mac Pro... video encoding... 4 seconds (Leopard)
> 8-Core Mac Pro... video encoding... 2 seconds (Snow Leopard)
> 2-Core iMac/MacBook Pro... video encoding... 10 minutes (Leopard)
> 2-Core iMac/MacBook Pro... video encoding... 10 minutes (Snow Leopard)
> 2-Core iMac/MacBook Pro... video encoding... 5 minutes (Snow Leopard, if they made a damn driver!)
>
> We're talking 1.5-2-year-old computers here, and since Apple's ripped out ANY support for PPC machines, one would think they could support the last 4 years of Intel machines... one would think?
>
> Here we're also talking about users who potentially have less money and can't afford to upgrade their hardware as often, so they do so every 5 years or so. These guys still pay to upgrade the software and are interested in new technology... they also want the biggest bang for their buck.
>
> So Mac Pro users... 4 seconds or 2 seconds... who cares? Me... 10 minutes or 5 minutes... I care, because it's a lot of time! And for those who say "you're a cheap bastard..." my MBP cost significantly more than most Mac Pros!

These cards were sold to you for rendering video... they just aren't capable of doing the calculations. It was normal for a graphics card's engine not to support double-precision floating point, or to bastardize IEEE 754 floating-point numbers; they were made to render graphics fast. If you wanted to be able to do math calculations on your card, then you should have made sure to get a CUDA-capable graphics card.
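For context, whether a given Mac "can use OpenCL" for this kind of work comes down to whether the OS exposes its GPU as an OpenCL device at all. Here is a minimal enumeration sketch using the standard OpenCL host API (my own illustration, not anything from the thread); on a machine whose GPU has no OpenCL driver, only a CPU device will show up.

```c
/* Build on OS X with: clang cldevices.c -framework OpenCL */
#include <stdio.h>
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif

int main(void)
{
    cl_platform_id platform;
    cl_device_id devices[8];
    cl_uint ndev = 0;
    char name[256];
    cl_device_type type;

    if (clGetPlatformIDs(1, &platform, NULL) != CL_SUCCESS) {
        printf("no OpenCL platform available at all\n");
        return 1;
    }

    /* Ask for every device type the platform knows about. */
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 8, devices, &ndev);

    for (cl_uint i = 0; i < ndev; i++) {
        clGetDeviceInfo(devices[i], CL_DEVICE_NAME, sizeof(name), name, NULL);
        clGetDeviceInfo(devices[i], CL_DEVICE_TYPE, sizeof(type), &type, NULL);
        printf("%-10s %s\n", (type & CL_DEVICE_TYPE_GPU) ? "GPU:" : "CPU/other:", name);
    }
    return 0;
}
```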
> Grand Central does have some features you can't just implement inside an application in Leopard. Grand Central manages a threadpool system-wide, which allows threadpools to be used very cheaply (CPU-cost-wise) and allows the system to maintain an optimal number of worker threads for the CPU cores available. The overhead of creating your own threadpool is minor for a large application, but for smaller applications, or applications that only occasionally need a threadpool, that cheap shared pool makes a substantial difference.

Why in the world would I want to create a thread pool when I don't need one!?!?!? Please give just ONE example of why a thread pool would be needed in a desktop application that does not provide server services or benefit from significant parallelism in a complex operation.
> This article is for .NET threadpools, but figure 2 at http://msdn.microsoft.com/en-us/magazine/dd252943.aspx illustrates the problem of maintaining an optimal number of concurrent threads. Too few threads per core and performance plummets. Too many threads per core and performance starts heading south again. I believe OS X is the first OS to integrate global threadpools.

The article you linked has almost nothing to do with global thread pools. Please state just ONE specific advantage global thread pools provide over one I'd create locally (for a desktop application).
> If you are running only one large application, then Grand Central might not give you an advantage. Grand Central is pretty neat in how it automatically allocates resources without overwhelming the computer.

What are you calling a "resource" here? Cores? Does GCD have any knowledge of other resources on the system? How does GCD know what system resources my thread needs and when it will need them? Is it really that smart?
> Why in the world would I want to create a thread pool when I don't need one!?!?!? Please give just ONE example of why a thread pool would be needed in a desktop application that does not provide server services or benefit from significant parallelism in a complex operation.

If your UI stalls waiting for a task to complete, you can run the task separately while the UI keeps chugging along, leaving the user free to do other things in the UI while waiting. For example, when a tab stalls in Safari, wouldn't it be great to switch to another tab and get some work done? Also, you are not creating the thread pool; GCD takes care of that. You just specify how tasks are split (arguably the hard part of multi-core programming) and how they interact, and GCD manages it all. Kinda like how you don't have to micromanage all the trains at the station when you have the workers there to do it for you.
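In case it helps, here's a minimal sketch of that pattern using the C libdispatch API that ships with Snow Leopard. The functions encode_frame() and report_progress() are hypothetical stand-ins for "the slow task" and "the UI update"; the point is that the application never creates a thread or a pool itself.

```c
/* Build on OS X with: clang gcd_async.c (blocks are enabled by default in clang) */
#include <dispatch/dispatch.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void encode_frame(void)    { sleep(1); }   /* pretend this is the slow work */
static void report_progress(void) { printf("frame done, UI never blocked\n"); }

int main(void)
{
    /* The system-wide pool: we never create, size, or destroy threads. */
    dispatch_queue_t pool = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

    /* Hand the slow task to GCD; the main queue (the "UI") keeps running. */
    dispatch_async(pool, ^{
        encode_frame();
        /* Hop back to the main queue for the UI-side update. */
        dispatch_async(dispatch_get_main_queue(), ^{
            report_progress();
            exit(0);
        });
    });

    dispatch_main();   /* park the main thread so main-queue blocks can run */
}
```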
> The article you linked has almost nothing to do with global thread pools. Please state just ONE specific advantage global thread pools provide over one I'd create locally.

Global thread pools (as used by GCD) know the state of the system. This is stated as a specific example of the advantage of GCD: it will optimize the number of threads to reflect system load.
> What are you calling a "resource" here? Cores? Does GCD have any knowledge of other resources on the system? How does GCD know what system resources my thread needs and when it will need them? Is it really that smart?

Yes. Actually, it's not asking you to think at that low a level. GCD says, "Give me a task that can be broken up into blocks, and I'll figure out the best way to run it given current system resources," namely CPU and GPU. Memory is already managed in a preemptive multitasking system. Why rewrite the memory manager when you have a good one already?
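A minimal sketch of what "a task broken up into blocks" looks like in the C libdispatch API. The task_a/task_b/task_c functions are hypothetical placeholders for independent pieces of a larger job; how many of them run at once is GCD's decision, not the program's.

```c
/* Build on OS X with: clang gcd_group.c */
#include <dispatch/dispatch.h>
#include <stdio.h>

static void task_a(void) { printf("block A done\n"); }
static void task_b(void) { printf("block B done\n"); }
static void task_c(void) { printf("block C done\n"); }

int main(void)
{
    dispatch_queue_t pool  = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_group_t group = dispatch_group_create();

    /* Each block is a unit GCD is free to schedule however the system allows. */
    dispatch_group_async(group, pool, ^{ task_a(); });
    dispatch_group_async(group, pool, ^{ task_b(); });
    dispatch_group_async(group, pool, ^{ task_c(); });

    /* Wait for the whole group; there are no thread handles anywhere here. */
    dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
    dispatch_release(group);
    return 0;
}
```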
> I'll be polite, but this analysis is sorely lacking in understanding of threading mechanisms... "Idle" threads don't "lose time".

I'm a bloody programmer; I didn't say "idle" threads, I said threads that barely do anything. There's a difference. The cost of a context switch just to perform a trivial operation can be huge, and in user interfaces and many other common cases you get a lot of these operations.
I have to disagree, as I believe Apple refactored NSOperation and related classes to run on top of the new threading architecture. It is the only viable explanation I have for some of the speedups seen in older code. We may be saying the same thing, but I'm specifically saying that the infrastructure put in place in SL to support GCD has had an excellent impact on existing software. Well, existing software that took advantage of Apple's NS threading primitives.
Well, yeah, a rewrite can always speed things up; I'm just saying there has been a positive impact on existing software, and that has a lot to do with the infrastructure put in place for GCD.
Dave
> The tricky thing is that the new technology underlying Snow Leopard is not going to make a difference straight away, but needs to be implemented by developers.

I will have to continue to object to this idea; well-written apps already benefit from SL from what I can see. In some cases I seriously doubt the developers will do anything more to optimize specifically for SL.
> Those who read the Ars Technica review of Snow Leopard might believe the reviewer that Grand Central technology is relatively easy to implement in code, because rather than creating threads on your own one can basically "hand things over to GCD" and it will do the rest of balancing the CPU power between processes and cores and all those technical things.

Well, it's been a while since I read that article, but I don't remember that being said. Yes, GCD does the load balancing, but it is still up to the programmer to find an optimal way to parallelize the algorithms being used. So while the details of handling micro-threads are no longer an issue, there is no change in the effort required to find the parallel code.
Well, anyhow, it doesn't matter too much now; we will see the benefits here over time, so be patient.
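To put a concrete face on "the split is still the programmer's problem": a sketch under my own assumptions (a simple array-scaling job with hypothetical sizes). GCD will balance whatever blocks it is given, but choosing to hand it a few coarse stripes rather than one block per element is a decision only the programmer can make.

```c
/* Build on OS X with: clang gcd_stripes.c */
#include <dispatch/dispatch.h>
#include <stddef.h>

#define N       (1 << 20)
#define STRIPES 8             /* chosen by the programmer, not by GCD */

static float data[N];

int main(void)
{
    dispatch_queue_t pool = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    size_t stripe = N / STRIPES;

    /* GCD schedules the stripes; the granularity of the split is ours. */
    dispatch_apply(STRIPES, pool, ^(size_t s) {
        size_t end = (s == STRIPES - 1) ? N : (s + 1) * stripe;
        for (size_t i = s * stripe; i < end; i++)
            data[i] *= 2.0f;                   /* the actual per-element work */
    });
    return 0;
}
```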
> We're talking 1.5-2-year-old computers here, and since Apple's ripped out ANY support for PPC machines, one would think they could support the last 4 years of Intel machines... one would think?

I guess you're making the mistake of thinking they dropped PPC to save time. No, they dropped PPC to screw customers. And that's what they are doing to you, too. This won't stop until customers stop excusing Apple for this kind of behavior. Of course, they dropped "Computer" from their name because they make toys now.
> Here we're also talking about users who potentially have less money and can't afford to upgrade their hardware as often, so they do so every 5 years or so. These guys still pay to upgrade the software and are interested in new technology... they also want the biggest bang for their buck.

That makes you a poor customer.
> So Mac Pro users... 4 seconds or 2 seconds... who cares? Me... 10 minutes or 5 minutes... I care, because it's a lot of time! And for those who say "you're a cheap bastard..." my MBP cost significantly more than most Mac Pros!

Next time, don't give Apple your money unless they promise you a certain number of years of support.
> Yup Dave, we're saying the same thing... It's not that the app has gotten faster just because of GCD. It's that the OS support the app makes use of has gotten faster because it's been rewritten to use GCD. So, yes, SL can run some things faster without apps having to be rewritten.

This really appears to be a very good thing for apps making use of Apple's higher-level threading APIs. For people programming with those APIs there may be little incentive to go down to the level of the GCD primitives.
> There is SOME benefit. We won't see the rest of the benefits until the apps are modified to directly use GCD themselves.

If they are modified. The thing is, if your app is 20 to 50% faster just from using the high-level Cocoa threading features, then maybe the developer will spend his time on other parts of the app.
I'd like to see figures for the independent contributions of:
- Snow Leopard running non-optimised code
- SL with OpenCL
- SL with GCD
- SL with both optimisations
Wonder if there's more detailed information anywhere?
On the other hand, cooling off the idea some in this thread have that all refactored programs will suddenly run much faster than today's versions is in order. Here I'm not talking so much to you as to the individuals who seem to believe that a year from now we will all be shocked by how fast some apps run. For some apps you won't see much more than the speed increases we currently see.
Dave