2. No real negative effect. The temps and fan speeds will be comparable to high-end gaming. The hardware is built to deal with this.
3. It was stated to be quite easy because it is hardware-independent and easy to code, but only apps that work heavily in parallel threads can take full advantage of the GPU. Apps with a lot of syncing and dependencies between their threads will remain CPU apps for the foreseeable future.
4. Probably sooner than OpenCL, but the advantages will be MUCH smaller.
5. id Software is using OpenCL in Rage, so YES, but I doubt the benefits will be HUGE because the GPU is already working hard in games nowadays.

So please, A-list companies (Adobe, Apple, Autodesk), give us full OpenCL apps!

Are any current apps in the pro area gaining speed from Snow Leopard at the moment, or are we waiting for updates?
 
MBP 4,1 2.4Ghz 15"

Code:
...........................................................
.................. OpenCL Bench V 0.25 by mitch ...........
...... C2D 3GHz = 12 sec vs Nvidia 9600GT = 0,93 sec ......
... time results are not comparable to older version! .....
...........................................................

Number of OpenCL devices found: 2
OpenCL Device # 0 = GeForce 8600M GT
Device 0 is an: GPU with max. 940 MHz and 32 units/cores 
Now computing - please be patient....
time used:  2.369 seconds

OpenCL Device # 1 = Intel(R) Core(TM)2 Duo CPU     T8300  @ 2.40GHz
Device 1 is an: CPU with max. 2400 MHz and 2 units/cores 
Now computing - please be patient....
time used: 15.873 seconds

Now checking if results are valid - please be patient....
:) Validate test passed - GPU results=CPU results :)

Why did the 8600M GT beat the unibody's 9600M GT?
 
Probably 3-ish years; it needs to be written in Cocoa first, and while doing so take advantage of the new APIs...

I thought that is what they did to FCS in this new version, and that was why it took so long. So that didn't happen? :( If that is true, I'm even more underwhelmed by the latest update, then! :(
 
If it makes you feel better, a 9600 is little more than a rebranded 8600.

Ah yes. But does the 9600 have the feature of bursting into flames? +1 for the 8600.

Here are my benchmarks.


Code:
...........................................................
.................. OpenCL Bench V 0.25 by mitch ...........
...... C2D 3GHz = 12 sec vs Nvidia 9600GT = 0,93 sec ......
... time results are not comparable to older version! .....
...........................................................

Number of OpenCL devices found: 2
OpenCL Device # 0 = GeForce 8600M GT
Device 0 is an: GPU with max. 940 MHz and 32 units/cores 
Now computing - please be patient....
time used:  3.056 seconds

OpenCL Device # 1 = Intel(R) Core(TM)2 Duo CPU     T9300  @ 2.50GHz
Device 1 is an: CPU with max. 2500 MHz and 2 units/cores 
Now computing - please be patient....
time used: 14.710 seconds

Now checking if results are valid - please be patient....
:) Validate test passed - GPU results=CPU results :) 
 
Early 2008 17" MacBook Pro4,1 HD C2D T9500 Penryn @ 2.6GHz GeForce 8600M GT

Number of OpenCL devices found: 2
OpenCL Device # 0 = GeForce 8600M GT
Device 0 is an: GPU with max. 1040 MHz and 32 units/cores
Now computing - please be patient....
time used: 2.372 seconds

OpenCL Device # 1 = Intel(R) Core(TM)2 Duo CPU T9500 @ 2.60GHz
Device 1 is an: CPU with max. 2600 MHz and 2 units/cores
Now computing - please be patient....
time used: 13.999 seconds
:)
 
Am I the only one wondering why an ATI 4870 reports only 4 "units" instead of 800?

Because "stream processors" (and, in some cases when limited to fixed functionality, "pixel shaders") are not cores. Think of them more as AltiVec/VMX/SSE functional units: great at math and bit-twiddling tasks on vectors, but they only take those specific kinds of commands.

http://www.tomshardware.com/forum/252674-33-stream-processors
http://en.wikipedia.org/wiki/Shader_(computer_science)

The newer shaders are "programmable" and are tweaked slightly to be more like "general" CPU cores. The stream processors just digest pixels in much more limited ways (here are some bits; do things you "already know how to do" to them). Besides being very inflexible, they are smaller, so the vendor can drop 800 of them on the die (they can forgo general registers, etc.).

The benchmark does more than just add three vectors; it loops over the three vectors, adding them. If there is no if-then-branch functionality in the "stream processor," then it is not what is driving the work. Cores have a full complement of commands (logical branches, tests, exceptions, loads/stores, etc., in addition to math and twiddling).


It would be interesting to see what OpenCL reports for a Cell processor package, with its one POWER-based core and several SPEs. However, SPEs do their own memory loads/stores and so on. I guess in the OpenCL context it would "ignore" the PPC core and just report the SPEs (since they get different binaries anyway).
 
So quite a few updates from Apple must be coming soon; they of all people should have their apps being updated to take advantage of both parallel processing and OpenCL... I wonder when?

We have been waiting for a properly written (Cocoa) version of FCP for longer than I can remember... don't hold your breath.
 
Well, not small, inexpensive laptops... (do those even exist?), but my MacBook Pro 2.53 with Snow Leopard renders Final Cut Pro footage faster than my 2006 Mac Pro 2.66 running 10.5.8. Of course, if I add Snow Leopard to my Mac Pro, it should even out or be faster. But yeah, this is a good time to have a laptop; it's going to be even better when the apps are coded to optimize for Snow Leopard. I never thought I'd see the day a laptop would run faster than my desktop, but it looks like it's here. I might be able to dump the ol' ball and chain pretty soon. :cool:

Take new technology and then compare new hardware with 3 year old hardware. Express surprise when new hardware wins. woohoo.

Now try the same thing with new hardware and you'll see the desktop wins by a healthy margin. If Apple actually made desktops (instead of iMacs and minis), the margin of victory would be huge.

What this is really illustrating is what a small but vocal minority has been saying about Apple for years: they ship the crappiest GPUs they can get away with. When new technology comes along that can make use of the GPU, almost none of their older machines can take advantage of it. Even the Mac Pro from last year shipped with junk that's not compatible with OpenCL.
 
Because "stream processors" (and, in some cases when limited to fixed functionality, "pixel shaders") are not cores. Think of them more as AltiVec/VMX/SSE functional units: great at math and bit-twiddling tasks on vectors, but they only take those specific kinds of commands.

So, does this mean that for complex OpenCL operations, the current set of Nvidia cards (even low-end) will blow away an ATI 4870, since it uses stream processors? That would be disappointing for OpenCL uses. The ATI 4870 is a good card in general, but maybe not for OpenCL?
 
If this can be demonstrated reasonably soon in Matlab and Mathematica, I might be able to move away from the HP workstation I've got at my desk and get a MP for the heavy lifting. I've been looking at a plugin package for Matlab from AccelerEyes that does CUDA code porting for Nvidia GPU acceleration, but it's about a $1000 plugin and I'm not sure about its long-term viability/support. This would be a nice way to go if possible.
 
Because "stream processors" (and, in some cases when limited to fixed functionality, "pixel shaders") are not cores. Think of them more as AltiVec/VMX/SSE functional units: great at math and bit-twiddling tasks on vectors, but they only take those specific kinds of commands.

Something still doesn't add up. I don't see how an integrated NVidia GPU could have 16 of anything that a high-end discrete GPU from ATI doesn't have more of. ATI has been offloading video encoding to their GPUs since 2004, so this isn't anything new.

It sounds more like a poorly-written driver (or benchmark) that isn't taking advantage of the ATI GPU.
 
Something still doesn't add up. I don't see how an integrated NVidia GPU could have 16 of anything that a high-end discrete GPU from ATI doesn't have more of. ATI has been offloading video encoding to their GPUs since 2004, so this isn't anything new.

It sounds more like a poorly-written driver (or benchmark) that isn't taking advantage of the ATI GPU.

The problem is you're comparing apples to oranges (no pun intended). Nvidia and ATI, while supporting the same DirectX/OpenGL/OpenCL APIs, do things VERY differently in hardware. It's not that the Nvidia GPU has 16 of the same thing that a high-end ATI card has. They are, in fact, very very different animals.

What we can't tell yet, at least from this "benchmark" (and I cringe to even call it that) is just how much better one manufacturer's design is over the other for OpenCL. Certainly it appears that the NVidia chips are better for this sort of task, but this code isn't really realistic enough to prove it.

I think we'll see more OpenCL support and better examples of code that really stresses the cards more.

Also, keep in mind that MS is introducing the concept of "compute shaders" in DirectX 11, which I believe is coming with Windows 7 (or due within the next 12 months). That concept is VERY similar to OpenCL, except, like everything Microsoft, proprietary. So, if ATI really is behind on performance, you can bet that loads of Windows users complaining about it will force ATI to improve their cards rapidly, and that SHOULD translate into better OpenCL performance as well.

Time will tell, folks!
 
3-year-old MBP

I just tested this with my 3-year-old Santa Rosa MacBook Pro and got:
...........................................................
.................. OpenCL Bench V 0.25 by mitch ...........
...... C2D 3GHz = 12 sec vs Nvidia 9600GT = 0,93 sec ......
... time results are not comparable to older version! .....
...........................................................

Number of OpenCL devices found: 2
OpenCL Device # 0 = GeForce 8600M GT
Device 0 is an: GPU with max. 1040 MHz and 32 units/cores
Now computing - please be patient....
time used: 2.370 seconds

OpenCL Device # 1 = Intel(R) Core(TM)2 Duo CPU T7700 @ 2.40GHz
Device 1 is an: CPU with max. 2400 MHz and 2 units/cores
Now computing - please be patient....
time used: 15.545 seconds

Now checking if results are valid - please be patient....
:) Validate test passed - GPU results=CPU results :)
 
This test is of very limited value

I wondered why ATI cards seem to suck so badly, so I took the ao test (a basic raytracer) and tried it on my 4870 (in a Mac Pro with 8 cores). The GPU test failed ("cannot find function") on the ATI card, so I guess the ATI drivers are still beta or just flawed.

The nice thing about the ao benchmark is that it reads a .cl file instead of having the OpenCL code inside the binary, so I could easily change it.

I changed the code to a simple add loop (each kernel loops 10,000 adds). My Mac Pro outperformed the GPU.

I changed the code to a more common multiply-and-add, unrolled 10 times, which made the GPU 3 times faster than the 8 cores (@ 3.0 GHz). So I guess the benchmarks show us three things right now:

1.) OpenCL works
2.) ATI cards need better support (there are incompatibilities between Nvidia and ATI cards). This is also true for the examples (Perlin noise, ...) which can be downloaded from Apple; they just don't work on my 4870 card.
3.) for simple add loops, the higher number of cores on Nvidia cards rules
 
I want that GTX 285. Now how do I convince the wife? :D

"Honey, I'm thinking I should spend less time at the computer and more time with you and the kids. Now I think I found a way to make the computer finish the work faster..."
 
This is very cool and interesting. On battery power my late 2008 15" MBP's 9400M beats up on the 9600M. But once plugged in the 9600M trounces the 94, without regard or regret.

Check it:
Battery
Code:
OpenCL Device # 0 = GeForce 9600M GT
time used: 13.622 seconds

OpenCL Device # 1 = GeForce 9400M
time used:  9.022 seconds

OpenCL Device # 2 = Intel(R) Core(TM)2 Duo CPU     T9600  @ 2.80GHz
Device 2 is an: CPU with max. 2800 MHz and 2 units/cores 
time used: 13.102 seconds

Plugged in
Code:
OpenCL Device # 0 = GeForce 9600M GT
time used:  2.788 seconds

OpenCL Device # 1 = GeForce 9400M
time used:  9.028 seconds

OpenCL Device # 2 = Intel(R) Core(TM)2 Duo CPU     T9600  @ 2.80GHz
time used: 13.183 seconds

I got very similar results with my Late 2008 MacBook Pro!

Why does the 9600M GT run at only about a fifth of its plugged-in performance when running on battery?

This seems to mean it may actually be faster to use the 9400M than the 9600M GT when running on battery.

Here's my results:

Battery
Code:
Number of OpenCL devices found: 3
OpenCL Device # 0 = GeForce 9600M GT
Device 0 is an: GPU with max. 1250 MHz and 32 units/cores 
Now computing - please be patient....
time used: 13.466 seconds

OpenCL Device # 1 = GeForce 9400M
Device 1 is an: GPU with max. 1100 MHz and 16 units/cores 
Now computing - please be patient....
time used:  8.986 seconds

OpenCL Device # 2 = Intel(R) Core(TM)2 Duo CPU     P8600  @ 2.40GHz
Device 2 is an: CPU with max. 2400 MHz and 2 units/cores 
Now computing - please be patient....
time used: 15.484 seconds


Plugged in
Code:
Number of OpenCL devices found: 3
OpenCL Device # 0 = GeForce 9600M GT
Device 0 is an: GPU with max. 1250 MHz and 32 units/cores 
Now computing - please be patient....
time used:  2.804 seconds

OpenCL Device # 1 = GeForce 9400M
Device 1 is an: GPU with max. 1100 MHz and 16 units/cores 
Now computing - please be patient....
time used:  9.028 seconds

OpenCL Device # 2 = Intel(R) Core(TM)2 Duo CPU     P8600  @ 2.40GHz
Device 2 is an: CPU with max. 2400 MHz and 2 units/cores 
Now computing - please be patient....
time used: 15.501 seconds
 
When my MBP 13" is plugged in, my 9400M only needs 3.5 seconds...
 