OpenCL Benchmarks and Support for Both MacBook Pro GPUs

Shawn Parr · Aug 30, 2009

2002cbr600f4i said:
Did you not even bother to read the article? ...

2002cbr600f4i said:
My bad, my brain must have put the "not" in there in the part about "BOTH GPUS CAN (not) BE USED". I'm man enough to admit when I'm wrong!

I just wanted to say, in the pseudo anonymity of the internet very few have the cojones to actually admit when they were wrong. I really appreciate it, and wanted to say thanks.

TheIguana said:
Update: I just switched over to the 9400M on my 2008 MBP and tried it again. As you can see, it only ran the test on the 9400M and not the 9600M. So it looks like OpenCL only works with both GPUs on the 2008 models when you are running the 9600M GPU.

Intriguing. I'll have to try this in a bit. I was a bit disappointed that I couldn't use both on this beast of a machine...

Shawn Parr · Aug 30, 2009

DAMNiatx said:
Come on, you never tried this.
Snow Leopard can use BOTH GPUs at the same time. No need to log out anymore.
Just change to performance mode and you can use both GPUs with OpenCL.

That's not true on my machine, I just had to log in and out both times when switching graphics chips.

Chop69 said:
I was trying to figure this out as well, because I was getting results similar to 001's, but now I'm getting results similar to yours. The only thing that changed was when I got the lower 9400 score, my MBP was plugged in, but not fully charged. When the charge completed, I got a 9400 score in the 3 sec range.

I saw a big difference in my 9600 scores plugged vs unplugged:

unplugged:

Code:

Number of OpenCL devices found: 3
OpenCL Device # 0 = GeForce 9600M GT
Device 0 is an: GPU with max. 1250 MHz and 32 units/cores 
Now computing - please be patient....
time used: 13.587 seconds

OpenCL Device # 1 = GeForce 9400M
Device 1 is an: GPU with max. 1100 MHz and 16 units/cores 
Now computing - please be patient....
time used:  9.019 seconds

OpenCL Device # 2 = Intel(R) Core(TM)2 Duo CPU     T9400  @ 2.53GHz
Device 2 is an: CPU with max. 2530 MHz and 2 units/cores 
Now computing - please be patient....
time used: 14.623 seconds

Now checking if results are valid - please be patient....
:) Validate test passed - GPU results=CPU results :)

plugged:

Code:

Number of OpenCL devices found: 3
OpenCL Device # 0 = GeForce 9600M GT
Device 0 is an: GPU with max. 1250 MHz and 32 units/cores 
Now computing - please be patient....
time used:  2.788 seconds

OpenCL Device # 1 = GeForce 9400M
Device 1 is an: GPU with max. 1100 MHz and 16 units/cores 
Now computing - please be patient....
time used:  9.025 seconds

OpenCL Device # 2 = Intel(R) Core(TM)2 Duo CPU     T9400  @ 2.53GHz
Device 2 is an: CPU with max. 2530 MHz and 2 units/cores 
Now computing - please be patient....
time used: 14.564 seconds

Now checking if results are valid - please be patient....
:) Validate test passed - GPU results=CPU results :)

But when using the 9600 it definitely can see/use both GPUs. I just wish it could do the same when using the 9400, or maybe give a preference for it. I usually only use the 9400 as I don't need the extra graphics performance, but I wouldn't mind using the 9600 from time to time for OpenCL tasks, especially if I'm plugged in.
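To put rough numbers on the plugged vs. unplugged difference, here is a small sketch that pulls the "time used" lines out of output like the two runs above and compares them. The helper name `parse_times` and the trimmed log strings are only for illustration; they are not part of the benchmark tool.

```python
import re

# Trimmed copies of the two runs posted above (9600M GT selected).
UNPLUGGED = """\
OpenCL Device # 0 = GeForce 9600M GT
time used: 13.587 seconds
OpenCL Device # 1 = GeForce 9400M
time used:  9.019 seconds
OpenCL Device # 2 = Intel(R) Core(TM)2 Duo CPU     T9400  @ 2.53GHz
time used: 14.623 seconds
"""

PLUGGED = """\
OpenCL Device # 0 = GeForce 9600M GT
time used:  2.788 seconds
OpenCL Device # 1 = GeForce 9400M
time used:  9.025 seconds
OpenCL Device # 2 = Intel(R) Core(TM)2 Duo CPU     T9400  @ 2.53GHz
time used: 14.564 seconds
"""

def parse_times(log):
    """Map each device name to its 'time used' value in seconds."""
    names = re.findall(r"OpenCL Device # \d+ = (.+)", log)
    times = re.findall(r"time used:\s*([\d.]+) seconds", log)
    return {name.strip(): float(t) for name, t in zip(names, times)}

unplugged = parse_times(UNPLUGGED)
plugged = parse_times(PLUGGED)

# Per-device ratio of unplugged time to plugged time.
for name in plugged:
    ratio = unplugged[name] / plugged[name]
    print(f"{name}: {unplugged[name]:.3f}s -> {plugged[name]:.3f}s ({ratio:.2f}x)")

# GPU vs CPU on the plugged-in run.
cpu = next(n for n in plugged if "CPU" in n)
gpu_speedup = plugged[cpu] / plugged["GeForce 9600M GT"]
print(f"9600M GT vs CPU when plugged in: {gpu_speedup:.1f}x")
```

Only the 9600M GT changes much between the two runs; the 9400M and CPU times are nearly identical, which fits the idea that the discrete GPU is the part being throttled on battery.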

ungraphic · Aug 30, 2009

They'd better drop some ATI 3870 support for OpenCL. First Apple couldn't get it right shipping the GeForce 8800GT for 1,1 Mac Pros, now they can't even get proper support for OpenCL-compatible cards. WTF?

How many of you guys are getting SHAFTED by apple with aftermarket products?

John.B · Aug 30, 2009

stevemiller said:
does anyone else find it strange that apple has added support for opencl in their OS but doesn't seem to have utilized it in any of their actual software? the new media encoding tools in quicktime X seem like a prime candidate for leveraging the gpu, but it seems that only h264 decoding is gpu accelerated.

Ummmm, because it's new and not very many people will be able to take advantage of it on day 1?

stevemiller said:
it doesn't feel very encouraging for the adoption of the technology when they don't even use it themselves.

I expect stuff in the Pro apps to begin optionally adopting this in their next version. I'm thinking applications like Compressor would be a good choice for this sort of thing.

FWIW, Wil Shipley wrote an interesting piece on this sort of decision making/prioritizing process recently that's worth the read if you have the time to invest in it. A good window into the (IMO) correct decision making process for software development.

stevemiller said:
full-disclosure: i do media stuff for a living and i'm mostly just chomping at the bit for some real-world evidence that opencl might give my system an added boost!

I do have a couple of your albums. Always loved that B3 part on Fly Like an Eagle. And many, many years ago I played in a band that did a great Space Cowboy->Space Truckin'->Space Cowboy medley.

itsthenewdc · Aug 30, 2009

IEatApples said:
I've also noticed a lot of other less powerful GPUs beating my GTX 285. Something must be wrong.

Found the reason behind my problem at least. Went to nVidia's site and put in my 285 on the drivers page. They have a separate CUDA driver that I downloaded and installed and now I get the .2xxx results :]

Scottsdale · Aug 30, 2009

9400m GPU performance/clock speed in different Macs...

I noticed something here.

Those with MBPs which have the Nvidia 9400m are showing 1100 MHz and 16 units/cores.

My MacBook Air 2.13 GHz with 9400m shows 800 MHz and 16 units/cores.

The weird thing is, mine is showing essentially the same score, in the 9.2 second range, for the GPU test.

What exactly does this mean? The MBA rev B did show 4x GPU performance of original MBA. However, this newest model shows 6x GPU performance of original MBA. Is the MBA's GPU being throttled? It does show my CPU running at 2.13 GHz.

I haven't seen what those with the 9400m in an iMac, Mac mini, or MB are showing for both the clock speed and the time for the GPU to run the test.

Interested in sharing data with others and trying to determine what Apple is doing with the same 9400m GPU in the MacBook Air.

ayeying · Aug 30, 2009

Scottsdale said:
I noticed something here.

Those with MBPs which have the Nvidia 9400m are showing 1100 MHz and 16 units/cores.

My MacBook Air 2.13 GHz with 9400m shows 800 MHz and 16 units/cores.

The weird thing is, mine is showing essentially the same score, in the 9.2 second range, for the GPU test.

What exactly does this mean? The MBA rev B did show 4x GPU performance of original MBA. However, this newest model shows 6x GPU performance of original MBA. Is the MBA's GPU being throttled? It does show my CPU running at 2.13 GHz.

I haven't seen what those with the 9400m in an iMac, Mac mini, or MB are showing for both the clock speed and the time for the GPU to run the test.

Interested in sharing data with others and trying to determine what Apple is doing with the same 9400m GPU in the MacBook Air.

It's been proven that the GPU is throttled.

The other 9400M video cards should read 550MHz or 1100MHz (as read by the benchmark).

For some reason, ours reads 400MHz or 800MHz (as read by the benchmark).

But in reality, we should have a 300MHz or 350MHz video card in Windows (depending on the Windows drivers), and the others (MB/iMac/Mini, etc.) should have a 450MHz core speed.

Here's what I got:

Code:

MBA-SLMac-JY:~ Jimmy$ /Users/Jimmy/Downloads/OpenCLBench_as_terminal_tool/OpenCL2_Bench_V025 
...........................................................
.................. OpenCL Bench V 0.25 by mitch ...........
...... C2D 3GHz = 12 sec vs Nvidia 9600GT = 0,93 sec ......
... time results are not comparable to older version! .....
...........................................................

Number of OpenCL devices found: 2
OpenCL Device # 0 = GeForce 9400M
Device 0 is an: GPU with max. 800 MHz and 16 units/cores 
Now computing - please be patient....
time used:  9.692 seconds

OpenCL Device # 1 = Intel(R) Core(TM)2 Duo CPU     L9600  @ 2.13GHz
Device 1 is an: CPU with max. 2130 MHz and 2 units/cores 
Now computing - please be patient....
time used: 17.224 seconds

Now checking if results are valid - please be patient....
:) Validate test passed - GPU results=CPU results :)

macintoshtoffy · Aug 30, 2009

PinkyMacGodess said:
Is the Intel GMA-based graphics a separate GPU or something that is used by the main processor? If the memory is shared I'd think it's not but don't exactly know. I do know that GMA graphics aren't very fast/capable or recommended for high end graphics like CAD, etc...

Yep. Others have already said no, it's not...

I'd say that in some cases, even if GMA could support OpenCL, it would probably perform worse than the CPU itself.

Erasmus · Aug 30, 2009

macintoshtoffy said:
I'd say that in some cases, even if GMA could support OpenCL, it would probably perform worse than the CPU itself.

Possibly, but it's still extra resources that the computer won't use. Better to have something that barely works than nothing at all.

JFreak · Aug 30, 2009

Is this the June-2007 Santa Rosa model? Seems my old MBP gets the OpenCL love after all

RestlessCaviar said:
Just tried on my lowly late '07 MBP (8600M GT with 256MB)

CLBench_as_terminal_tool/OpenCL2_Bench_V025 ; exit;
...........................................................
.................. OpenCL Bench V 0.25 by mitch ...........
...... C2D 3GHz = 12 sec vs Nvidia 9600GT = 0,93 sec ......
... time results are not comparable to older version! .....
...........................................................

Number of OpenCL devices found: 2
OpenCL Device # 0 = GeForce 8600M GT
Device 0 is an: GPU with max. 1040 MHz and 32 units/cores
Now computing - please be patient....
time used: 4.579 seconds

OpenCL Device # 1 = Intel(R) Core(TM)2 Duo CPU T7700 @ 2.40GHz
Device 1 is an: CPU with max. 2400 MHz and 2 units/cores
Now computing - please be patient....
time used: 15.328 seconds

Now checking if results are valid - please be patient....
Validate test passed - GPU results=CPU results
logout

With power adapter attached:

/Users/RestlessCaviar/Downloads/OpenCLBench_as_terminal_tool/OpenCL2_Bench_V025 ; exit;
Marcos-Scrivens-MacBook-Pro:~ RestlessCaviar $ /Users/RestlessCaviar/Downloads/OpenCLBench_as_terminal_tool/OpenCL2_Bench_V025 ; exit;
...........................................................
.................. OpenCL Bench V 0.25 by mitch ...........
...... C2D 3GHz = 12 sec vs Nvidia 9600GT = 0,93 sec ......
... time results are not comparable to older version! .....
...........................................................

Number of OpenCL devices found: 2
OpenCL Device # 0 = GeForce 8600M GT
Device 0 is an: GPU with max. 1040 MHz and 32 units/cores
Now computing - please be patient....
time used: 2.362 seconds

OpenCL Device # 1 = Intel(R) Core(TM)2 Duo CPU T7700 @ 2.40GHz
Device 1 is an: CPU with max. 2400 MHz and 2 units/cores
Now computing - please be patient....
time used: 15.657 seconds

Now checking if results are valid - please be patient....
Validate test passed - GPU results=CPU results

The thing that's disappointing is that this is just a benchmark. I know this is new, but surely Apple had time to work on actually making something useful to showcase this tech?

I'm surprised Apple didn't build this into their latest QuickTime in Snow Leopard, so they could show off transcoding (exporting) HD movie files to iPhone files around 10 times faster! Now *that* would be useful...

JFreak · Aug 30, 2009

fleshman03 said:
Sweet. Now all us 8600 owners have to do is pray that our cards don't burn up....

....or does so while under warranty

JFreak · Aug 30, 2009

flottenheimer said:
The day OpenCL meets Photoshop + the rest of the Adobe Creative Suite will be a very happy day indeed.

Sadly, that day may never come. Adobe currently maintains cross-platform compatibility and does not use Apple/MS proprietary APIs to their fullest. They want the two versions to share code as much as possible.

They should just re-write much of their stuff to make the EXISTING features shine. Even if that means zero new features. I'd love it if the CS5 would finally be one worth buying and sticking to for some time...

(loved CS1 until it became obvious PPC would be obsoleted, reluctantly bought into CS3 only to recently find out it's not supported on SL.)

joelypolly · Aug 30, 2009

Erasmus said:
Possibly, but it's still extra resources that the computer won't use. Better to have something that barely works than nothing at all.

If something barely works it is not worth the time and effort to certify it. If you are going to do something badly I would prefer you not to do it at all.

Erasmus · Aug 30, 2009

joelypolly said:
If something barely works it is not worth the time and effort to certify it. If you are going to do something badly I would prefer you not to do it at all.

Obviously by "badly" I mean slowly, not wrong. And considering the 5 fold speed increase (in this crappy benchmark, granted) between the GPU and CPU in a Macbook Pro, I would think that even the GMA950 or whatever it is would bring performance approximately that of the CPU in a Macbook. Which of course would result in dramatic speed increases, once combined with the CPU.

Another thought, this probably won't happen, because it would take much too much work, and won't be valid for new technology, but with a bit of software trickery, it should be possible to make single precision floating point operation units fake double precision by simply doing multiple calculations, and getting the CPU to stitch it all together at the end. I mean, us humans do it all the time with long division and multiplication. Wouldn't work with nonlinear functions though, so no exponentials.
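The trick described above is close to what is usually called "double-single" (or float-float) arithmetic: a value is stored as an unevaluated sum of two single precision numbers, and error-free additions keep track of the bits a single float would drop. The sketch below emulates single precision in Python by round-tripping through a 32-bit float with `struct`. All helper names (`f32`, `split`, `two_sum`, `ds_add`) are made up for illustration, and this gives roughly 48 bits of significand rather than a true IEEE double.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest single precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def split(x):
    """Represent a double as an unevaluated sum hi + lo of two singles."""
    hi = f32(x)
    lo = f32(x - hi)
    return hi, lo

def two_sum(a, b):
    """Error-free addition of two singles: s is the rounded sum and
    e is the rounding error, so s + e equals a + b (barring overflow)."""
    s = f32(a + b)
    bb = f32(s - a)
    e = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, e

def ds_add(x, y):
    """Add two (hi, lo) pairs using only single precision operations."""
    s, e = two_sum(x[0], y[0])
    e = f32(e + f32(x[1] + y[1]))   # fold in the low-order parts
    hi = f32(s + e)                 # renormalize into a new (hi, lo) pair
    lo = f32(e - f32(hi - s))
    return hi, lo

# Plain single precision addition for comparison.
plain = f32(f32(0.1) + f32(0.2))
hi, lo = ds_add(split(0.1), split(0.2))

print("single error:       ", abs(plain - 0.3))
print("double-single error:", abs((hi + lo) - 0.3))
```

Every operation here is a single precision add or subtract, so in principle nothing needs to be stitched together by the CPU afterwards; the cost is several operations per addition.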

commander.data · Aug 30, 2009

freiheit said:
You need to be talking to Apple, not AMD. The Radeon HD 2000 series had the hardware necessary for this kind of stuff. As did the 3000 series. Apple have decided not to support them for one reason or another.

2002cbr600f4i said:
If you look on the compatibility page on ATI's site, you'll see in the footnotes that the 2600 series does not support double precision floating point operations...

Now if you're Apple, and you're encouraging people to use this technology, are you going to potentially support something that isn't going to give the exactly same results for calculations that you'd get from the CPU? - NO.

For example:
(dp FP vs sp FP)
4.546677E10 != 4.566E10

If you're doing scientific calculations, and you're expecting double precision and the app is giving you back single precision because you ran it on the GPU instead of the CPU, you're going to be pissed.
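As a rough check of how much single precision actually loses on a number like the one in that example, this sketch rounds the double value to a 32-bit float using Python's `struct` module. The helper name `to_single` is made up for illustration.

```python
import struct

def to_single(x):
    """Round a double to the nearest single precision (32-bit) float."""
    return struct.unpack("f", struct.pack("f", x))[0]

x = 4.546677e10          # the double precision value from the example
sp = to_single(x)        # what a single precision float can hold
rel_err = abs(sp - x) / x

print(f"double: {x:.1f}")
print(f"single: {sp:.1f}")
print(f"relative error: {rel_err:.1e}")
```

Single precision keeps about 7 significant decimal digits, so the two values differ only in the later digits rather than in the third digit as the example suggests. The underlying point still stands: the results are not bit-for-bit identical.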

I asked AMD's Stream Computing team about the possibility of HD2000 and HD3000 series support for OpenCL when the original Snow Leopard specs for OpenCL support came out, and they said that the HD2000 and HD3000 do not and will not support OpenCL due to hardware limitations. They didn't say what the limitation is, but I'm almost certain it has nothing to do with double precision floating point support. For one thing, the nVidia 8000, 9000, and GT100 series don't support DP floats either. Only the ATI HD4000 and nVidia GTX200 series do, which was why I was hoping Apple would go with the HD4670 as the mid-range GPU in the last refresh instead of the 9600M GT and GT100 series. What's more, I'm pretty sure the current OpenCL 1.0 only requires single precision floats, and double precision floats are currently optional.

The more likely reason why the HD2000 and HD3000 series don't support OpenCL is that their memory structure is different. nVidia DX10 GPUs define local memory stores in each Streaming Multiprocessor (SM), allowing groups of 8 Stream Processors (SPs) to share data. If I'm not mistaken, the HD2000 and HD3000 series don't have this local memory store between small groups of SPs and instead have a data store shared by all SPs, which I guess may be less efficient. OpenCL is reportedly closer to nVidia's CUDA than ATI's CTM, so it doesn't surprise me if OpenCL's memory model is closer to nVidia's GPU structure. In any case, ATI seems to agree that this is the way to go, since the HD4000 series has local memory stores between groups of 16 (5-way) SPs. This would explain why the HD4000 series is OpenCL compatible.
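To illustrate why a small per-group memory store matters, here is a plain-Python simulation (not real OpenCL) of a sum done work-group style: each group copies its slice into its own scratch buffer, reduces it there, and writes back one partial result. The function name `group_reduce` and the group size of 8 (borrowed from the 8 SPs per SM mentioned above) are assumptions for the sketch.

```python
def group_reduce(data, group_size):
    """Sum `data` in groups: each group works in its own scratch
    buffer (standing in for local memory) and emits one partial sum."""
    partials = []
    for start in range(0, len(data), group_size):
        # Per-group scratch copy; other groups never touch it.
        local = list(data[start:start + group_size])
        stride = 1
        # Tree reduction inside the group: combine pairs `stride` apart.
        while stride < len(local):
            for i in range(0, len(local) - stride, 2 * stride):
                local[i] += local[i + stride]
            stride *= 2
        partials.append(local[0])
    return partials

data = list(range(64))
partials = group_reduce(data, group_size=8)
total = sum(partials)   # final combine step over the per-group results

print(partials)
print(total)
```

On real hardware the inner loop would run across the processors of one group in parallel, with the scratch buffer living in that group's local store; a single store shared by every processor would have to serve all of those intermediate reads and writes at once.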

As well, the HD4000 series still being slower than nVidia GPUs in OpenCL doesn't surprise me. This has been the case in Folding@home, even when the ATI GPUs are running their own native CTM code. nVidia seems to have spent more effort on designing their GPUs for GPGPU operations since they are increasingly promoting them in competition with CPUs, which nVidia lacks, whereas AMD already makes CPUs so they don't have the same motivation. The 9600M GT being faster than the GT120 is likely, since the GT120 is basically a rebranded 9500GT which is a budget GPU. Apple's $150 price for the GT120 is about 3 times more than the PC version is worth.

On another note, if the 9400M and 9600M GT can both do OpenCL in parallel, which I was hoping for, can the 9600M GT do OpenGL graphics while the 9400M does OpenCL physics? That'd be great for games, since you'd get more realistic physics without sacrificing anything except power and heat since the 9400M would have been doing nothing anyways. This type of parallel GPU usage would be more worthwhile for Apple to focus on since it's more flexible than SLI. Similarly, a Mac Pro could use say a HD4870 for graphics and a second GT120 for physics.

EDIT: I remembered the HD3000 supports double precision floats as well (like the HD4000 and GTX200), but again that doesn't appear to be the reason for the lack of OpenCL support.

lilyyin99 · Aug 30, 2009

Interesting that the CPU beats the 4870 on the Pro!

iAlex · Aug 30, 2009

I'm holding out for Windows 7.................... HAHAHAHA NOT!!!!!!!

netkas · Aug 30, 2009

commander.data said:
I asked AMD's Stream Computing team about the possibility of HD2000 and HD3000 series support for OpenCL when the original Snow Leopard specs for OpenCL support came out, and they said that the HD2000 and HD3000 do not and will not support OpenCL due to hardware limitations. They didn't say what the limitation is, but I'm almost certain it has nothing to do with double precision floating point support. For one thing, the nVidia 8000, 9000, and GT100 series don't support DP floats either. Only the ATI HD4000 and nVidia GTX200 series do, which was why I was hoping Apple would go with the HD4670 as the mid-range GPU in the last refresh instead of the 9600M GT and GT100 series. What's more, I'm pretty sure the current OpenCL 1.0 only requires single precision floats, and double precision floats are currently optional.

The more likely reason why the HD2000 and HD3000 series don't support OpenCL is that their memory structure is different. nVidia DX10 GPUs define local memory stores in each Streaming Multiprocessor (SM), allowing groups of 8 Stream Processors (SPs) to share data. If I'm not mistaken, the HD2000 and HD3000 series don't have this local memory store between small groups of SPs and instead have a data store shared by all SPs, which I guess may be less efficient. OpenCL is reportedly closer to nVidia's CUDA than ATI's CTM, so it doesn't surprise me if OpenCL's memory model is closer to nVidia's GPU structure. In any case, ATI seems to agree that this is the way to go, since the HD4000 series has local memory stores between groups of 16 (5-way) SPs. This would explain why the HD4000 series is OpenCL compatible.

As well, the HD4000 series still being slower than nVidia GPUs in OpenCL doesn't surprise me. This has been the case in Folding@home, even when the ATI GPUs are running their own native CTM code. nVidia seems to have spent more effort on designing their GPUs for GPGPU operations since they are increasingly promoting them in competition with CPUs, which nVidia lacks, whereas AMD already makes CPUs so they don't have the same motivation. The 9600M GT being faster than the GT120 is likely, since the GT120 is basically a rebranded 9500GT which is a budget GPU. Apple's $150 price for the GT120 is about 3 times more than the PC version is worth.

On another note, if the 9400M and 9600M GT can both do OpenCL in parallel, which I was hoping for, can the 9600M GT do OpenGL graphics while the 9400M does OpenCL physics? That'd be great for games, since you'd get more realistic physics without sacrificing anything except power and heat since the 9400M would have been doing nothing anyways. This type of parallel GPU usage would be more worthwhile for Apple to focus on since it's more flexible than SLI. Similarly, a Mac Pro could use say a HD4870 for graphics and a second GT120 for physics.

EDIT: I remembered the HD3000 supports double precision floats as well (like the HD4000 and GTX200), but again that doesn't appear to be the reason for the lack of OpenCL support.

and the answer is..... compute shaders

http://forums.amd.com/forum/messageview.cfm?FTVAR_FORUMVIEWTMP=Linear&catid=328&threadid=116102

>Pixel Shader code (if not using some special stuff like double precision) runs on all cards. Compute shaders only on the HD4000 series.

Apple could have implemented OpenCL for older Radeons via pixel shaders and GLSL, but they didn't want to.

awulf · Aug 30, 2009

Here's the result from mine (using a Nvidia GTX 275):

Code:

...........................................................
.................. OpenCL Bench V 0.25 by mitch ...........
...... C2D 3GHz = 12 sec vs Nvidia 9600GT = 0,93 sec ......
... time results are not comparable to older version! .....
...........................................................

Number of OpenCL devices found: 2
OpenCL Device # 0 = GeForce GTX 275
Device 0 is an: GPU with max. 1460 MHz and 240 units/cores 
Now computing - please be patient....
time used:  0.333 seconds

OpenCL Device # 1 = Intel(R) Core(TM) i7 CPU         920  @ 2.67GHz
Device 1 is an: CPU with max. 2700 MHz and 8 units/cores 
Now computing - please be patient....
time used:  4.297 seconds

Now checking if results are valid - please be patient....
:) Validate test passed - GPU results=CPU results :) 
logout

John.B · Aug 30, 2009

Erasmus said:
Obviously by "badly" I mean slowly, not wrong. And considering the 5 fold speed increase (in this crappy benchmark, granted) between the GPU and CPU in a Macbook Pro, I would think that even the GMA950 or whatever it is would bring performance approximately that of the CPU in a Macbook.

No. The GMA950 really is that craptacular.

There is a reason that Apple went to all the trouble to break with Intel and its integrated graphics (if you can call it that) chip and finally transitioned to Nvidia graphics instead.

DAMNiatx · Aug 30, 2009

Shawn Parr said:
That's not true on my machine, I just had to log in and out both times when switching graphics chips.

we are not switching, we use BOTH graphics chips.

Erasmus · Aug 30, 2009

John.B said:
No. The GMA950 really is that craptacular.

There is a reason that Apple went to all the trouble to break with Intel and its integrated graphics (if you can call it that) chip and finally transitioned to Nvidia graphics instead.

Personally, I don't believe that the GMA950 is bad enough to not bring noticeable speed gains over just the CPU using OpenCL. 10% is better than nothing. Unfortunately, as it is not supported, we will never know.

John.B · Aug 31, 2009

Erasmus said:
Personally, I don't believe that the GMA950 is bad enough to not bring noticeable speed gains over just the CPU using OpenCL. 10% is better than nothing. Unfortunately, as it is not supported, we will never know.

10% (assuming even that) is not better than nothing. I would rather see them devote effort to supporting newer graphics cards that will provide a more substantial boost than 2% or 5% or 10%. It's even possible the overhead could make the entire operation slower.

The Intel shared memory on-board graphics were once at least competitive, but they got lazy and the graphics chip industry passed them by. Even the newer GMA X3100 (which I have in my Santa Rosa blackbook) would be woefully inadequate for this type of operation, and that's nobody's fault but Intel's.

JFreak · Aug 31, 2009

Erasmus said:
Personally, I don't believe that the GMA950 is bad enough to not bring noticeable speed gains over just the CPU using OpenCL. 10% is better than nothing. Unfortunately, as it is not supported, we will never know.

I believe total performance would likely go DOWN if the GMA950 was used. Remember that it also takes time to arrange things for processing in multiple units, so scheduling for that crap can easily cost more than the payload --> what's the point? Plus, if the ATI HD2xxx series hardware is not capable of OpenCL, then how on earth would this integrated crap be better?
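The overhead argument in the last few posts can be sketched with a very simple model: splitting work with a helper device divides the compute time, but coordination adds a fixed cost. The 10% figure is the hypothetical from the posts above, the CPU time is roughly the Core 2 Duo result posted earlier, and the overhead values are invented purely for illustration.

```python
def combined_time(cpu_time, helper_fraction, overhead):
    """Estimated wall time when a helper device adds `helper_fraction`
    of the CPU's throughput and coordination costs `overhead` seconds."""
    return cpu_time / (1.0 + helper_fraction) + overhead

CPU_TIME = 14.6  # seconds, roughly the Core 2 Duo result posted earlier

for overhead in (0.0, 0.5, 1.0, 2.0):  # hypothetical overheads in seconds
    t = combined_time(CPU_TIME, 0.10, overhead)
    verdict = "faster" if t < CPU_TIME else "slower"
    print(f"overhead {overhead:.1f}s -> {t:.2f}s ({verdict} than CPU alone)")

# Break-even overhead: beyond this, adding the helper device loses time.
break_even = CPU_TIME - CPU_TIME / 1.10
print(f"break-even overhead: {break_even:.2f}s")
```

Under this model a 10% helper only pays off if the extra scheduling and data transfer cost stays below the break-even value; whether a GMA950 would land above or below it is something nobody in the thread has measured.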

PurpleLogix · Aug 31, 2009

This is with my FLASHED 4870 1gb in a 2006 MP

Number of OpenCL devices found: 2
OpenCL Device # 0 = Radeon HD 4870
Device 0 is an: GPU with max. 750 MHz and 4 units/cores
Now computing - please be patient....
time used: 4.066 seconds

Drivers need to be updated, only 4 cores!!!!
