Which AMD/ATI based Macs have OpenCL Image Support?

holmesf

macrumors 6502a
Sep 30, 2001
I'm trying to determine which AMD/ATI graphics based macs currently have OpenCL Image support. It will help inform my current development and purchase plans. I have coded up a tile based deferred lighting scheme, but it's rather useless if there is not broad support for OpenCL Images.



To help, all you have to do is run the RayTraced Quaternion Julia-Set Example from Apple:

http://developer.apple.com/library/mac/#samplecode/OpenCL_RayTraced_Quaternion_Julia-Set_Example/Introduction/Intro.html

If you have OpenCL Image support you'll see an awesome 3d fractal, if not you'll get something like this:

Creating Texture 512 x 512...
----------------------------------------------------------------------
Using active OpenGL context...
----------------------------------------------------------------------
Connecting to AMD Radeon HD 4870...
Qjulia requires images: Images not supported on this device.

I already know that the 4000 series does not have OpenCL Image support. But I am very interested in the 5000 series, which supports OpenCL Images in theory, but may be limited by the current driver.

I would be grateful if you could list your OS X version, Mac model, GPU model, and whether or not images are supported.
 

mfram

macrumors 65816
Jan 23, 2010
San Diego, CA USA
I have a program I wrote that computes Mandelbrots using OpenCL. It doesn't do it terribly efficiently, but it does work on my iMac with an ATI Radeon 4850.

I wrote a couple of compute engines for the Mandelbrot computation: one that uses a single-threaded CPU routine, one that uses multi-threaded GCD, and one that uses OpenCL. The OpenCL version does work on my iMac, but it's considerably slower than the GCD multi-threaded version, so I don't know if the OS is doing the computation on the CPU or the GPU. I could probably write some test programs to figure that out. But it is working. :confused:
 

holmesf

When you set up OpenCL, you create a command queue using clCreateCommandQueue. OpenCL commands such as executing kernels and copying memory are sent to this command queue. The command queue has to be created for the device you want to use for computation (e.g. the CPU or the GPU).

If it is working and is indeed running on the GPU but is slower, there are several possible reasons:
1. Bad workgroup size. Make sure your workgroup size is a multiple of 64, because AMD GPUs execute threads in groups of 64 called "wavefronts". Usually a workgroup size of 64 or 128 is best.
2. Communication time with the CPU. If the amount of work is small compared to the size of the data, most of the time will be spent transferring data between the GPU and the CPU rather than on the actual work. See what happens if you increase the number of iterations performed in the kernel, which will increase the work/communication ratio.
3. Control flow. GPUs are pretty bad at handling if statements and especially bad at handling loops. Instead of a loop that iterates 16 times, it can be much better to have 16 sequential statements copied and pasted, if the loop body is simple. This is called manual loop unrolling. Play around with this.
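
To make point 1 a bit more concrete, here is a minimal sketch in plain C of how host code might pad a problem size up to a multiple of the workgroup size. The helper name round_up_to_multiple is made up for illustration and is not an OpenCL function, and the value 64 is simply the wavefront size mentioned above.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the OpenCL API): round n up to the
 * next multiple of `multiple`, for example a workgroup size of 64.
 * `multiple` must be greater than zero. */
size_t round_up_to_multiple(size_t n, size_t multiple)
{
    size_t remainder = n % multiple;
    return remainder == 0 ? n : n + (multiple - remainder);
}
```

For a row of 1000 pixels and a workgroup size of 64 this gives a global size of 1024. The kernel then needs a bounds check so that the 24 padding work-items do nothing.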
 

mfram

Well, that's the fundamental issue then. The 'escape time' algorithm discussed on Wikipedia is fundamentally an iterative, loop-based algorithm. You iterate the computation until the escape criterion is met, and you can't perform the next iteration without the result of the previous one. I don't really see a good way around this. That accounts for the GPU performance issue. Perhaps one of the other algorithms would be better suited to GPU computation. The only good news is that each point is computed independently of the others, which lends itself to parallel implementations.
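
For readers who haven't seen it, the escape time loop described above looks roughly like this for a single point. This is a plain C sketch, not the actual kernel from my program; the function name escape_time and the use of doubles are assumptions made for the example.

```c
/* Escape-time iteration for one point c = cr + ci*i.
 * Returns the number of iterations performed before |z| exceeded 2,
 * or max_iter if the point never escaped within the limit. */
int escape_time(double cr, double ci, int max_iter)
{
    double zr = 0.0, zi = 0.0;
    int i = 0;
    /* Each pass needs the z from the previous pass: z = z*z + c. */
    while (i < max_iter && zr * zr + zi * zi <= 4.0) {
        double next_zr = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = next_zr;
        ++i;
    }
    return i;
}
```

The loop for one point is strictly sequential, but nothing in it depends on any other point, so every pixel can run this independently.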

In the end, the efficiency doesn't matter to me other than being an interesting result. Making Mandelbrots was ultimately an exercise in trying OpenCL technology and GCD functionality in Snow Leopard for me. But at least it gives me a good answer why OpenCL didn't really work quickly for that algorithm.

But as for your original question, I can see OpenCL working on my ATI graphics card. Had OpenCL failed in the initialization process, the program would have disabled the OpenCL computation option. I'd have to write test programs to verify how many work blocks the GPU claimed it was capable of computing at once.
 

Mac_Max

macrumors 6502
Mar 8, 2004
From Wikipedia:

On August 28, 2009, Apple released Mac OS X Snow Leopard, which contains a full implementation of OpenCL.[18]
OpenCL in Snow Leopard will initially be supported on the ATI Radeon HD 4850, ATI Radeon HD 4870 and NVIDIA's Geforce 8600M GT, GeForce 8800 GS, GeForce 8800 GT, GeForce 8800 GTS, Geforce 9400M, GeForce 9600M GT, GeForce GT 120, GeForce GT 130, GeForce GTX 285, Quadro FX 4800, and Quadro FX 5600.[19]
On December 21, 2009, AMD released the production version of the ATI Stream SDK 2.0,[27] which provides OpenCL 1.0 support for R800 GPUs and beta support for R700 GPUs.
Looks like the 4000 series works just fine with OpenCL. My guess is that it's a hack on top of the Stream SDK and not a native ABI for the Radeon 4000 which could impact performance relative to newer cards.
 

holmesf

It's not a fundamental issue. GPUs are much better at this algorithm than CPUs. You just have to be very aware of GPU architecture, because it's a lot less forgiving than CPU architecture. If you post your OpenCL kernel I can tell you if there are any issues that would cause it to run slowly on a GPU.

Please test your card on the qJulia example. The issue is not whether OpenCL works on AMD/ATI cards, but whether OpenCL Images are supported. It's possible to support OpenCL without supporting OpenCL Images.
 

jiminaus

macrumors 65816
Dec 16, 2010
Sydney
I don't have an AMD GPU, but I thought I'd run the sample on my NVIDIA GT 120 just to see. It ran with OpenCL image support, but what I saw was nothing like the image you posted. I've attached an image of what I saw.
 


holmesf

Hey, yeah, all the NVIDIA GPUs that support OpenCL have OpenCL Image support.

The image I posted was of my project, but the link is to Apple example code that also uses OpenCL Image.
 

firewood

macrumors 604
Jul 29, 2003
Silicon Valley
mfram wrote: "Well, that's the fundamental issue then. The 'escape time' algorithm discussed in Wikipedia is fundamentally an iterative loop-based algorithm. You iterate the computation until the escape criteria is met."
It can be unrolled and parallelized. Do 100 (10x10) different points 100 times in a row, set a flag if any of the 100 computations escapes, and check the 100 flags for each point at the end of the 10,000-computation unrolled parallel pile. Use the min() function if you need to know when an escape happened.

Very little looping needed. The number of points (N=100 here) and the number of iterations (M=100 here) can be as large as will fit in instruction memory.
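
One way to read that, shown for a single point, is the rough C sketch below. It always runs the full number of iterations with no early exit and keeps the earliest escape with a min-style update. The function name first_escape_fixed is made up for the example, and the loop is written out as a loop only to keep it short; the fixed trip count is what makes it possible to unroll.

```c
/* Fixed-iteration variant of the escape-time computation for one point.
 * Every call does exactly max_iter iterations; the first iteration at
 * which |z| exceeded 2 is kept with a min-style update instead of a
 * break. Returns max_iter if the point never escaped. */
int first_escape_fixed(double cr, double ci, int max_iter)
{
    double zr = 0.0, zi = 0.0;
    int first = max_iter;
    for (int i = 0; i < max_iter; ++i) {
        double next_zr = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = next_zr;
        /* Flag for this iteration: 1 if z is outside the radius-2 circle. */
        int escaped = (zr * zr + zi * zi > 4.0);
        int candidate = escaped ? i + 1 : max_iter;
        /* min(first, candidate): later iterations can never raise it. */
        first = candidate < first ? candidate : first;
    }
    return first;
}
```

After a point escapes, z keeps growing and eventually overflows to infinity or NaN. That is harmless here because the earliest escape has already been recorded and comparisons against NaN are simply false.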
 

holmesf

Yes, do unroll the loop that computes z = z*z + c. However, on a GPU it's best to have 1 thread compute 1 pixel rather than iterate over blocks of pixels. I had a big long explanation for why, but it's getting the thread off topic.

The question is: which AMD GPUs have OpenCL Image support under Mac OS X? This question is easily answered by seeing if Apple's qJulia example runs on your machine:
http://developer.apple.com/library/mac/#samplecode/OpenCL_RayTraced_Quaternion_Julia-Set_Example/Introduction/Intro.html

I would really appreciate it if someone with a Radeon 5xxx series card (for example, a Mac Pro with a 5770 or 5870, or any mid to high end iMac) could try this out and let me know if the example runs or not.
 

chrono1081

macrumors 604
Jan 26, 2008
Isla Nublar
Well, I was going to post that the MacBook Air Ultimate does it, then I noticed you said ATI/AMD : /

But I guess I still posted anyway.
 

holmesf

GPUs are designed to shade a block of pixels, usually one triangle's worth, and often multiple pixels in a span without iteration.
You are correct about this. In fact, the hardware will always schedule pixels in blocks when possible in order to take advantage of the SIMD hardware. OpenCL will do the same if you assign 1 thread to 1 pixel. GPUs actually never process pixels using iteration, because they have zero-overhead hardware support for thread creation, which automatically assigns threads to pixels. I have spent a lot of time studying how GPUs work and benchmarking them running OpenCL and CUDA, so I know the right approaches to use for most tasks. You'll have to trust me on this, although I can give you references to read if you'd like.
 

jiminaus

I thought it was a shame you edited your post 4 or 5 posts up. The original post was a very good explanation.
 

holmesf

Oh ya haha I forgot about that.

It won't run. I simply get "Qjulia requires images: Images not supported on this device." :(
That's too bad -- looks like I won't be buying that GPU ...

I recently had the chance to test this on the new Feb 2011 MacBook Pro, and Qjulia will not run there either. It appears that even though the newer AMD GPUs support OpenCL Images (and support them in the drivers on Windows and Linux), the Mac drivers do not. This is really a shame. I hope that Apple gets better drivers for these GPUs soon.

Thanks for helping answer my question!
 

Dranix

macrumors 65816
Feb 26, 2011
left the forum
A short note on OpenCL: I had huge problems with OpenCL in 10.6. Galaxies, for example, didn't work on my 5870. But in the 10.7 developer preview it works like a charm, giving me 150 GFLOPS on the GPU. My Harpertown does 38 ;)
 

holmesf

Does qJulia work on it in 10.7?