Which AMD/ATI based Macs have OpenCL Image Support?

Discussion in 'Mac Programming' started by holmesf, Feb 25, 2011.

  1. holmesf, Feb 25, 2011
    Last edited by a moderator: Feb 25, 2011

    holmesf macrumors 6502a

    Joined:
    Sep 30, 2001
    #1
    I'm trying to determine which AMD/ATI-based Macs currently have OpenCL Image support. It will help inform my current development and purchase plans. I have coded up a tile-based deferred lighting scheme, but it's rather useless if there isn't broad support for OpenCL Images.

    To help, all you have to do is run the RayTraced Quaternion Julia-Set Example from Apple:

    http://developer.apple.com/library/...ion_Julia-Set_Example/Introduction/Intro.html

    If you have OpenCL Image support, you'll see an awesome 3D fractal; if not, you'll get an error message saying images aren't supported.

    I already know that the 4000 series does not have OpenCL Image support. But I am very interested in the 5000 series, which supports OpenCL Images in theory but may be limited by the current driver.

    I would be grateful if you could list your OS X version, Mac model, GPU model, and whether or not images are supported.
     
  2. mfram macrumors 65816

    Joined:
    Jan 23, 2010
    Location:
    San Diego, CA USA
    #2
    I have a program I wrote that computes Mandelbrots using OpenCL. It doesn't do it terribly efficiently, but it does work on my iMac with an ATI Radeon 4850.

    I wrote a few compute engines for the Mandelbrot computation: one that uses a single-threaded CPU routine, one that uses multi-threaded GCD, and one that uses OpenCL. The OpenCL version does work on my iMac, but it's considerably slower than the multi-threaded GCD version, so I don't know whether the OS is doing the computation on the CPU or the GPU. I could probably write some test programs to figure that out. But it is working. :confused:
     
  3. holmesf thread starter macrumors 6502a

    Joined:
    Sep 30, 2001
    #3
    When you set up OpenCL, you create a command queue using clCreateCommandQueue. OpenCL commands such as executing kernels, copying memory, etc., are sent to this command queue. The command queue needs to be set up to use the device you want to use for computation (e.g., the CPU or the GPU).

    If it is working and is indeed running on the GPU but is slower, there are a number of possible reasons:
    1. Bad workgroup size. Make sure your workgroup size is a multiple of 64, because AMD GPUs execute threads in groups of 64 called "wavefronts". Usually a workgroup size of 64 or 128 is best.
    2. Communication time with the CPU. If the amount of work to do is small compared to the size of the data, then most of the time will be spent transferring data from the GPU to the CPU rather than on the actual work. See what happens if you increase the number of iterations performed in the kernel, which will increase the work/communication ratio.
    3. Control flow. GPUs are pretty bad at handling if statements and especially bad at handling loops. Instead of having a loop that iterates 16 times, it can be much better to have 16 sequential copy-pasted statements if the loop body is simple. This is called manual loop unrolling. Play around with this.
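    To illustrate point 3, here is a deliberately simplified model in plain C (not OpenCL C) of why loops with data-dependent trip counts hurt on a GPU. It assumes the lanes of a wavefront step through the loop in lockstep, so the group costs as many steps as its slowest lane while finished lanes sit idle; real hardware details vary, and `wavefront_cost` and `simd_cost` are hypothetical names. Four lanes stand in for 64 to keep the example small.

```c
#include <stddef.h>

/* Simplified lockstep model of one wavefront: all lanes step through the
   loop together, so the group runs for as many steps as its slowest lane,
   and lanes that finished early sit masked off doing nothing useful. */
typedef struct {
    long issued;  /* lane-steps the hardware spends: lanes * longest trip */
    long useful;  /* lane-steps that did real work: sum of all trip counts */
} simd_cost;

static simd_cost wavefront_cost(const int *trips, size_t lanes) {
    simd_cost c = {0, 0};
    int longest = 0;
    for (size_t i = 0; i < lanes; ++i) {
        c.useful += trips[i];
        if (trips[i] > longest)
            longest = trips[i];
    }
    c.issued = (long)lanes * longest;
    return c;
}
```

    For trip counts {1, 1, 1, 13}, the model issues 52 lane-steps but only 16 of them do useful work, which is the kind of waste divergent loops cause.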
     
  4. mfram macrumors 65816

    Joined:
    Jan 23, 2010
    Location:
    San Diego, CA USA
    #4
    Well, that's the fundamental issue then. The 'escape time' algorithm discussed in Wikipedia is fundamentally an iterative, loop-based algorithm. You iterate the computation until the escape criterion is met, and you can't perform the next iteration without the results from the previous one. I don't really see a good way around this, and that accounts for the GPU performance issue. Perhaps one of the other algorithms would be better suited to GPU computation. The only good news is that each point is computed independently of the others, which lends itself to parallel implementations.
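    A minimal sketch in plain C of the structure described above, assuming a float escape-time loop with the usual |z|^2 > 4 test and a hypothetical linear mapping from pixel to c. The inner loop carries a dependency from one iteration to the next, while each pixel is independent of every other, so the order in which pixels are computed doesn't matter. `escape_time` and `render_pixel` are illustrative names.

```c
/* Escape-time count for c = cr + ci*i: iterate z = z*z + c from z = 0 and
   return the 1-based step at which |z|^2 first exceeds 4 (or max_iter).
   Each step needs the previous z, so this inner loop is sequential. */
static int escape_time(float cr, float ci, int max_iter) {
    float zr = 0.0f, zi = 0.0f;
    for (int i = 0; i < max_iter; ++i) {
        float t = zr * zr - zi * zi + cr;
        zi = 2.0f * zr * zi + ci;
        zr = t;
        if (zr * zr + zi * zi > 4.0f)
            return i + 1;
    }
    return max_iter;
}

/* One pixel of a row-major image. It depends only on its own coordinates
   and writes only its own slot, so pixels may run in any order or in
   parallel. The linear (x, y) -> c mapping is a hypothetical choice. */
static void render_pixel(int *out, int width, int x, int y,
                         float x0, float y0, float step, int max_iter) {
    out[y * width + x] = escape_time(x0 + step * x, y0 + step * y, max_iter);
}
```

    Rendering the same row of pixels forward or backward fills the buffer identically, which is the freedom a parallel implementation relies on.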

    In the end, the efficiency doesn't matter to me other than being an interesting result. Making Mandelbrots was ultimately an exercise in trying out OpenCL technology and GCD functionality in Snow Leopard. But at least it gives me a good answer as to why OpenCL didn't really work quickly for that algorithm.

    But as for your original question, I can see OpenCL working on my ATI graphics card. Had OpenCL failed in the initialization process, the program would have disabled the OpenCL computation option. I'd have to write test programs to verify how many work blocks the GPU claimed it was capable of computing at once.
     
  5. Mac_Max macrumors 6502

    Joined:
    Mar 8, 2004
    #5
    From Wikipedia:

    Looks like the 4000 series works just fine with OpenCL. My guess is that it's a hack on top of the Stream SDK and not a native ABI for the Radeon 4000 which could impact performance relative to newer cards.
     
  6. holmesf, Feb 26, 2011
    Last edited: Feb 26, 2011

    holmesf thread starter macrumors 6502a

    Joined:
    Sep 30, 2001
    #6
    It's not a fundamental issue. GPUs are much better at this algorithm than CPUs. You just have to be very aware of GPU architecture, because it's a lot less forgiving than CPU architecture. If you post your OpenCL kernel I can tell you if there are any issues that would cause it to run slowly on a GPU.

    Please test your card on the qJulia example. The issue is not whether OpenCL works on AMD/ATI cards, but whether OpenCL images are supported. It's possible to support OpenCL without having OpenCL Images support.
     
  7. jiminaus macrumors 65816

    jiminaus

    Joined:
    Dec 16, 2010
    Location:
    Sydney
    #7
    I don't have an AMD GPU, but I thought I'd run the sample on my NVIDIA GT 120 just to see. It ran with OpenCL image support, but what I saw was nothing like the image you posted. I've attached an image of what I saw.
     

  8. holmesf thread starter macrumors 6502a

    Joined:
    Sep 30, 2001
    #8
    Hey, yeah all the Nvidia GPUs that support OpenCL have OpenCL Image support.

    The image I posted was of my project, but the link is to Apple example code that also uses OpenCL Image.
     
  9. firewood macrumors 604

    Joined:
    Jul 29, 2003
    Location:
    Silicon Valley
    #9
    It can be unrolled and parallelized. Do 100 (10x10) different points 100 times in a row, set a flag if any of the 100 computations escapes, and at the end of the unrolled, parallel pile of 10,000 computations, check the 100 flags for each point. Use the min() function if you need to know when an escape happened.

    Very little looping is needed. N=100 and M=100 can be made as large as will fit in the instruction memory.
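    One possible reading of this suggestion, sketched in plain C: every point runs a fixed number of iterations with no early exit, a per-point flag records whether it has escaped, and min() keeps the earliest escape step. The 100-point block size (`MAX_POINTS`) and the name `escape_block` are hypothetical, and freezing z once a point has escaped is an extra assumption added here so that z can't overflow.

```c
#define MAX_POINTS 100  /* hypothetical block size: firewood's 10x10 points */

/* Runs every point for exactly `steps` iterations with no early exit.
   first_escape[k] gets the 1-based step at which point k first had
   |z|^2 > 4, or `steps` if it never did. Returns -1 if n is out of range. */
static int escape_block(const float *cr, const float *ci, int n, int steps,
                        int *first_escape) {
    float zr[MAX_POINTS], zi[MAX_POINTS];
    if (n < 0 || n > MAX_POINTS)
        return -1;
    for (int k = 0; k < n; ++k) {
        zr[k] = 0.0f;
        zi[k] = 0.0f;
        first_escape[k] = steps;
    }
    for (int s = 0; s < steps; ++s) {
        for (int k = 0; k < n; ++k) {  /* independent points: SIMD-friendly */
            int done = first_escape[k] < steps;  /* per-point escape flag */
            float nr = zr[k] * zr[k] - zi[k] * zi[k] + cr[k];
            float ni = 2.0f * zr[k] * zi[k] + ci[k];
            /* Select rather than branch: freeze z once the point escaped. */
            zr[k] = done ? zr[k] : nr;
            zi[k] = done ? zi[k] : ni;
            int escaped = (zr[k] * zr[k] + zi[k] * zi[k]) > 4.0f;
            int candidate = escaped ? s + 1 : steps;
            /* min(): keep the earliest step at which an escape was seen. */
            first_escape[k] = candidate < first_escape[k] ? candidate
                                                          : first_escape[k];
        }
    }
    return 0;
}
```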
     
  10. holmesf, Feb 26, 2011
    Last edited: Feb 26, 2011

    holmesf thread starter macrumors 6502a

    Joined:
    Sep 30, 2001
    #10
    Yes, do unroll the loop that computes z = z*z + c. However, on a GPU it's best to have 1 thread compute 1 pixel rather than iterate over blocks of pixels. I had a big long explanation for why, but it's getting the thread off topic.
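    A sketch of unrolling the z = z*z + c loop, written in plain C rather than OpenCL C and assuming a float escape-time loop with an |z|^2 > 4 test. The macro and function names are hypothetical. The step body is copy-pasted four times per loop trip, with a short leftover loop for when the iteration limit isn't a multiple of 4.

```c
/* One copy-pasted step. Only valid inside escape_unrolled4, where zr, zi,
   cr, ci and n are in scope; it returns from that function on escape. */
#define MANDEL_STEP()                               \
    do {                                            \
        float t = zr * zr - zi * zi + cr;           \
        zi = 2.0f * zr * zi + ci;                   \
        zr = t;                                     \
        ++n;                                        \
        if (zr * zr + zi * zi > 4.0f)               \
            return n;                               \
    } while (0)

/* Escape-time count with the loop body unrolled four times, so the loop
   counter is updated and tested once per four steps. Returns the 1-based
   step at which |z|^2 first exceeds 4, or max_iter if it never does. */
static int escape_unrolled4(float cr, float ci, int max_iter) {
    float zr = 0.0f, zi = 0.0f;
    int n = 0;
    for (int block = 0; block < max_iter / 4; ++block) {
        MANDEL_STEP();
        MANDEL_STEP();
        MANDEL_STEP();
        MANDEL_STEP();
    }
    for (int r = 0; r < max_iter % 4; ++r)  /* leftover steps */
        MANDEL_STEP();
    return max_iter;
}
```

    For c = 0.5 the orbit first leaves the radius-2 disk on step 5, so a limit of 6 exercises the leftover loop and still reports 5.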

    The question is: which AMD GPUs have OpenCL Image support under Mac OS X? This question is easily answered by seeing if Apple's qJulia example runs on your machine:
    http://developer.apple.com/library/...ion_Julia-Set_Example/Introduction/Intro.html

    I would really appreciate it if someone with a Radeon 5xxx series card (For example, a Mac Pro with a 5770 or 5870, or any mid to high end iMac) could try this out and let me know if the example runs or not.
     
  11. chrono1081 macrumors 604

    chrono1081

    Joined:
    Jan 26, 2008
    Location:
    Isla Nublar
    #11
    Well, I was going to post that the MacBook Air Ultimate does it, then I noticed you said ATI/AMD :/

    But I guess I still posted anyway.
     
  12. holmesf thread starter macrumors 6502a

    Joined:
    Sep 30, 2001
    #12
    Your signature says you have a Mac Pro with a Radeon 5870. Could you try it out on that? That would help me a lot.
     
  13. firewood macrumors 604

    Joined:
    Jul 29, 2003
    Location:
    Silicon Valley
    #13
    GPUs are designed to shade a block of pixels, usually one triangle's worth, and often multiple pixels in a span without iteration.
     
  14. holmesf, Feb 27, 2011
    Last edited: Feb 27, 2011

    holmesf thread starter macrumors 6502a

    Joined:
    Sep 30, 2001
    #14
    You are correct about this, in fact the hardware will always schedule pixels in blocks when possible in order to take advantage of the SIMD hardware. OpenCL will do the same if you assign 1 thread to 1 pixel. GPUs actually never process pixels using iteration because they have zero-overhead hardware support for thread creation which automatically assigns threads to pixels. I have spent a lot of time studying how GPUs work, and benchmarking them running OpenCL and CUDA, so I know exactly the right approaches to use for most any task. You'll have to trust me on this, although I can give you references to read if you'd like.
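    A rough sketch of the "1 thread per pixel" mapping, simulated in plain C: each work-item turns its linear global id into (x, y), and the global size is padded up to a multiple of the workgroup size (64 here, assumed to match one AMD wavefront). In OpenCL C the id would come from get_global_id(0) rather than a parameter, and in OpenCL 1.x the global size passed to clEnqueueNDRangeKernel must be a multiple of the local size, hence the padding and the bounds check. `round_up`, `pixel_kernel`, and `launch` are hypothetical names.

```c
#include <stddef.h>

/* Round n up to the next multiple of `local` (e.g. 64 work-items). */
static size_t round_up(size_t n, size_t local) {
    return ((n + local - 1) / local) * local;
}

/* Stand-in for a kernel body. In OpenCL C, gid would come from
   get_global_id(0); here it is passed in explicitly. */
static void pixel_kernel(size_t gid, size_t width, size_t height, int *hits) {
    if (gid >= width * height)
        return;                 /* padding work-item: outside the image */
    size_t x = gid % width;
    size_t y = gid / width;
    hits[y * width + x] += 1;   /* a real kernel would shade pixel (x, y) */
}

/* Host-side simulation of launching a padded 1D range of work-items.
   Returns the padded global size. */
static size_t launch(size_t width, size_t height, size_t local, int *hits) {
    size_t global = round_up(width * height, local);
    for (size_t gid = 0; gid < global; ++gid)
        pixel_kernel(gid, width, height, hits);
    return global;
}
```

    For a 100x30 image, the 3,000 pixels round up to 3,008 work-items; the 8 extra items fall outside the image and return without writing anything.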
     
  15. jiminaus macrumors 65816

    jiminaus

    Joined:
    Dec 16, 2010
    Location:
    Sydney
    #15
    I thought it was a shame you edited your post 4 or 5 posts up. The original post was a very good explanation.
     
  16. holmesf thread starter macrumors 6502a

    Joined:
    Sep 30, 2001
    #16
    Thanks. I really just want to get my original question answered, heh.
     
  17. chrono1081 macrumors 604

    chrono1081

    Joined:
    Jan 26, 2008
    Location:
    Isla Nublar
    #17
    Oh ya haha I forgot about that.

    It won't run. I simply get "Qjulia requires images: Images not supported on this device." :(
     
  18. holmesf, Mar 3, 2011
    Last edited: Mar 3, 2011

    holmesf thread starter macrumors 6502a

    Joined:
    Sep 30, 2001
    #18
    That's too bad -- looks like I won't be buying that GPU ...

    I recently had the chance to test this on the new Feb 2011 MacBook Pro, and Qjulia also will not run. It appears that even though the newer AMD GPUs support OpenCL Image (and support it in the drivers on Windows and Linux), the Mac drivers do not support it. This is really a shame. I hope that Apple gets better drivers for these GPUs soon.

    Thanks for helping answer my question!
     
  19. Dranix macrumors 6502a

    Dranix

    Joined:
    Feb 26, 2011
    Location:
    Gelnhausen, Germany
    #19
    A quick note on OpenCL: I had huge problems with OpenCL in 10.6. Galaxies, for example, didn't work on my 5870, but in the 10.7 developer preview it works like a charm, giving me 150 GFLOPS on the GPU. My Harpertown does 38. ;)
     
  20. holmesf thread starter macrumors 6502a

    Joined:
    Sep 30, 2001
    #20
    Does qJulia work on it in 10.7?
     
  21. Dranix, Mar 4, 2011
    Last edited: Mar 4, 2011

    Dranix macrumors 6502a

    Dranix

    Joined:
    Feb 26, 2011
    Location:
    Gelnhausen, Germany
    #21
    Not tried yet. I'll give it a try after work.
     
  22. Dranix macrumors 6502a

    Dranix

    Joined:
    Feb 26, 2011
    Location:
    Gelnhausen, Germany
  23. jiminaus macrumors 65816

    jiminaus

    Joined:
    Dec 16, 2010
    Location:
    Sydney
    #23
    About 10x faster than mine. :D
     
  24. holmesf thread starter macrumors 6502a

    Joined:
    Sep 30, 2001
    #24
    Hey, that's great news! In fact that's pretty big news! That probably means that the AMD OpenCL driver is vastly better in 10.7!

    Maybe I will buy that Radeon 5870 after all ...
     
