OpenCL - get the nonrendering GPU in Mac Pro?

Discussion in 'Mac Programming' started by ocltypes17, Jul 10, 2017.

  1. ocltypes17, Jul 10, 2017
    Last edited by a moderator: Jul 12, 2017

    ocltypes17 macrumors newbie

    Joined:
    Jul 10, 2017
    #1
    Hello,

    I am trying to get started developing OpenCL code on a Mac Pro to take advantage of its dual GPUs for calculations (not rendering, just pure mathematical computation). I am following the OpenCL programming guide published by Apple, but I ran into a problem: the compiler reports a device error for the FirePro D500 GPU.

    The error returned from the compiler is:
    [CL_DEVICE_NOT_AVAILABLE] : OpenCL Error : Error: build program driver returned (-2)

    This issue seemed to suggest that selecting the secondary GPU (I guess the one that is not driving the display?) might resolve the problem:
    https://github.com/bulletphysics/bullet3/issues/208

    However, when I follow the code snippet in Apple's technical note (https://developer.apple.com/library/content/technotes/tn2335/_index.html), Xcode 8.3.3 still fails to compile the code below, reporting conflicting types for "CGLQueryRendererInfo":

    Code:
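    // At file scope (link against OpenCL.framework and OpenGL.framework):
    #include <stdio.h>
    #include <stdint.h>          // intptr_t
    #include <OpenCL/opencl.h>   // OpenCL API and Apple's gcl_* functions
    #include <OpenGL/OpenGL.h>   // CGLQueryRendererInfo, CGLDescribeRenderer, kCGLRP*

    // Inside main(): look up available GPUs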
        cl_uint num = 0;
        clGetDeviceIDs(NULL,CL_DEVICE_TYPE_GPU, 0, NULL, &num);
        cl_device_id devices[num];
        clGetDeviceIDs(NULL,CL_DEVICE_TYPE_GPU,num,devices,NULL);
        cl_context ctx = clCreateContext(NULL,num,devices,NULL,NULL,NULL);
      
        // select non-connected GPU
        CGLRendererInfoObj rend;
        GLint nrend = 0;
        GLint nonDisplayGPURendererID = 0x0;
        CGLQueryRendererInfo(0xffffffff, &rend, &nrend);
      
        for(GLint idx=0; idx<nrend; idx++) {
            GLint online = 1;
            CGLDescribeRenderer(rend, idx, kCGLRPOnline, &online);
            if(!online) {
                GLint accelerated = 0;
                CGLDescribeRenderer(rend, idx, kCGLRPAcceleratedCompute, &accelerated);
                if(accelerated) {
                    CGLDescribeRenderer(rend, idx, kCGLRPRendererID,
                                        &nonDisplayGPURendererID);
                    break;
                }
            }
        }
        CGLDestroyRendererInfo(rend);
      
        // Converting a renderer ID to a cl_device_id
        cl_device_id gpu = (cl_device_id)(intptr_t)(nonDisplayGPURendererID&~0xff);
      
        // Obtain a dispatch queue for GPU in system
        dispatch_queue_t queue = gcl_create_dispatch_queue(CL_DEVICE_TYPE_GPU, gpu);
      
        // Optional: check device with clGetDeviceInfo
        char name[128];
        // cl_device_id gpu = gcl_get_device_id_with_dispatch_queue(queue);
        clGetDeviceInfo(gpu, CL_DEVICE_NAME, 128, name, NULL);
        fprintf(stdout, "Created a dispatch queue using the %s\n", name);
    Can anyone suggest how I should approach this problem? Is the OpenCL driver broken as some have suggested?
     
  2. ocltypes17 thread starter macrumors newbie

    Joined:
    Jul 10, 2017
    #2
    OK, I did some more digging, and it seems the code compiles without a problem, but at runtime OpenCL still fails to reach either the GPU or my CPU. Does that suggest a driver-level problem?

    Can anyone suggest a solution to this?
     
  3. teagls macrumors regular

    Joined:
    May 16, 2013
    #3
    Can you post the actual error you are getting? The driver is supplied by Apple, so I highly doubt that is the issue unless something is really messed up with your install.
     
  4. ocltypes17, Jul 11, 2017
    Last edited by a moderator: Jul 12, 2017

    ocltypes17 thread starter macrumors newbie

    Joined:
    Jul 10, 2017
    #4
    This is the main C code from Apple's OpenCL Guide:
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <OpenCL/opencl.h>
    #include "hello.cl.h"
    #define NUM_VALUES 1024
    
    int main (int argc, const char * argv[]) {
       int i; char name[128];
       dispatch_queue_t queue = gcl_create_dispatch_queue(CL_DEVICE_TYPE_GPU, NULL);
      
       if (queue == NULL) {
           queue = gcl_create_dispatch_queue(CL_DEVICE_TYPE_CPU, NULL);
       }
      
       cl_device_id gpu = gcl_get_device_id_with_dispatch_queue(queue);
       clGetDeviceInfo(gpu, CL_DEVICE_NAME, 128, name, NULL);
       fprintf(stdout, "Created a dispatch queue using the %s\n", name);
       float* test_in = (float*)malloc(sizeof(cl_float) * NUM_VALUES);
       for (i = 0; i < NUM_VALUES; i++) {
           test_in = (cl_float) i;
       }
      
       float* test_out = (float*)malloc(sizeof(cl_float) * NUM_VALUES);
       void* mem_in  = gcl_malloc(sizeof(cl_float) * NUM_VALUES, test_in, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR);
       void* mem_out = gcl_malloc(sizeof(cl_float) * NUM_VALUES, NULL, CL_MEM_WRITE_ONLY);
      
       dispatch_sync(queue, ^{
           size_t wgs;
           gcl_get_kernel_block_workgroup_info(square_kernel, CL_KERNEL_WORK_GROUP_SIZE, sizeof(wgs), &wgs, NULL);
           cl_ndrange range = {1, {0, 0, 0}, {NUM_VALUES, 0, 0}, {wgs, 0, 0}};
           square_kernel(&range,(cl_float*)mem_in, (cl_float*)mem_out);
           gcl_memcpy(test_out, mem_out, sizeof(cl_float) * NUM_VALUES);});
      
       gcl_free(mem_in); gcl_free(mem_out);
       free(test_in); free(test_out);
       dispatch_release(queue);
    }
    

    The kernel code is:
    Code:
    kernel void square(global float* input, global float* output) {
        size_t i = get_global_id(0);
        output = input * input;
    }
    

    This is the runtime result for the sample code in Apple's OpenCL Guide:
    Code:
    Created a dispatch queue using the AMD Radeon HD - FirePro D500 Compute Engine
    [CL_DEVICE_NOT_AVAILABLE] : OpenCL Error : Error: build program driver returned (-2)
    Break on OpenCLErrorBreak to debug.
    OpenCL Warning : clBuildProgram failed: could not build program for 0xffffffff (Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz) (err:-2)
    Break on OpenCLWarningBreak to debug.
    [CL_BUILD_ERROR] : OpenCL Build Error : Compiler build log:
    CVMS_ERROR_SERVICE_FAILURE: CVMS compiler has crashed or hung managing the service.
    
    Break on OpenCLErrorBreak to debug.
    Program ended with exit code: 0
     
  5. teagls, Jul 12, 2017
    Last edited: Jul 12, 2017

    teagls macrumors regular

    Joined:
    May 16, 2013
    #5
    There are several issues with your code, especially the kernel. Are you compiling this in Xcode? What version are you using? I see you call dispatch_release on the GCD queue; that is extremely old and no longer used. I fixed the issues, and it compiled and ran correctly for me.

    The main issue is that your kernel is wrong: the syntax is not right, and you can't multiply two pointers together. Just curious, are you aware of how C pointers work?

    Code:
    //OLD
    kernel void square(global float* input, global float* output) {
    size_t i = get_global_id(0);
    output = input * input;
    }
    
    //NEW
    __kernel void square(__global float* input, __global float* output) {
        int i = get_global_id(0);
        output[i] = input[i] * input[i];
    }
    
    Also, your CPU code is not right either. To load the data, you need to index into your C pointer.

    Code:
    //OLD
    for (i = 0; i < NUM_VALUES; i++) {
    test_in = (cl_float) i;
    }
    
    //NEW
    for (i = 0; i < NUM_VALUES; i++) {
          test_in[i] = (cl_float) i;
    }
    
    Lastly, to keep it simple before you get into more complex things with work groups, use this for the cl_ndrange:

    Code:
    //OLD
    dispatch_sync(queue, ^{
    size_t wgs;
    gcl_get_kernel_block_workgroup_info(square_kernel, CL_KERNEL_WORK_GROUP_SIZE, sizeof(wgs), &wgs, NULL);
    cl_ndrange range = {1, {0, 0, 0}, {NUM_VALUES, 0, 0}, {wgs, 0, 0}};
    square_kernel(&range,(cl_float*)mem_in, (cl_float*)mem_out);
    gcl_memcpy(test_out, mem_out, sizeof(cl_float) * NUM_VALUES);});
    
    gcl_free(mem_in); gcl_free(mem_out);
    free(test_in); free(test_out);
    dispatch_release(queue);
    }
    
    //NEW
    dispatch_sync(queue, ^{
                cl_ndrange range = { 1, {0}, {NUM_VALUES}, {0} };
                square_kernel(&range,(cl_float*)mem_in, (cl_float*)mem_out);
                gcl_memcpy(test_out, mem_out, sizeof(cl_float) * NUM_VALUES);
                for (int i = 0; i < NUM_VALUES; i++) {
                    printf("\n%.2f", test_out[i]);
                }
                gcl_free(mem_in); gcl_free(mem_out);
            });
    free(test_in); free(test_out);
     
  6. ocltypes17 thread starter macrumors newbie

    Joined:
    Jul 10, 2017
    #6
    Yes, I am compiling this in Xcode 8.3.3. I believe that is the latest version? Admittedly, I only started with C very recently; before this it was mostly Perl and Python, so I am still not very familiar with C pointers.

    I thought the double-underscore prefix for kernels was not required in Xcode? Apple did not use it in their guide, which is also kind of old (2013?): https://developer.apple.com/library...ide/ExampleHelloWorld/Example_HelloWorld.html

    There was still an issue that I could not resolve: gcl_create_dispatch_queue() was only called with CL_DEVICE_TYPE_GPU, but for some reason the binary kept churning out runtime warnings and errors that said,

    and the same error code -2 (CL_DEVICE_NOT_AVAILABLE). The warning message made me wonder whether both were referring to the same problem: OpenCL tried to compile the code for the CPU and failed, even though I explicitly asked to queue only GPUs.

    Is there a newer guide than the one Apple supplied that I can refer to as well?
     
  7. teagls macrumors regular

    Joined:
    May 16, 2013
    #7
    Yeah, unfortunately I think that guide is terribly outdated. I found a version of that guide's example on GitHub that somebody had put together; it neither compiles nor works for me.
    https://github.com/danieljfarrell/Xcode-4-OpenCL-Example

    The example below does not use Xcode per se, but it could be put into an Xcode project. It's quite easy to run: open Terminal, cd into the directory, and type make to compile. There is a short guide as well.

    https://www.eriksmistad.no/getting-started-with-opencl-and-gpu-computing/
    https://github.com/smistad/OpenCL-Getting-Started

    Also, if you are more familiar with Python, there are Python bindings for OpenCL (PyOpenCL) with examples:
    https://mathema.tician.de/software/pyopencl/
    https://github.com/pyopencl/pyopencl/tree/master/examples
     
  8. foobarbaz macrumors 6502

    Joined:
    Nov 29, 2007
    #8
    Careful: the point about dispatch_release no longer being needed only applies to ARC, not plain C. (If it did apply, the compiler would complain about the call.)
     
  9. rsacker, Sep 12, 2017
    Last edited: Sep 12, 2017

    rsacker macrumors newbie

    Joined:
    Sep 12, 2017
    Location:
    Los Angeles
    #9
    --- Post Merged, Sep 12, 2017 ---
    In the discussion between teagls and ocltypes17 about getting the non-rendering GPU on the Mac Pro, an underlying question went unanswered: once you have the nonDisplayCLDeviceId, what the heck do you do with it? It just sits there. Logically it should be passed to gcl_create_dispatch_queue. The other, somewhat unrelated question is how to test the non-display device. Mine were replaced by Apple after they heated up and the display went cuckoo, but that says nothing about their current state.
     
  10. Benjinator macrumors newbie

    Joined:
    Jun 17, 2016
    #10

    This is kind of late, but if you're familiar with Python you may want to look into PyOpenCL. I've been using it for a while, and it works very well.
     
