macOS OpenCL on 2012 MBP (HD 4000 and GT 650M)

DrJohnZoidberg · Jul 2, 2012

I've tentatively started moving some of my OpenCL code to OS X, and I'm finding that my 2012 MBP lists only two OpenCL devices (the CPU and the GT 650M) where I thought there should be three.

The Intel HD 4000 is - supposedly - OpenCL capable; why is it absent from the list of OpenCL devices when I query the platform?

The information I'm getting from the GT 650M seems a little, well, wrong. It is claiming it has only two compute-units, and a max clock-rate of 0MHz.

Are these problems with my code (a possibility I accept 😱), or has Apple not fully implemented the OpenCL features of the HD 4000 and GT 650M in OS X Lion? Is it possible this is rectified in Mountain Lion?

EDIT: I've removed the "Resolved" title prefix, as Mountain Lion hasn't brought the hoped-for OpenCL fixes.

aarond12 · Jul 3, 2012

Ah, life on the bleeding edge of technology. From what I can tell from the OpenCL benchmarks I have around, Apple hasn't gotten the OpenCL calls ironed out for the newest machines. It's probably NOT your code.

DrJohnZoidberg · Jul 3, 2012

Thanks. That puts my mind at rest. I'll wait for Mountain Lion and see if that brings any improvement.

holmesf · Jul 21, 2012

This has also been my experience. The HD 4000 does not show up as an OpenCL device. I hope that Apple works to remedy this: without HD 4000 support it's even harder for developers to justify the development time to add OpenCL support.

chituan · Jul 26, 2012

So I tried, and it seems OpenCL is still broken with the 650M in Mountain Lion 🙁 Does anyone know when it will be fixed?

DrJohnZoidberg · Jul 27, 2012

holmesf said:
This has also been my experience. The HD 4000 does not show up as an OpenCL device. I hope that Apple works to remedy this: without HD 4000 support it's even harder for developers to justify the development time to add OpenCL support.

chituan said:
So I tried, and it seems OpenCL is still broken with the 650M in Mountain Lion 🙁 Does anyone know when it will be fixed?

Upgraded to ML but - as you guys have found - the HD 4000 is still not supported as an OpenCL device, and the GT 650M still appears broken.

I assumed supporting the HD 4000 (and by extension the entire Mac lineup) as an OpenCL device would have been a priority for Apple, but apparently not!? Presumably they have their reasons...

The broken GT 650M implementation is just inexcusable. The CPU is now listed as OpenCL 1.2 (whereas I think it was only 1.1 under Lion), but the GT 650M is still listed as only 1.1 (though by Nvidia's spec it is 1.2). I'd guess they didn't actually update the drivers for the GT 650M with the release of ML.
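Since the CPU and the GT 650M report different OpenCL versions on the same platform, code probably shouldn't assume every device supports the same version. A minimal sketch, assuming the version string follows the `OpenCL <major>.<minor> <vendor-specific>` layout the OpenCL spec requires for `CL_DEVICE_VERSION` (the strings in the device dumps in this thread match it). `parse_opencl_version` and `opencl_at_least` are hypothetical helpers, not OpenCL API calls:

```cpp
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical helper: parses a CL_DEVICE_VERSION / CL_PLATFORM_VERSION string
// of the form "OpenCL <major>.<minor> <vendor-specific>" into {major, minor}.
// Returns {0, 0} if the string does not start with that pattern.
std::pair<int, int> parse_opencl_version(const std::string& version)
{
    int major = 0, minor = 0;
    if (std::sscanf(version.c_str(), "OpenCL %d.%d", &major, &minor) != 2)
        return std::make_pair(0, 0);
    return std::make_pair(major, minor);
}

// Hypothetical helper: true if the reported version is at least major.minor.
bool opencl_at_least(const std::string& version, int major, int minor)
{
    std::pair<int, int> v = parse_opencl_version(version);
    return v.first > major || (v.first == major && v.second >= minor);
}
```

You would pass it the string returned by `clGetDeviceInfo` for `CL_DEVICE_VERSION`; for example, `opencl_at_least("OpenCL 1.1 ", 1, 2)` is false, so a 1.2-only code path would be skipped on the GT 650M while still being used on the CPU.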

lloyddean · Jul 27, 2012

You've got to keep in mind that any OpenCL code would be sharing the GPU with Apple's display system and the Finder. That may not leave enough resources for other applications.

larkost · Jul 28, 2012

DrJohnZoidberg said:
The information I'm getting from the GT 650M seems a little, well, wrong. It is claiming it has only two compute-units, and a max clock-rate of 0MHz.

What is the Radar number for the bug you filed? Remember: if you don't file a Radar, then it never happened.

DrJohnZoidberg · Jul 30, 2012

larkost said:
What is the Radar number for the bug you filed? Remember: if you don't file a Radar, then it never happened.

11986609
I had no idea how to file a bug report with Apple, so thank you for the prompt.

holmesf · Aug 3, 2012

I just upgraded to Mountain Lion on my Retina MacBook Pro, and I don't encounter the problem you describe with the 650M on either OS. My program output lists the GeForce 650M's max clock frequency as 405MHz, and its max compute units as 2.

Why 2? If you read Nvidia's whitepaper on Kepler, they define the SMX, Kepler's streaming multiprocessor, which has 192 single-precision CUDA cores per SMX (Kepler Whitepaper). That explains why the 650M, with 384 CUDA cores, shows up as having 2 compute units. The output of CLBenchmark (which was not run on a Mac) agrees with this compute-unit count of two (CLBench results).

The Intel CPU now shows OpenCL 1.2 support. Still no support at all for the HD 4000, however.
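The compute-unit arithmetic above can be sketched as follows. This assumes, as the OpenCL and CUDA outputs in this thread suggest, that on Nvidia GPUs `CL_DEVICE_MAX_COMPUTE_UNITS` counts streaming multiprocessors; the cores-per-SM figures are Nvidia's per-compute-capability numbers (8 for 1.x, 32 for 2.0, 48 for 2.1, 192 for Kepler's 3.x). Both functions are hypothetical helpers, not OpenCL or CUDA APIs:

```cpp
// Hypothetical helper: CUDA cores per streaming multiprocessor for the
// compute capabilities discussed in this thread. Returns 0 if not covered.
int cores_per_sm(int cc_major, int cc_minor)
{
    if (cc_major == 1) return 8;          // G80/GT200-era SMs
    if (cc_major == 2)                    // Fermi
        return cc_minor == 0 ? 32         // compute capability 2.0
                             : 48;        // compute capability 2.1
    if (cc_major == 3) return 192;        // Kepler SMX
    return 0;
}

// Hypothetical helper: total cores = compute units (SMs) x cores per SM,
// assuming each OpenCL compute unit corresponds to one SM/SMX.
int total_cuda_cores(unsigned compute_units, int cc_major, int cc_minor)
{
    return static_cast<int>(compute_units) * cores_per_sm(cc_major, cc_minor);
}
```

For the GT 650M (compute capability 3.0, 2 compute units), `total_cuda_cores(2, 3, 0)` gives 2 × 192 = 384, the core count quoted above.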

I'm happy to share my code with you if you are worried you have a hardware issue or if you're just wondering why on earth your app is reporting a bad value for the clock speed.

Here is my program output. The first device listed is the CPU, followed by the GPU.

Code:

*****platform (0)******
CL_PLATFORM_PROFILE: FULL_PROFILE
CL_PLATFORM_VERSION: OpenCL 1.2 (Jun 20 2012 14:18:19)
CL_PLATFORM_NAME: Apple
CL_PLATFORM_VENDOR: Apple
CL_PLATFORM_EXTENSIONS: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event

*****device (0)******
CL_DEVICE_TYPE: CL_DEVICE_TYPE_CPU 
CL_DEVICE_VENDOR_ID: 4294967295
CL_DEVICE_MAX_COMPUTE_UNITS: 8
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 1024, 1, 1
CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR: 16
CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: 8
CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT: 4
CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG: 2
CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: 4
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE: 2
CL_DEVICE_MAX_CLOCK_FREQUENCY: 2600
CL_DEVICE_ADDRESS_BITS: 64
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 4294967296
CL_DEVICE_IMAGE_SUPPORT: CL_TRUE
CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8
CL_DEVICE_IMAGE2D_MAX_WIDTH: 8192
CL_DEVICE_IMAGE2D_MAX_HEIGHT: 8192
CL_DEVICE_IMAGE3D_MAX_WIDTH: 2048
CL_DEVICE_IMAGE3D_MAX_HEIGHT: 2048
CL_DEVICE_IMAGE3D_MAX_DEPTH: 2048
CL_DEVICE_MAX_SAMPLERS: 16
CL_DEVICE_MAX_PARAMETER_SIZE: 4096
CL_DEVICE_MEM_BASE_ADDR_ALIGN: 1024
CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE: 128
CL_DEVICE_SINGLE_FP_CONFIG: CL_FP_DENORM CL_FP_INF_NAN CL_FP_ROUND_TO_NEAREST CL_FP_ROUND_TO_ZERO 
CL_DEVICE_GLOBAL_MEM_CACHE_TYPE: CL_READ_WRITE_CACHE
CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE: 6291456
CL_DEVICE_GLOBAL_MEM_CACHE_SIZE: 64
CL_DEVICE_GLOBAL_MEM_SIZE: 17179869184
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 65536
CL_DEVICE_MAX_CONSTANT_ARGS: 8
CL_DEVICE_LOCAL_MEM_TYPE: CL_GLOBAL
CL_DEVICE_LOCAL_MEM_SIZE: 32768
CL_DEVICE_ERROR_CORRECTION_SUPPORT: CL_FALSE
CL_DEVICE_PROFILING_TIMER_RESOLUTION: 1
CL_DEVICE_ENDIAN_LITTLE: CL_TRUE
CL_DEVICE_AVAILABLE: CL_TRUE
CL_DEVICE_COMPILER_AVAILABLE: CL_TRUE
CL_DEVICE_EXECUTION_CAPABILITIES: CL_EXEC_KERNEL CL_DEVICE_EXECUTION_CAPABILITIESCL_EXEC_NATIVE_KERNEL CL_DEVICE_EXECUTION_CAPABILITIES
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE CL_DEVICE_QUEUE_PROPERTIES
CL_DEVICE_PLATFORM: 2147418112
CL_DEVICE_NAME: 'Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz'
CL_DEVICE_VENDOR: 'Intel'
CL_DRIVER_VERSION: '1.1'
CL_DEVICE_PROFILE: 'FULL_PROFILE'
CL_DEVICE_VERSION: 'OpenCL 1.2 '
CL_DEVICE_EXTENSIONS: 'cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_APPLE_fp64_basic_ops cl_APPLE_fixed_alpha_channel_orders cl_APPLE_biased_fixed_point_image_formats'

*****device (1)******
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU 
CL_DEVICE_VENDOR_ID: 16918016
CL_DEVICE_MAX_COMPUTE_UNITS: 2
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 1024, 1024, 64
CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE: 1
CL_DEVICE_MAX_CLOCK_FREQUENCY: 405
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 268435456
CL_DEVICE_IMAGE_SUPPORT: CL_TRUE
CL_DEVICE_MAX_READ_IMAGE_ARGS: 256
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 16
CL_DEVICE_IMAGE2D_MAX_WIDTH: 8192
CL_DEVICE_IMAGE2D_MAX_HEIGHT: 8192
CL_DEVICE_IMAGE3D_MAX_WIDTH: 2048
CL_DEVICE_IMAGE3D_MAX_HEIGHT: 2048
CL_DEVICE_IMAGE3D_MAX_DEPTH: 2048
CL_DEVICE_MAX_SAMPLERS: 32
CL_DEVICE_MAX_PARAMETER_SIZE: 4352
CL_DEVICE_MEM_BASE_ADDR_ALIGN: 1024
CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE: 128
CL_DEVICE_SINGLE_FP_CONFIG: CL_FP_INF_NAN CL_FP_ROUND_TO_NEAREST CL_FP_ROUND_TO_ZERO 
CL_DEVICE_GLOBAL_MEM_CACHE_TYPE: CL_NONE
CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE: 0
CL_DEVICE_GLOBAL_MEM_CACHE_SIZE: 0
CL_DEVICE_GLOBAL_MEM_SIZE: 1073741824
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 65536
CL_DEVICE_MAX_CONSTANT_ARGS: 9
CL_DEVICE_LOCAL_MEM_TYPE: CL_LOCAL
CL_DEVICE_LOCAL_MEM_SIZE: 49152
CL_DEVICE_ERROR_CORRECTION_SUPPORT: CL_FALSE
CL_DEVICE_PROFILING_TIMER_RESOLUTION: 1000
CL_DEVICE_ENDIAN_LITTLE: CL_TRUE
CL_DEVICE_AVAILABLE: CL_TRUE
CL_DEVICE_COMPILER_AVAILABLE: CL_TRUE
CL_DEVICE_EXECUTION_CAPABILITIES: CL_EXEC_KERNEL CL_DEVICE_EXECUTION_CAPABILITIES
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE CL_DEVICE_QUEUE_PROPERTIES
CL_DEVICE_PLATFORM: 2147418112
CL_DEVICE_NAME: 'GeForce GT 650M'
CL_DEVICE_VENDOR: 'NVIDIA'
CL_DRIVER_VERSION: 'CLH 1.0'
CL_DEVICE_PROFILE: 'FULL_PROFILE'
CL_DEVICE_VERSION: 'OpenCL 1.1 '
CL_DEVICE_EXTENSIONS: 'cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_APPLE_fp64_basic_ops '

Mr. Retrofire · Aug 4, 2012

DrJohnZoidberg said:
The Intel HD 4000 is - supposedly - OpenCL capable; why is it absent from the list of OpenCL devices when I query the platform?

Is the dynamic graphics card switching the problem? Try gfxCardStatus!

😱

holmesf · Aug 4, 2012

Mr. Retrofire said:
Is the dynamic graphics card switching the problem? Try gfxCardStatus!

😱

I tried this app out and it had no effect on which devices are shown for OpenCL. It seems that the issue must be a lack of drivers for the HD 4000.

DrJohnZoidberg · Aug 4, 2012

Mr. Retrofire said:
Is the dynamic graphics card switching the problem? Try gfxCardStatus!

The same thought has occurred to me, that this may be an issue with the gfx switching, but with Aperture running (to force the GT 650M on) I still get the same results. And, though I cannot confirm it, the HD 4000 doesn't appear to be a valid OpenCL device on MacBooks that have no discrete GPU (and therefore no dynamic switching). It looks like the lack of HD 4000 support is a deliberate move on Apple's part.

Thank you; that explains the "2" compute units (for some reason I thought there were 48 CUDA cores per SMX; don't ask me why).

Could you try this - barebones - code?

Code:

#include <iostream>
#include <vector>
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif

int main()
{
    // Use the first OpenCL platform (OS X exposes a single "Apple" platform).
    cl_platform_id platform = NULL;
    if (clGetPlatformIDs(1, &platform, NULL) != CL_SUCCESS)
    {
        std::cerr << "No OpenCL platform found" << std::endl;
        return 1;
    }

    // First ask how many devices there are, then fetch their IDs.
    cl_uint num_of_devices = 0;
    if (clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 0, NULL, &num_of_devices) != CL_SUCCESS)
    {
        std::cerr << "No OpenCL devices found" << std::endl;
        return 1;
    }
    std::vector<cl_device_id> devices(num_of_devices);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, num_of_devices, devices.data(), NULL);
    std::cout << "Number of available OpenCL devices: " << num_of_devices << std::endl;

    for (cl_uint i = 0; i < num_of_devices; i++)
    {
        char device_vendor[1024] = {0};
        char device_name[1024] = {0};
        cl_uint device_max_clock = 0;    // CL_DEVICE_MAX_CLOCK_FREQUENCY is in MHz
        cl_uint device_max_compute = 0;

        clGetDeviceInfo(devices[i], CL_DEVICE_VENDOR, sizeof(device_vendor), device_vendor, NULL);
        clGetDeviceInfo(devices[i], CL_DEVICE_NAME, sizeof(device_name), device_name, NULL);
        clGetDeviceInfo(devices[i], CL_DEVICE_MAX_CLOCK_FREQUENCY, sizeof(device_max_clock), &device_max_clock, NULL);
        clGetDeviceInfo(devices[i], CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(device_max_compute), &device_max_compute, NULL);

        std::cout << device_vendor << " " << device_name << std::endl;
        std::cout << "\t- Max Clock Frequency: " << device_max_clock << std::endl;
        std::cout << "\t- Max Compute Units: " << device_max_compute << std::endl;
    }
    return 0;
}

...it's the cut-down example code I've submitted to Apple in my bug report. On my cMBP the output I get is:

Code:

Number of available OpenCL devices: 2
Intel Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz
	- Max Clock Frequency: 2600
	- Max Compute Units: 8
NVIDIA GeForce GT 650M
	- Max Clock Frequency: 0
	- Max Compute Units: 2

It's not the code I am using in my program (I'm using the C++ bindings), but it exhibits the same issue. The clock speed is reported correctly for the Intel CPU, and the same code correctly reports the clock speed of my desktop's GPU (under Linux and Windows); I'm guessing (hoping?) it'll report the correct speed on your Retina MBP. I fear I've made some blindingly obvious mistake, but I swear my code was working before I started migrating it to OS X!

I would greatly appreciate it if you could share a snippet of your working code, so I can work out whether this is a problem with my hardware or with my lacklustre programming.

holmesf · Aug 4, 2012

DrJohnZoidberg said:
Thank you; that explains the "2" compute units (for some reason I thought there were 48 CUDA cores per SMX; don't ask me why).

I think it's possible that some iterations of Fermi have 48 CUDA cores per streaming multiprocessor. I know that the G80, way back in 2006, had 8 CUDA cores per streaming multiprocessor, and later they increased this number. I was surprised to read in the Kepler whitepaper that it's now up to 192 single-precision CUDA cores per SMX.

DrJohnZoidberg said:
Could you try this - barebones - code?

Yes, my output was, oddly, the correct output. It looks like your machine might have a serious issue in hardware or software configuration. I'd still be willing to share my code with you if you're interested. I'd be willing to wager that my code will give the wrong output on your machine as well (my code is rather huge, because it comes from a framework I wrote in 2009 to aid general OpenCL development).

Code:

Number of available OpenCL devices: 2
Intel Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz
	- Max Clock Frequency: 2600
	- Max Compute Units: 8
NVIDIA GeForce GT 650M
	- Max Clock Frequency: 405
	- Max Compute Units: 2

DrJohnZoidberg · Aug 4, 2012

holmesf said:
Yes, my output was, oddly, the correct output. It looks like your machine might have a serious issue in hardware or software configuration. I'd still be willing to share my code with you if you're interested. I'd be willing to wager that my code will give the wrong output on your machine as well.

Thanks, I think that's confirmed it's either my particular cMBP or the 2012 cMBP range that has the issue (rather than the GT 650M). I'll update my bug report with this information. Thank you, you've been an enormous help.

holmesf · Aug 4, 2012

DrJohnZoidberg said:
Thanks, I think that's confirmed it's either my particular cMBP or the 2012 cMBP range that has the issue (rather than the GT 650M). I'll update my bug report with this information. Thank you, you've been an enormous help.

No problemo, glad I could! (I would describe myself as one of only a small number of OpenCL aficionados out there)

Do keep this thread updated with whatever information you find. I would be very interested to know what category of machines suffer from this problem.

holmesf · Aug 11, 2012

Thought this might be of interest. Even though OpenCL says the max clock frequency of the GeForce 650M is 405 MHz, CUDA tells a different story: 900 MHz. I have no idea why the two systems report different numbers.

The good news is that it supports compute capability 3.0, which makes the new MacBook Pro an awesome platform to develop CUDA code on.

Code:

Found 1 CUDA Capable device(s)

Device 0: "GeForce GT 650M"
  CUDA Driver Version / Runtime Version          5.0 / 4.2
  CUDA Capability Major/Minor version number:    3.0
  Total amount of global memory:                 1024 MBytes (1073414144 bytes)
  ( 2) Multiprocessors x (192) CUDA Cores/MP:    384 CUDA Cores
  GPU Clock rate:                                900 MHz (0.90 GHz)
  Memory Clock rate:                             2508 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 262144 bytes
  Max Texture Dimension Size (x,y,z)             1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096)
  Max Layered Texture Size (dim) x layers        1D=(16384) x 2048, 2D=(16384,16384) x 2048
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     2147483647 x 65535 x 65535
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and execution:                 Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Concurrent kernel execution:                   Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support enabled:                No
  Device is using TCC driver mode:               No
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           1 / 0
  Compute Mode:
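To put the 405 MHz vs 900 MHz discrepancy in perspective, here is a back-of-the-envelope estimate of theoretical single-precision peak throughput. It assumes each CUDA core retires one fused multiply-add (counted as 2 FLOPs) per clock, and it says nothing about which of the two reported clocks the GPU actually runs at. `peak_sp_gflops` is a hypothetical helper:

```cpp
// Hypothetical helper: theoretical single-precision peak in GFLOPS, assuming
// every CUDA core completes one fused multiply-add (2 FLOPs) per clock cycle.
double peak_sp_gflops(int cuda_cores, double clock_mhz)
{
    const double flops_per_core_per_cycle = 2.0;  // one FMA = multiply + add
    // cores * FLOPs/cycle * MHz gives MFLOPS; divide by 1000 for GFLOPS.
    return cuda_cores * flops_per_core_per_cycle * clock_mhz / 1000.0;
}
```

With the 384 cores listed above, `peak_sp_gflops(384, 900)` comes to about 691 GFLOPS and `peak_sp_gflops(384, 405)` to about 311 GFLOPS, so the two reported clocks imply more than a 2× difference in expected throughput.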

DrJohnZoidberg · Sep 4, 2012

holmesf said:
Thought this might be of interest. Even though OpenCL says the max clock frequency of the GeForce 650M is 405 MHz, CUDA tells a different story: 900 MHz. I have no idea why the two systems report different numbers.

Thanks for the info. There's definitely something wrong with the numbers OpenCL is reporting. I'm wondering if, for some unexplained reason, it is returning a value that represents how overclocked the chip is (I understand that the GT 650M in the rMBP runs faster than normal GT 650Ms).

studUS · Sep 7, 2012

I just came across this page today:
http://www.nvidia.com/object/cuda-mac-driver.html

It seems there is a new Nvidia driver tailored exactly for this card and Mountain Lion. They claim up to a 40% OpenCL performance boost, so I can't wait to install and test the new driver 🙂

holmesf · Sep 7, 2012

Good find!

You may want to stick with the older driver, however. When I ran some of the CUDA examples Nvidia bundles with their developer SDK, I got a kernel panic. I did not experience this with the older drivers.

studUS · Sep 7, 2012

I did not get a kernel panic running vector addition in OpenCL, but the performance improvement is only about 20-25% in this case, which is still not bad! It still doesn't report the Intel HD 4000, but with frequent driver updates we can hope OpenCL will one day be where CUDA is today 🙂

holmesf · Sep 8, 2012

I got a kernel panic when I ran the CUDA memory bandwidth example. Many of the CUDA examples will also no longer run, exiting with an error stating "out of memory."

A 20-25% improvement is pretty astounding, especially for such a basic operation as vector addition. It makes you wonder what they did...

studUS · Sep 8, 2012

I'm rather thinking the previous driver was just that bad (since the 650M is a pretty new card), and this one is closer to what it should be? 🙂

studUS · Jan 3, 2013

I know this topic is a few months old now, but has anyone heard anything new about OpenCL working with the Intel HD 4000 on Mac OS X?

retroneo · Feb 27, 2013

studUS said:
I know this topic is a few months old now, but has anyone heard anything new about OpenCL working with the Intel HD 4000 on Mac OS X?

Perhaps someone needs to test it again with 10.8.3
