Old Jul 2, 2012, 02:21 PM   #1
DrJohnZoidberg
macrumors member
 
Join Date: Mar 2012
OpenCL on 2012 MBP (HD 4000 and GT 650M)

I've tentatively started moving some of my OpenCL code to OS X, and I'm finding that my 2012 MBP lists only two OpenCL devices (the CPU and the GT 650M) where I thought there should be three.

The Intel HD 4000 is - supposedly - OpenCL capable; why is it absent from the list of OpenCL devices when I query the platform?

The information I'm getting from the GT 650M seems a little, well, wrong. It is claiming it has only two compute-units, and a max clock-rate of 0MHz.

Are these problems with my code (a possibility I accept), or has Apple not fully implemented the OpenCL features of the HD 4000 and GT 650M in OS X Lion? Is it possible this is rectified in Mountain Lion?

EDIT: I've removed the "Resolved" title-prefix, as Mountain Lion hasn't brought the hoped-for OpenCL fixes.

Last edited by DrJohnZoidberg; Jul 27, 2012 at 01:46 PM.
Old Jul 3, 2012, 08:57 AM   #2
aarond12
macrumors 6502a
 
Join Date: May 2002
Location: Dallas, TX USA
Ah, life on the bleeding edge of technology. From what I can tell from the OpenCL benchmarks I have around, Apple hasn't gotten the OpenCL calls ironed out for the newest machines. It's probably NOT your code.
__________________
Voted "Most likely to start his own cult" by my high school class.
Old Jul 3, 2012, 04:58 PM   #3
DrJohnZoidberg
Thread Starter
macrumors member
 
Join Date: Mar 2012
Thanks. That puts my mind at rest. I'll wait for Mountain Lion and see if that brings any improvement.
Old Jul 21, 2012, 02:45 PM   #4
holmesf
macrumors 6502a
 
Join Date: Sep 2001
Quote:
Originally Posted by DrJohnZoidberg View Post
I've tentatively started moving some of my OpenCL code to OS X, and I'm finding that my 2012 MBP lists only two OpenCL devices (the CPU and the GT 650M) where I thought there should be three.

The Intel HD 4000 is - supposedly - OpenCL capable; why is it absent from the list of OpenCL devices when I query the platform?

The information I'm getting from the GT 650M seems a little, well, wrong. It is claiming it has only two compute-units, and a max clock-rate of 0MHz.

Are these problems with my code (a possibility I accept), or has Apple not fully implemented the OpenCL features of the HD 4000 and GT 650M in OS X Lion? Is it possible this is rectified in Mountain Lion?
This has also been my experience. The HD 4000 does not show up as an OpenCL device. I hope that Apple works to remedy this: without HD4000 support it's even harder for developers to justify the development time to add OpenCL support.
Old Jul 26, 2012, 06:03 AM   #5
chituan
macrumors newbie
 
Join Date: Oct 2008
So I tried, and it seems OpenCL is still broken with the 650M in Mountain Lion. Does anyone know when it will be fixed?
Old Jul 27, 2012, 02:05 PM   #6
DrJohnZoidberg
Thread Starter
macrumors member
 
Join Date: Mar 2012
Quote:
Originally Posted by holmesf View Post
This has also been my experience. The HD 4000 does not show up as an OpenCL device. I hope that Apple works to remedy this: without HD4000 support it's even harder for developers to justify the development time to add OpenCL support.
Quote:
Originally Posted by chituan View Post
So I tried, and it seems OpenCL is still broken with the 650M in Mountain Lion. Does anyone know when it will be fixed?
Upgraded to ML but - as you guys have found - the HD 4000 is still not supported as an OpenCL device, and the GT 650M still appears broken.

I assumed supporting the HD 4000 (and by extension the entire Mac lineup) as an OpenCL device would have been a priority for Apple, but apparently not!? Presumably they have their reasons...

The broken GT 650M implementation is just inexcusable. The CPU is now listed as OpenCL 1.2 (whereas I think it was only 1.1 under Lion), but the GT 650M is still listed as only 1.1 (though by Nvidia's spec it is 1.2). I'd guess they didn't actually update the drivers for the GT 650M with the release of ML.
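
For anyone wanting to compare the reported versions in code rather than by eye, here is a minimal sketch. It assumes the string returned for CL_DEVICE_VERSION follows the "OpenCL <major>.<minor> <vendor info>" layout from the OpenCL spec (for example "OpenCL 1.2 "). The names ClVersion, parse_cl_version and at_least are made up for illustration and are not part of OpenCL.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper types/functions (not part of the OpenCL API).
struct ClVersion {
    int major_ver;
    int minor_ver;
};

// Parses a CL_DEVICE_VERSION-style string such as "OpenCL 1.2 ".
// Returns false if the text does not start with "OpenCL <int>.<int>".
bool parse_cl_version(const std::string& text, ClVersion& out) {
    int major_ver = 0;
    int minor_ver = 0;
    if (std::sscanf(text.c_str(), "OpenCL %d.%d", &major_ver, &minor_ver) != 2) {
        return false;
    }
    out.major_ver = major_ver;
    out.minor_ver = minor_ver;
    return true;
}

// True if version v is at least major_ver.minor_ver.
bool at_least(const ClVersion& v, int major_ver, int minor_ver) {
    return v.major_ver > major_ver ||
           (v.major_ver == major_ver && v.minor_ver >= minor_ver);
}
```

Feeding it the string obtained from clGetDeviceInfo(..., CL_DEVICE_VERSION, ...) for each device would let a program decide per device whether 1.2-only features can be used, instead of assuming the platform version applies everywhere.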
Old Jul 27, 2012, 02:24 PM   #7
lloyddean
macrumors 6502a
 
Join Date: May 2009
Location: Des Moines, WA
You've got to keep in mind that any OpenCL code would be sharing the GPU with Apple's display system and the Finder. It may not leave enough resources for other applications.
Old Jul 28, 2012, 02:37 PM   #8
larkost
macrumors 6502a
 
Join Date: Oct 2007
Quote:
Originally Posted by DrJohnZoidberg View Post
The information I'm getting from the GT 650M seems a little, well, wrong. It is claiming it has only two compute-units, and a max clock-rate of 0MHz.
What is the Radar number for the bug you filed? Remember: if you don't file a Radar, then it never happened.
Old Jul 30, 2012, 11:53 AM   #9
DrJohnZoidberg
Thread Starter
macrumors member
 
Join Date: Mar 2012
Quote:
Originally Posted by larkost View Post
What is the Radar number for the bug you filed? Remember: if you don't file a Radar, then it never happened.
11986609
I had no idea how to file a bug report with Apple, so thank you for the prompt.
Old Aug 3, 2012, 11:38 PM   #10
holmesf
macrumors 6502a
 
Join Date: Sep 2001
Quote:
Originally Posted by DrJohnZoidberg View Post
I've tentatively started moving some of my OpenCL code to OS X, and I'm finding that my 2012 MBP lists only two OpenCL devices (the CPU and the GT 650M) where I thought there should be three.

The Intel HD 4000 is - supposedly - OpenCL capable; why is it absent from the list of OpenCL devices when I query the platform?

The information I'm getting from the GT 650M seems a little, well, wrong. It is claiming it has only two compute-units, and a max clock-rate of 0MHz.

Are these problems with my code (a possibility I accept), or has Apple not fully implemented the OpenCL features of the HD 4000 and GT 650M in OS X Lion? Is it possible this is rectified in Mountain Lion?

EDIT: I've removed the "Resolved" title-prefix, as Mountain Lion hasn't brought the hoped-for OpenCL fixes.
I just upgraded to Mountain Lion on my Retina MacBook Pro and I don't encounter the problem you describe with the 650M on either OS. My program output lists the GeForce GT 650M's max clock frequency as 405 MHz, and the max compute units as 2. Why 2? If you read Nvidia's whitepaper on Kepler, they define something called SMX, their new streaming multiprocessor architecture, which has 192 single-precision CUDA cores per SMX (Kepler Whitepaper). That explains why the 650M, with 384 CUDA cores, shows up as having 2 compute units. The output of CLBenchmark (which was not run on a Mac) agrees with this compute unit count of two (CLBench results). The Intel CPU now shows OpenCL 1.2 support. Still no support at all for the HD 4000, however.
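
To spell out the arithmetic, here is a tiny sketch. It assumes CL_DEVICE_MAX_COMPUTE_UNITS simply counts multiprocessors, which is what the numbers above suggest; the function name expected_compute_units is made up for illustration.

```cpp
// Hypothetical helper: how many compute units one would expect OpenCL to
// report, given the total CUDA core count and the cores per multiprocessor.
// Assumes one OpenCL compute unit per multiprocessor (SMX on Kepler).
unsigned expected_compute_units(unsigned cuda_cores, unsigned cores_per_sm) {
    if (cores_per_sm == 0) {
        return 0;  // avoid dividing by zero on bad input
    }
    return cuda_cores / cores_per_sm;
}
```

With the figures from this post, expected_compute_units(384, 192) gives 2, matching the reported value.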

I'm happy to share my code with you if you are worried you have a hardware issue or if you're just wondering why on earth your app is reporting a bad value for the clock speed.

Here is my program output. The first device listed is the CPU, followed by the GPU.

Code:
*****platform (0)******
CL_PLATFORM_PROFILE: FULL_PROFILE
CL_PLATFORM_VERSION: OpenCL 1.2 (Jun 20 2012 14:18:19)
CL_PLATFORM_NAME: Apple
CL_PLATFORM_VENDOR: Apple
CL_PLATFORM_EXTENSIONS: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event

*****device (0)******
CL_DEVICE_TYPE: CL_DEVICE_TYPE_CPU 
CL_DEVICE_VENDOR_ID: 4294967295
CL_DEVICE_MAX_COMPUTE_UNITS: 8
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 1024, 1, 1
CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR: 16
CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: 8
CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT: 4
CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG: 2
CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: 4
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE: 2
CL_DEVICE_MAX_CLOCK_FREQUENCY: 2600
CL_DEVICE_ADDRESS_BITS: 64
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 4294967296
CL_DEVICE_IMAGE_SUPPORT: CL_TRUE
CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8
CL_DEVICE_IMAGE2D_MAX_WIDTH: 8192
CL_DEVICE_IMAGE2D_MAX_HEIGHT: 8192
CL_DEVICE_IMAGE3D_MAX_WIDTH: 2048
CL_DEVICE_IMAGE3D_MAX_HEIGHT: 2048
CL_DEVICE_IMAGE3D_MAX_DEPTH: 2048
CL_DEVICE_MAX_SAMPLERS: 16
CL_DEVICE_MAX_PARAMETER_SIZE: 4096
CL_DEVICE_MEM_BASE_ADDR_ALIGN: 1024
CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE: 128
CL_DEVICE_SINGLE_FP_CONFIG: CL_FP_DENORM CL_FP_INF_NAN CL_FP_ROUND_TO_NEAREST CL_FP_ROUND_TO_ZERO 
CL_DEVICE_GLOBAL_MEM_CACHE_TYPE: CL_READ_WRITE_CACHE
CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE: 6291456
CL_DEVICE_GLOBAL_MEM_CACHE_SIZE: 64
CL_DEVICE_GLOBAL_MEM_SIZE: 17179869184
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 65536
CL_DEVICE_MAX_CONSTANT_ARGS: 8
CL_DEVICE_LOCAL_MEM_TYPE: CL_GLOBAL
CL_DEVICE_LOCAL_MEM_SIZE: 32768
CL_DEVICE_ERROR_CORRECTION_SUPPORT: CL_FALSE
CL_DEVICE_PROFILING_TIMER_RESOLUTION: 1
CL_DEVICE_ENDIAN_LITTLE: CL_TRUE
CL_DEVICE_AVAILABLE: CL_TRUE
CL_DEVICE_COMPILER_AVAILABLE: CL_TRUE
CL_DEVICE_EXECUTION_CAPABILITIES: CL_EXEC_KERNEL CL_DEVICE_EXECUTION_CAPABILITIESCL_EXEC_NATIVE_KERNEL CL_DEVICE_EXECUTION_CAPABILITIES
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE CL_DEVICE_QUEUE_PROPERTIES
CL_DEVICE_PLATFORM: 2147418112
CL_DEVICE_NAME: 'Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz'
CL_DEVICE_VENDOR: 'Intel'
CL_DRIVER_VERSION: '1.1'
CL_DEVICE_PROFILE: 'FULL_PROFILE'
CL_DEVICE_VERSION: 'OpenCL 1.2 '
CL_DEVICE_EXTENSIONS: 'cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_APPLE_fp64_basic_ops cl_APPLE_fixed_alpha_channel_orders cl_APPLE_biased_fixed_point_image_formats'

*****device (1)******
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU 
CL_DEVICE_VENDOR_ID: 16918016
CL_DEVICE_MAX_COMPUTE_UNITS: 2
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 1024, 1024, 64
CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE: 1
CL_DEVICE_MAX_CLOCK_FREQUENCY: 405
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 268435456
CL_DEVICE_IMAGE_SUPPORT: CL_TRUE
CL_DEVICE_MAX_READ_IMAGE_ARGS: 256
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 16
CL_DEVICE_IMAGE2D_MAX_WIDTH: 8192
CL_DEVICE_IMAGE2D_MAX_HEIGHT: 8192
CL_DEVICE_IMAGE3D_MAX_WIDTH: 2048
CL_DEVICE_IMAGE3D_MAX_HEIGHT: 2048
CL_DEVICE_IMAGE3D_MAX_DEPTH: 2048
CL_DEVICE_MAX_SAMPLERS: 32
CL_DEVICE_MAX_PARAMETER_SIZE: 4352
CL_DEVICE_MEM_BASE_ADDR_ALIGN: 1024
CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE: 128
CL_DEVICE_SINGLE_FP_CONFIG: CL_FP_INF_NAN CL_FP_ROUND_TO_NEAREST CL_FP_ROUND_TO_ZERO 
CL_DEVICE_GLOBAL_MEM_CACHE_TYPE: CL_NONE
CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE: 0
CL_DEVICE_GLOBAL_MEM_CACHE_SIZE: 0
CL_DEVICE_GLOBAL_MEM_SIZE: 1073741824
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 65536
CL_DEVICE_MAX_CONSTANT_ARGS: 9
CL_DEVICE_LOCAL_MEM_TYPE: CL_LOCAL
CL_DEVICE_LOCAL_MEM_SIZE: 49152
CL_DEVICE_ERROR_CORRECTION_SUPPORT: CL_FALSE
CL_DEVICE_PROFILING_TIMER_RESOLUTION: 1000
CL_DEVICE_ENDIAN_LITTLE: CL_TRUE
CL_DEVICE_AVAILABLE: CL_TRUE
CL_DEVICE_COMPILER_AVAILABLE: CL_TRUE
CL_DEVICE_EXECUTION_CAPABILITIES: CL_EXEC_KERNEL CL_DEVICE_EXECUTION_CAPABILITIES
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE CL_DEVICE_QUEUE_PROPERTIES
CL_DEVICE_PLATFORM: 2147418112
CL_DEVICE_NAME: 'GeForce GT 650M'
CL_DEVICE_VENDOR: 'NVIDIA'
CL_DRIVER_VERSION: 'CLH 1.0'
CL_DEVICE_PROFILE: 'FULL_PROFILE'
CL_DEVICE_VERSION: 'OpenCL 1.1 '
CL_DEVICE_EXTENSIONS: 'cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_APPLE_fp64_basic_ops '

Last edited by holmesf; Aug 4, 2012 at 12:08 AM.
Old Aug 4, 2012, 06:13 AM   #11
Mr. Retrofire
macrumors 601
 
Join Date: Mar 2010
Location: www.emiliana.cl
Quote:
Originally Posted by DrJohnZoidberg View Post
The Intel HD 4000 is - supposedly - OpenCL capable; why is it absent from the list of OpenCL devices when I query the platform?
Is the dynamic graphics card switching the problem? Try gfxCardStatus!

__________________

“Only the dead have seen the end of the war.”
-- Plato --
Old Aug 4, 2012, 06:16 PM   #12
holmesf
macrumors 6502a
 
Join Date: Sep 2001
Quote:
Originally Posted by Mr. Retrofire View Post
Is the dynamic graphics card switching the problem? Try gfxCardStatus!

I tried this app out and it had no effect on which devices are shown for OpenCL. It seems that the issue must be a lack of drivers for the HD 4000.
Old Aug 4, 2012, 06:30 PM   #13
DrJohnZoidberg
Thread Starter
macrumors member
 
Join Date: Mar 2012
Quote:
Originally Posted by Mr. Retrofire View Post
Is the dynamic graphics card switching the problem? Try gfxCardStatus!
The same thought had occurred to me, that this may be an issue with the gfx switching, but with Aperture running (to force the GT 650M on) I still get the same results. And - though I cannot confirm it - the HD 4000 doesn't appear to be a valid OpenCL device on MacBooks that have no discrete GPU (and therefore no dynamic switching). It looks like the lack of HD 4000 support is a purposeful move on the part of Apple.

Quote:
Originally Posted by holmesf View Post
I just upgraded to Mountain Lion on my Retina MacBook Pro and I don't encounter the problem you describe with the 650M on either OS. My program output lists the GeForce GT 650M's max clock frequency as 405 MHz, and the max compute units as 2. Why 2? If you read Nvidia's whitepaper on Kepler, they define something called SMX, their new streaming multiprocessor architecture, which has 192 single-precision CUDA cores per SMX (Kepler Whitepaper). That explains why the 650M, with 384 CUDA cores, shows up as having 2 compute units. The output of CLBenchmark (which was not run on a Mac) agrees with this compute unit count of two (CLBench results). The Intel CPU now shows OpenCL 1.2 support. Still no support at all for the HD 4000, however.

I'm happy to share my code with you if you are worried you have a hardware issue or if you're just wondering why on earth your app is reporting a bad value for the clock speed.
Thank you; that explains the "2" compute-units (for some reason I thought there were 48 CUDA cores per SMX - don't ask me why).

Could you try this - barebones - code?
Code:
#include <iostream>
#include <vector>
#include <OpenCL/opencl.h>  // Apple's OpenCL framework header

int main(int argc, const char * argv[])
{
    cl_uint num_of_devices = 0;
    size_t returned_size = 0;
    cl_char device_vendor[1024] = {0};
    cl_char device_name[1024] = {0};
    cl_uint device_max_clock = 0;
    cl_uint device_max_compute = 0;

    // First call: ask only for the device count (NULL selects the default platform).
    cl_int err = clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, 0, NULL, &num_of_devices);
    if (err != CL_SUCCESS || num_of_devices == 0)
    {
        std::cerr << "No OpenCL devices found (clGetDeviceIDs returned " << err << ")" << std::endl;
        return 1;
    }
    // Second call: fetch the device IDs into a vector, so nothing has to be freed by hand.
    std::vector<cl_device_id> devices(num_of_devices);
    clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, num_of_devices, &devices[0], NULL);
    std::cout << "Number of available OpenCL devices: " << num_of_devices << std::endl;
    for (cl_uint i = 0; i < num_of_devices; i++)
    {
        // Query vendor, name, max clock (MHz) and compute-unit count for each device.
        clGetDeviceInfo(devices[i], CL_DEVICE_VENDOR, sizeof(device_vendor), device_vendor, &returned_size);
        clGetDeviceInfo(devices[i], CL_DEVICE_NAME, sizeof(device_name), device_name, &returned_size);
        clGetDeviceInfo(devices[i], CL_DEVICE_MAX_CLOCK_FREQUENCY, sizeof(device_max_clock), &device_max_clock, &returned_size);
        clGetDeviceInfo(devices[i], CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(device_max_compute), &device_max_compute, &returned_size);
        std::cout << device_vendor << " " << device_name << std::endl;
        std::cout << "\t- Max Clock Frequency: " << device_max_clock << std::endl;
        std::cout << "\t- Max Compute Units: " << device_max_compute << std::endl;
    }
    return 0;
}
...it's the cut-down example code I've submitted to Apple in my bug report. On my cMBP the output I get is:
Code:
Number of available OpenCL devices: 2
Intel Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz
	- Max Clock Frequency: 2600
	- Max Compute Units: 8
NVIDIA GeForce GT 650M
	- Max Clock Frequency: 0
	- Max Compute Units: 2
It's not the code I am using in my program (I'm using the C++ bindings), but it exhibits the same issue. The same code reports the clock speed correctly for the Intel CPU here, and for my desktop's GPU (under Linux and Windows); I'm guessing (hoping?) it'll report the correct speed on your Retina MBP. I fear I've made some blindingly obvious mistake, but I swear my code was working before I started migrating it to OS X!

I would greatly appreciate it if you wanted to share a snippet of your working code so I can work out whether this is a problem with my hardware or with my lacklustre programming.
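
When a query such as CL_DEVICE_MAX_CLOCK_FREQUENCY comes back as 0, it may be worth ruling out a failed call by printing the cl_int status that clGetDeviceIDs and clGetDeviceInfo return. Below is a rough sketch of a helper that turns a few of those status values into readable names. The function name cl_status_name is made up, and the numeric values are copied by hand from the Khronos cl.h header so the snippet compiles without OpenCL installed; in real code the CL_* constants from the header should be used instead.

```cpp
#include <string>

// Hypothetical helper (not part of OpenCL): maps a handful of cl_int status
// codes to their names. Values are hard-coded from cl.h for illustration.
std::string cl_status_name(int status) {
    switch (status) {
        case 0:   return "CL_SUCCESS";
        case -1:  return "CL_DEVICE_NOT_FOUND";
        case -2:  return "CL_DEVICE_NOT_AVAILABLE";
        case -5:  return "CL_OUT_OF_RESOURCES";
        case -6:  return "CL_OUT_OF_HOST_MEMORY";
        case -30: return "CL_INVALID_VALUE";
        case -31: return "CL_INVALID_DEVICE_TYPE";
        case -32: return "CL_INVALID_PLATFORM";
        case -33: return "CL_INVALID_DEVICE";
        default:  return "unknown status " + std::to_string(status);
    }
}
```

Storing each call's return value in a cl_int and printing cl_status_name of it next to the queried value would show whether a 0 came from a successful query or from a call that failed and left the variable untouched.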
Old Aug 4, 2012, 06:40 PM   #14
holmesf
macrumors 6502a
 
Join Date: Sep 2001
Quote:
Originally Posted by DrJohnZoidberg View Post
Thank you; that explains the "2" compute-units (for some reason I thought there were 48 CUDA cores per SMX - don't ask me why).
I think it's possible that in some iterations of Fermi there are 48 CUDA cores per streaming multiprocessor. I know that the G80 way back in 2006 had 8 CUDA cores per streaming multiprocessor, and then later they increased this number. I was surprised to read in the Kepler white paper that it's now up to 192 single-precision CUDA cores per SMX.

Quote:
Originally Posted by DrJohnZoidberg View Post
Could you try this - barebones - code?
Yes, my output was, oddly, the correct output. It looks like your machine might have a serious issue in hardware or software configuration. I'd still be willing to share my code with you if you're interested. I'd be willing to wager that my code will give the wrong output on your machine as well (my code is rather huge, because it comes from a framework I wrote in 2009 which was to aid in general OpenCL development).

Code:
Number of available OpenCL devices: 2
Intel Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz
	- Max Clock Frequency: 2600
	- Max Compute Units: 8
NVIDIA GeForce GT 650M
	- Max Clock Frequency: 405
	- Max Compute Units: 2

Last edited by holmesf; Aug 4, 2012 at 06:50 PM.
Old Aug 4, 2012, 06:51 PM   #15
DrJohnZoidberg
Thread Starter
macrumors member
 
Join Date: Mar 2012
Quote:
Originally Posted by holmesf View Post
Yes, my output was, oddly, the correct output. It looks like your machine might have a serious issue in hardware or software configuration. I'd still be willing to share my code with you if you're interested. I'd be willing to wager that my code will give the wrong output on your machine as well.
Thanks, I think that's confirmed it's either my particular cMBP or the 2012 cMBP range that has the issue (rather than the GT 650M). I'll update my bug report with this information. Thank you, you've been an enormous help.
Old Aug 4, 2012, 06:55 PM   #16
holmesf
macrumors 6502a
 
Join Date: Sep 2001
Quote:
Originally Posted by DrJohnZoidberg View Post
Thanks, I think that's confirmed it's either my particular cMBP or the 2012 cMBP range that has the issue (rather than the GT 650M). I'll update my bug report with this information. Thank you, you've been an enormous help.
No problemo, glad I could! (I would describe myself as one of only a small number of OpenCL aficionados out there)

Do keep this thread updated with whatever information you find. I would be very interested to know what category of machines suffers from this problem.
Old Aug 11, 2012, 07:53 PM   #17
holmesf
macrumors 6502a
 
Join Date: Sep 2001
Thought this might be of interest: even though OpenCL says the max clock frequency of the GeForce GT 650M is 405 MHz, CUDA tells a different story: 900 MHz. I have no idea why the two systems report different numbers.

The good news is that it supports compute capability 3.0, which makes the new MacBook Pro an awesome platform to develop CUDA code on.

Code:
Found 1 CUDA Capable device(s)

Device 0: "GeForce GT 650M"
  CUDA Driver Version / Runtime Version          5.0 / 4.2
  CUDA Capability Major/Minor version number:    3.0
  Total amount of global memory:                 1024 MBytes (1073414144 bytes)
  ( 2) Multiprocessors x (192) CUDA Cores/MP:    384 CUDA Cores
  GPU Clock rate:                                900 MHz (0.90 GHz)
  Memory Clock rate:                             2508 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 262144 bytes
  Max Texture Dimension Size (x,y,z)             1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096)
  Max Layered Texture Size (dim) x layers        1D=(16384) x 2048, 2D=(16384,16384) x 2048
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     2147483647 x 65535 x 65535
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and execution:                 Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Concurrent kernel execution:                   Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support enabled:                No
  Device is using TCC driver mode:               No
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           1 / 0
  Compute Mode:
Old Sep 4, 2012, 09:43 AM   #18
DrJohnZoidberg
Thread Starter
macrumors member
 
Join Date: Mar 2012
Quote:
Originally Posted by holmesf View Post
Thought this might be of interest: even though OpenCL says the max clock frequency of the GeForce GT 650M is 405 MHz, CUDA tells a different story: 900 MHz. I have no idea why the two systems report different numbers.
Thanks for the info. There's definitely something wrong with the numbers that OpenCL is reporting. I'm wondering if - for some unexplained reason - it is returning a value that represents how overclocked the chip is (I understand that the GT 650M in the rMBP runs faster than normal GT 650Ms).
Old Sep 7, 2012, 09:29 PM   #19
studUS
macrumors newbie
 
Join Date: Dec 2008
I just came across this page today:
http://www.nvidia.com/object/cuda-mac-driver.html

It seems there is a new Nvidia driver tailored exactly for this card and Mountain Lion. They're saying OpenCL has a performance boost of up to 40%, so I cannot wait to install and test the new driver.
__________________
_____
15" Macbook Pro 10,1: 2.3/16/256 | iPhone 5: 16Gb black | iPad 3: 16gb WiFi
Old Sep 7, 2012, 10:20 PM   #20
holmesf
macrumors 6502a
 
Join Date: Sep 2001
Quote:
Originally Posted by studUS View Post
I just came across this page today:
http://www.nvidia.com/object/cuda-mac-driver.html

It seems there is a new Nvidia driver tailored exactly for this card and Mountain Lion. They're saying OpenCL has a performance boost of up to 40%, so I cannot wait to install and test the new driver.
Good find!

You may want to stick with the older driver, however. When I ran some of the CUDA examples Nvidia bundles with their developer SDK I got a kernel panic. I did not experience this with the older drivers.
Old Sep 8, 2012, 12:29 AM   #21
studUS
macrumors newbie
 
Join Date: Dec 2008
I did not get a kernel panic running vector addition in OpenCL, but the performance improvement is only about 20-25% in this case, which is still not bad! It still doesn't report the Intel HD 4000, but with frequent driver updates we can hope OpenCL will one day be up there where CUDA is nowadays.
Old Sep 8, 2012, 09:34 PM   #22
holmesf
macrumors 6502a
 
Join Date: Sep 2001
Quote:
Originally Posted by studUS View Post
I did not get a kernel panic running vector addition in OpenCL, but the performance improvement is only about 20-25% in this case, which is still not bad! It still doesn't report the Intel HD 4000, but with frequent driver updates we can hope OpenCL will one day be up there where CUDA is nowadays.
I got a kernel panic when I ran the CUDA memory bandwidth example. Many of the CUDA examples will also no longer run, exiting with an error stating "out of memory."

A 20-25% improvement is pretty astounding, especially for such a basic operation as vector addition. It makes you wonder what they did ...
Old Sep 8, 2012, 09:40 PM   #23
studUS
macrumors newbie
 
Join Date: Dec 2008
I'm rather thinking the previous driver was actually just that bad (since the 650M is a pretty new card), and this one is simply closer to what it should be?
Old Jan 3, 2013, 11:51 AM   #24
studUS
macrumors newbie
 
Join Date: Dec 2008
I know this topic is a few months old now, but has anyone heard anything new about OpenCL working with the Intel HD 4000 on Mac OS X?
Old Feb 28, 2013, 12:11 AM   #25
retroneo
macrumors 6502a
 
Join Date: Apr 2005
Quote:
Originally Posted by studUS View Post
I know this topic is a few months old now, but has anyone heard anything new about OpenCL working with the Intel HD 4000 on Mac OS X?
Perhaps someone needs to test it again with 10.8.3.