#1

OpenCL on 2012 MBP (HD 4000 and GT 650M)
I've tentatively started moving some of my OpenCL code to OS X, and I'm finding that my 2012 MBP is listing only two OpenCL devices (the CPU and the GT 650M) where I thought there should be three.
The Intel HD 4000 is, supposedly, OpenCL capable; why is it absent from the list of OpenCL devices when I query the platform? The information I'm getting from the GT 650M also seems a little, well, wrong. It claims to have only two compute units and a max clock rate of 0 MHz. Are these problems with my code (a possibility I accept), or has Apple not fully implemented the OpenCL features of the HD 4000 and GT 650M in OS X Lion? Is it possible this is rectified in Mountain Lion?
EDIT: I've removed the "Resolved" title prefix, as Mountain Lion hasn't brought the hoped-for OpenCL fixes.
Last edited by DrJohnZoidberg; Jul 27, 2012 at 01:46 PM.

#2

Ah, life on the bleeding edge of technology. From what I can tell from the OpenCL benchmarks I have around, Apple hasn't gotten the OpenCL calls ironed out for the newest machines. It's probably NOT your code.

#3

Thanks. That puts my mind at rest. I'll wait for Mountain Lion and see if that brings any improvement.

#5

So I tried it, and it seems OpenCL is still broken with the 650M in Mountain Lion.
Does anyone know when it will be fixed?

#6

I assumed supporting the HD 4000 (and by extension the entire Mac lineup) as an OpenCL device would have been a priority for Apple, but apparently not!? Presumably they have their reasons... The broken GT 650M implementation is just inexcusable. The CPU is now listed as OpenCL 1.2 (whereas I think it was only 1.1 under Lion), but the GT 650M is still listed as only 1.1 (though by Nvidia's spec it is 1.2). I'd guess they didn't actually update the GT 650M drivers with the release of ML.

#7

You've got to keep in mind that any OpenCL code would be sharing the GPU with Apple's display system and the Finder. That may not leave enough resources for other applications.

#8

What is the Radar number for the bug you filed? Remember: if you don't file a Radar, then it never happened.

#10

I'm happy to share my code with you if you are worried you have a hardware issue or if you're just wondering why on earth your app is reporting a bad value for the clock speed. Here is my program output. The first device listed is the CPU, followed by the GPU. Code:
*****platform (0)******
CL_PLATFORM_PROFILE: FULL_PROFILE
CL_PLATFORM_VERSION: OpenCL 1.2 (Jun 20 2012 14:18:19)
CL_PLATFORM_NAME: Apple
CL_PLATFORM_VENDOR: Apple
CL_PLATFORM_EXTENSIONS: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event
*****device (0)******
CL_DEVICE_TYPE: CL_DEVICE_TYPE_CPU
CL_DEVICE_VENDOR_ID: 4294967295
CL_DEVICE_MAX_COMPUTE_UNITS: 8
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 1024, 1, 1
CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR: 16
CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: 8
CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT: 4
CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG: 2
CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: 4
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE: 2
CL_DEVICE_MAX_CLOCK_FREQUENCY: 2600
CL_DEVICE_ADDRESS_BITS: 64
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 4294967296
CL_DEVICE_IMAGE_SUPPORT: CL_TRUE
CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8
CL_DEVICE_IMAGE2D_MAX_WIDTH: 8192
CL_DEVICE_IMAGE2D_MAX_HEIGHT: 8192
CL_DEVICE_IMAGE3D_MAX_WIDTH: 2048
CL_DEVICE_IMAGE3D_MAX_HEIGHT: 2048
CL_DEVICE_IMAGE3D_MAX_DEPTH: 2048
CL_DEVICE_MAX_SAMPLERS: 16
CL_DEVICE_MAX_PARAMETER_SIZE: 4096
CL_DEVICE_MEM_BASE_ADDR_ALIGN: 1024
CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE: 128
CL_DEVICE_SINGLE_FP_CONFIG: CL_FP_DENORM CL_FP_INF_NAN CL_FP_ROUND_TO_NEAREST CL_FP_ROUND_TO_ZERO
CL_DEVICE_GLOBAL_MEM_CACHE_TYPE: CL_READ_WRITE_CACHE
CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE: 6291456
CL_DEVICE_GLOBAL_MEM_CACHE_SIZE: 64
CL_DEVICE_GLOBAL_MEM_SIZE: 17179869184
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 65536
CL_DEVICE_MAX_CONSTANT_ARGS: 8
CL_DEVICE_LOCAL_MEM_TYPE: CL_GLOBAL
CL_DEVICE_LOCAL_MEM_SIZE: 32768
CL_DEVICE_ERROR_CORRECTION_SUPPORT: CL_FALSE
CL_DEVICE_PROFILING_TIMER_RESOLUTION: 1
CL_DEVICE_ENDIAN_LITTLE: CL_TRUE
CL_DEVICE_AVAILABLE: CL_TRUE
CL_DEVICE_COMPILER_AVAILABLE: CL_TRUE
CL_DEVICE_EXECUTION_CAPABILITIES: CL_EXEC_KERNEL CL_DEVICE_EXECUTION_CAPABILITIES CL_EXEC_NATIVE_KERNEL CL_DEVICE_EXECUTION_CAPABILITIES
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE CL_DEVICE_QUEUE_PROPERTIES
CL_DEVICE_PLATFORM: 2147418112
CL_DEVICE_NAME: 'Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz'
CL_DEVICE_VENDOR: 'Intel'
CL_DRIVER_VERSION: '1.1'
CL_DEVICE_PROFILE: 'FULL_PROFILE'
CL_DEVICE_VERSION: 'OpenCL 1.2 '
CL_DEVICE_EXTENSIONS: 'cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_APPLE_fp64_basic_ops cl_APPLE_fixed_alpha_channel_orders cl_APPLE_biased_fixed_point_image_formats'
*****device (1)******
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
CL_DEVICE_VENDOR_ID: 16918016
CL_DEVICE_MAX_COMPUTE_UNITS: 2
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 1024, 1024, 64
CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE: 1
CL_DEVICE_MAX_CLOCK_FREQUENCY: 405
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 268435456
CL_DEVICE_IMAGE_SUPPORT: CL_TRUE
CL_DEVICE_MAX_READ_IMAGE_ARGS: 256
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 16
CL_DEVICE_IMAGE2D_MAX_WIDTH: 8192
CL_DEVICE_IMAGE2D_MAX_HEIGHT: 8192
CL_DEVICE_IMAGE3D_MAX_WIDTH: 2048
CL_DEVICE_IMAGE3D_MAX_HEIGHT: 2048
CL_DEVICE_IMAGE3D_MAX_DEPTH: 2048
CL_DEVICE_MAX_SAMPLERS: 32
CL_DEVICE_MAX_PARAMETER_SIZE: 4352
CL_DEVICE_MEM_BASE_ADDR_ALIGN: 1024
CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE: 128
CL_DEVICE_SINGLE_FP_CONFIG: CL_FP_INF_NAN CL_FP_ROUND_TO_NEAREST CL_FP_ROUND_TO_ZERO
CL_DEVICE_GLOBAL_MEM_CACHE_TYPE: CL_NONE
CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE: 0
CL_DEVICE_GLOBAL_MEM_CACHE_SIZE: 0
CL_DEVICE_GLOBAL_MEM_SIZE: 1073741824
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 65536
CL_DEVICE_MAX_CONSTANT_ARGS: 9
CL_DEVICE_LOCAL_MEM_TYPE: CL_LOCAL
CL_DEVICE_LOCAL_MEM_SIZE: 49152
CL_DEVICE_ERROR_CORRECTION_SUPPORT: CL_FALSE
CL_DEVICE_PROFILING_TIMER_RESOLUTION: 1000
CL_DEVICE_ENDIAN_LITTLE: CL_TRUE
CL_DEVICE_AVAILABLE: CL_TRUE
CL_DEVICE_COMPILER_AVAILABLE: CL_TRUE
CL_DEVICE_EXECUTION_CAPABILITIES: CL_EXEC_KERNEL CL_DEVICE_EXECUTION_CAPABILITIES
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE CL_DEVICE_QUEUE_PROPERTIES
CL_DEVICE_PLATFORM: 2147418112
CL_DEVICE_NAME: 'GeForce GT 650M'
CL_DEVICE_VENDOR: 'NVIDIA'
CL_DRIVER_VERSION: 'CLH 1.0'
CL_DEVICE_PROFILE: 'FULL_PROFILE'
CL_DEVICE_VERSION: 'OpenCL 1.1 '
CL_DEVICE_EXTENSIONS: 'cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_APPLE_fp64_basic_ops '
Last edited by holmesf; Aug 4, 2012 at 12:08 AM.

#13

Could you try this - barebones - code? Code:
#include <iostream>
#include <vector>
#include <OpenCL/OpenCL.h>  // Apple's OpenCL framework header (OS X only)

int main(int argc, const char * argv[])
{
    // Ask for a platform explicitly; passing NULL to clGetDeviceIDs is implementation-defined.
    cl_platform_id platform = NULL;
    if (clGetPlatformIDs(1, &platform, NULL) != CL_SUCCESS)
    {
        std::cerr << "clGetPlatformIDs failed" << std::endl;
        return 1;
    }

    // First call: only ask how many devices the platform exposes.
    cl_uint num_of_devices = 0;
    cl_int err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 0, NULL, &num_of_devices);
    if (err != CL_SUCCESS || num_of_devices == 0)
    {
        std::cerr << "clGetDeviceIDs found no devices (error " << err << ")" << std::endl;
        return 1;
    }

    // Second call: fetch the device IDs (std::vector releases the storage automatically).
    std::vector<cl_device_id> devices(num_of_devices);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, num_of_devices, devices.data(), NULL);
    std::cout << "Number of available OpenCL devices: " << num_of_devices << std::endl;

    for (cl_uint i = 0; i < num_of_devices; i++)
    {
        char device_vendor[1024] = {0};
        char device_name[1024] = {0};
        cl_uint device_max_clock = 0;    // CL_DEVICE_MAX_CLOCK_FREQUENCY is a cl_uint, in MHz
        cl_uint device_max_compute = 0;  // CL_DEVICE_MAX_COMPUTE_UNITS is a cl_uint

        clGetDeviceInfo(devices[i], CL_DEVICE_VENDOR, sizeof(device_vendor), device_vendor, NULL);
        clGetDeviceInfo(devices[i], CL_DEVICE_NAME, sizeof(device_name), device_name, NULL);
        clGetDeviceInfo(devices[i], CL_DEVICE_MAX_CLOCK_FREQUENCY, sizeof(device_max_clock), &device_max_clock, NULL);
        clGetDeviceInfo(devices[i], CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(device_max_compute), &device_max_compute, NULL);

        std::cout << device_vendor << " " << device_name << std::endl;
        std::cout << "\t- Max Clock Frequency: " << device_max_clock << std::endl;
        std::cout << "\t- Max Compute Units: " << device_max_compute << std::endl;
    }
    return 0;
}
Code:
Number of available OpenCL devices: 2
Intel Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz
    - Max Clock Frequency: 2600
    - Max Compute Units: 8
NVIDIA GeForce GT 650M
    - Max Clock Frequency: 0
    - Max Compute Units: 2

I would greatly appreciate it if you could share a snippet of your working code, so I can work out whether this is a problem with my hardware or with my lacklustre programming.

#14

Yes, my output was, oddly, the correct output. It looks like your machine might have a serious issue in its hardware or software configuration. I'd still be willing to share my code with you if you're interested. I'd be willing to wager that my code will give the wrong output on your machine as well (my code is rather huge, because it comes from a framework I wrote in 2009 to aid general OpenCL development). Code:
Number of available OpenCL devices: 2
Intel Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz
    - Max Clock Frequency: 2600
    - Max Compute Units: 8
NVIDIA GeForce GT 650M
    - Max Clock Frequency: 405
    - Max Compute Units: 2
Last edited by holmesf; Aug 4, 2012 at 06:50 PM.

#16

Do keep this thread updated with whatever information you find. I would be very interested to know what category of machines suffer from this problem.

#17

Thought this might be of interest: even though OpenCL says the max clock frequency of the GeForce GT 650M is 405 MHz, CUDA tells a different story: 900 MHz. I have no idea why the two systems report different numbers.
The good news is that it supports compute capability 3.0, which makes the new MacBook Pro an awesome platform to develop CUDA code on. Code:
Found 1 CUDA Capable device(s)
Device 0: "GeForce GT 650M"
  CUDA Driver Version / Runtime Version 5.0 / 4.2
  CUDA Capability Major/Minor version number: 3.0
  Total amount of global memory: 1024 MBytes (1073414144 bytes)
  ( 2) Multiprocessors x (192) CUDA Cores/MP: 384 CUDA Cores
  GPU Clock rate: 900 MHz (0.90 GHz)
  Memory Clock rate: 2508 Mhz
  Memory Bus Width: 128-bit
  L2 Cache Size: 262144 bytes
  Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096)
  Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 49152 bytes
  Total number of registers available per block: 65536
  Warp size: 32
  Maximum number of threads per multiprocessor: 2048
  Maximum number of threads per block: 1024
  Maximum sizes of each dimension of a block: 1024 x 1024 x 64
  Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
  Maximum memory pitch: 2147483647 bytes
  Texture alignment: 512 bytes
  Concurrent copy and execution: Yes with 1 copy engine(s)
  Run time limit on kernels: Yes
  Integrated GPU sharing Host Memory: No
  Support host page-locked memory mapping: Yes
  Concurrent kernel execution: Yes
  Alignment requirement for Surfaces: Yes
  Device has ECC support enabled: No
  Device is using TCC driver mode: No
  Device supports Unified Addressing (UVA): Yes
  Device PCI Bus ID / PCI location ID: 1 / 0
  Compute Mode:

#18

Thanks for the info. There's definitely something wrong with the numbers OpenCL is reporting. I'm wondering if, for some unexplained reason, it is returning a value that represents how far the chip is overclocked (I understand that the GT 650M in the rMBP runs faster than normal GT 650Ms).

#19

I just came across this page today:
http://www.nvidia.com/object/cuda-mac-driver.html
It seems there is a new Nvidia driver tailored specifically for this card and Mountain Lion. They're claiming an OpenCL performance boost of up to 40%, so I can't wait to install and test it.
__________________
15" MacBook Pro 10,1: 2.3/16/256 | iPhone 5: 16GB black | iPad 3: 16GB WiFi

#20

You may want to stick with the older driver, however. When I ran some of the CUDA examples Nvidia bundles with their developer SDK, I got a kernel panic. I did not experience this with the older drivers.

#21

I did not get a kernel panic running vector addition in OpenCL, but the performance improvement is only about 20-25% in this case, which is still not bad! It still doesn't report the Intel HD 4000, but with frequent driver updates we can hope OpenCL will one day be up where CUDA is now.

#22

A 20-25% improvement is pretty astounding, especially for such a basic operation as vector addition. It makes you wonder what they did...

#23

Maybe the previous driver was just that bad (the 650M is a pretty new card), and this one is simply closer to what it should be?

#24

I know this topic is a few months old now, but has anyone heard anything new about OpenCL working with the Intel HD 4000 on Mac OS X?

All times are GMT -5.