CUDA performance change after the OSX 10.8.2 update?

Discussion in 'OS X Mountain Lion (10.8)' started by cornelius1, Sep 22, 2012.

  1. macrumors member

    Joined:
    Jul 12, 2008
    #1
    On my Retina MacBook Pro with a GT 650M (CUDA driver 5.0.24), I'm seeing 3-4x slower CUDA performance after the OS X 10.8.2 update. For example, with CUDA-Z I'm getting a single-precision float performance of 95 Gflop/s on OS X 10.8.2, compared to the 321 Gflop/s result on OS X 10.8.1 listed here: http://www.barefeats.com/rogue02.html . Also, CUDA tasks in BOINC seem to be 3-4x slower than on OS X 10.8.1. I'm wondering if it's only me. Is anyone else seeing a significant change in CUDA performance after the 10.8.2 update?

    Also, what "single-precision float" result do you get on CUDA-Z (under the Performance tab), and which Nvidia GPU, CUDA driver version, and OSX version are you using?

    CUDA-Z (beta version) can be found here: http://sourceforge.net/projects/cuda-z/files/cuda-z/
    Mac CUDA drivers can be found here: http://www.nvidia.com/object/mac-driver-archive.html
     
  2. macrumors newbie

    Joined:
    Sep 20, 2012
    #2
    I see roughly 92 Gflop/s.

    I'd be interested if anybody who's still on 10.8.1 could run the same test.
     
  3. macrumors member

    Joined:
    Jun 18, 2009
    #3
    I installed the latest version of the CUDA driver and CUDA-Z, and single-precision performance ranges from 295 to 320 Gflop/s while CUDA-Z is running.

    My rMBP is running 10.8.2
     
  4. thread starter macrumors member

    Joined:
    Jul 12, 2008
    #4
    Is that on a retina MBP?
     
  5. macrumors newbie

    Joined:
    Sep 20, 2012
    #5
    Here's my complete result:

    CUDA-Z Report
    =============
    Version: 0.6.156 SVN Built Sep 21 2012 09:52:54
    http://cuda-z.sf.net/
    OS Version: Mac OS X 10.8.2 12C54 (base model, retina Macbook Pro)
    Driver Version: 8.0.61 295.30.20f02
    Driver Dll Version: 5.0
    Runtime Dll Version: 4.20

    Core Information
    ----------------
    Name: GeForce GT 650M
    Compute Capability: 3.0
    Clock Rate: 900 MHz
    Multiprocessors: 2
    Warp Size: 32
    Regs Per Block: 65536
    Threads Per Block: 1024
    Threads Dimensions: 1024 x 1024 x 64
    Grid Dimensions: 2147483647 x 65535 x 65535
    Watchdog Enabled: Yes
    Integrated GPU: No
    Concurrent Kernels: Yes
    Compute Mode: Default

    Memory Information
    ------------------
    Total Global: 1023.69 MiB
    Shared Per Block: 48 KiB
    Pitch: 2048 MiB
    Total Constant: 64 KiB
    Texture Alignment: 512 B
    Texture 1D Size: 65536
    Texture 2D Size: 65536 x 65536
    Texture 3D Size: 4096 x 4096 x 4096
    GPU Overlap: Yes
    Map Host Memory: Yes
    Error Correction: No

    Performance Information
    -----------------------
    Memory Copy
    Host Pinned to Device: 4866.34 MiB/s
    Host Pageable to Device: 4577.5 MiB/s
    Device to Host Pinned: 4822.49 MiB/s
    Device to Host Pageable: 4606.67 MiB/s
    Device to Device: 10.0094 GiB/s
    GPU Core Performance
    Single-precision Float: 95.3403 Gflop/s <<<<<<
    Double-precision Float: 7057.53 Mflop/s
    32-bit Integer: 28.7314 Giop/s
    24-bit Integer: 28.6809 Giop/s

    Generated: Wed Sep 26 02:30:48 2012
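
For anyone comparing reports, note that CUDA-Z mixes units between runs (Mflop/s vs. Gflop/s, MiB/s vs. GiB/s). Below is a minimal Python sketch that pulls the performance lines out of an exported report and normalizes them. The function name `parse_performance` and the unit table are my own assumptions for illustration, not part of CUDA-Z, and the table only covers the unit strings that show up in this thread.

```python
import re

# Assumed unit table: maps each unit string seen in the reports in this
# thread to (scale factor, common unit). Not an official CUDA-Z list.
UNIT_SCALE = {
    "Mflop/s": (1e-3, "Gflop/s"),
    "Gflop/s": (1.0, "Gflop/s"),
    "Miop/s": (1e-3, "Giop/s"),
    "Giop/s": (1.0, "Giop/s"),
    "MiB/s": (1.0, "MiB/s"),
    "GiB/s": (1024.0, "MiB/s"),
}

# "Name: <number> <unit>", optionally followed by markers such as "<<<<<<".
LINE_RE = re.compile(r"^\s*([^:]+):\s*([0-9.]+)\s+(\S+)")


def parse_performance(report_text):
    """Return {name: (value, unit)} for lines that carry a rate unit."""
    results = {}
    for line in report_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        name, value, unit = match.groups()
        if unit not in UNIT_SCALE:
            continue  # skips lines such as "Clock Rate: 900 MHz"
        scale, common_unit = UNIT_SCALE[unit]
        results[name.strip()] = (float(value) * scale, common_unit)
    return results


# A few lines copied from the report above.
SAMPLE = """\
Clock Rate: 900 MHz
Device to Device: 10.0094 GiB/s
Single-precision Float: 95.3403 Gflop/s <<<<<<
Double-precision Float: 7057.53 Mflop/s
32-bit Integer: 28.7314 Giop/s
"""

if __name__ == "__main__":
    for name, (value, unit) in parse_performance(SAMPLE).items():
        print(f"{name}: {value:.3f} {unit}")
```

Pasting the full text from "Export to text" into `SAMPLE` should work the same way, since lines without a rate unit are ignored.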
     
  6. thread starter macrumors member

    Joined:
    Jul 12, 2008
    #6
    Thanks. Those performance numbers are almost the same as mine. (Base retina MBP model here too).

    That is interesting. I wonder why we get such low scores compared to yours on 10.8.2. Can you post your complete CUDA-Z result, too? I'm just curious about what could be different with your system. ("Export to text" is under the Performance tab). Thanks!
     
  7. thread starter macrumors member

    Joined:
    Jul 12, 2008
    #7
    This is quite weird. CUDA performance seems to be back to normal with OS X 10.8.2 for no apparent reason. CUDA-Z now shows a single precision performance that varies between 256 and 334 Gflop/s (as opposed to the earlier average of 95 Gflop/s).

    However, "Memory Copy" performance seems to be significantly lower than before. I wonder if there is some sort of dynamic resource shifting going on between "Memory Copy" and "GPU Core" operations that results in this kind of trade-off between the two groups of performance values.

    The full report looks like this now:
    Code:
    CUDA-Z Report
    =============
    Version: 0.6.156 SVN Built Sep 21 2012 09:52:54 
    http://cuda-z.sf.net/
    OS Version: Mac OS X 10.8.2 12C54 (base model, retina Macbook Pro)
    Driver Version: 8.0.61 295.30.20f02
    Driver Dll Version: 5.0
    Runtime Dll Version: 4.20
    
    Core Information
    ----------------
    	Name: GeForce GT 650M
    	Compute Capability: 3.0
    	Clock Rate: 900 MHz
    	Multiprocessors: 2
    	Warp Size: 32
    	Regs Per Block: 65536
    	Threads Per Block: 1024
    	Threads Dimensions: 1024 x 1024 x 64
    	Grid Dimensions: 2147483647 x 65535 x 65535
    	Watchdog Enabled: Yes
    	Integrated GPU: No
    	Concurrent Kernels: Yes
    	Compute Mode: Default
    
    Memory Information
    ------------------
    	Total Global: 1023.69 MiB
    	Shared Per Block: 48 KiB
    	Pitch: 2048 MiB
    	Total Constant: 64 KiB
    	Texture Alignment: 512 B
    	Texture 1D Size: 65536
    	Texture 2D Size: 65536 x 65536
    	Texture 3D Size: 4096 x 4096 x 4096
    	GPU Overlap: Yes
    	Map Host Memory: Yes
    	Error Correction: No
    
    Performance Information
    -----------------------
    Memory Copy
    	Host Pinned to Device: 2885.88 MiB/s
    	Host Pageable to Device: 2833.43 MiB/s
    	Device to Host Pinned: 2933.67 MiB/s
    	Device to Host Pageable: 2827.69 MiB/s
    	Device to Device: 8317.46 MiB/s
    GPU Core Performance
    	Single-precision Float: 320.324 Gflop/s <<<<<<<
    	Double-precision Float: 23.0491 Gflop/s
    	32-bit Integer: 86.0745 Giop/s
    	24-bit Integer: 85.8875 Giop/s
    
    Generated: Wed Sep 26 18:53:37 2012
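
To put numbers on the trade-off, here is a small Python sketch that divides each value in this report by the matching value in the slow report from post #5. The figures are copied from the two reports in this thread (with 10.0094 GiB/s converted to MiB/s); the dictionary layout and the `ratios` helper are just illustrative names of mine.

```python
# Slow run (post #5) and fast run (this post), copied from the CUDA-Z reports.
# Memory copy values are in MiB/s; core values are in Gflop/s or Giop/s.
slow_run = {
    "Host Pinned to Device": 4866.34,
    "Host Pageable to Device": 4577.5,
    "Device to Host Pinned": 4822.49,
    "Device to Host Pageable": 4606.67,
    "Device to Device": 10.0094 * 1024,  # reported as GiB/s in post #5
    "Single-precision Float": 95.3403,
    "Double-precision Float": 7057.53 / 1000,  # reported as Mflop/s in post #5
    "32-bit Integer": 28.7314,
    "24-bit Integer": 28.6809,
}
fast_run = {
    "Host Pinned to Device": 2885.88,
    "Host Pageable to Device": 2833.43,
    "Device to Host Pinned": 2933.67,
    "Device to Host Pageable": 2827.69,
    "Device to Device": 8317.46,
    "Single-precision Float": 320.324,
    "Double-precision Float": 23.0491,
    "32-bit Integer": 86.0745,
    "24-bit Integer": 85.8875,
}


def ratios(before, after):
    """Return after/before for every key in `before`."""
    return {name: after[name] / before[name] for name in before}


change = ratios(slow_run, fast_run)

if __name__ == "__main__":
    for name, factor in change.items():
        print(f"{name}: x{factor:.2f}")
```

With these inputs the core figures come out roughly 3x higher in the fast run, while the host/device copies drop to about 0.6x and device-to-device to about 0.8x.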
     
  8. macrumors newbie

    Joined:
    Sep 26, 2012
    #8
    I have the same CUDA performance issue here. I hate Apple (seriously I do). I am in the middle of developing a CUDA-based project, and now the performance of my program has dropped to 1/3 of what it was before the 10.8.2 update.

    I ran a few tests.
    My CUDA-Z results are exactly the same as what macrons posted.
    I have Boot Camp, so I ran CUDA-Z on Win7 under Boot Camp too.
    Single-precision float is around 100 Gflop/s.
    But the weird thing is, the clock rate only shows 405 MHz instead of 900 MHz (which is the clock rate for the 650M). So I think CUDA-Z may not support Boot Camp very well.

    I ran tests with convolutionFFT2D from the CUDA SDK Toolkit.
    On Win7 bootcamp, the three test results are:
    142.358473 MPix/s (28.098082 ms)
    158.115622 MPix/s (25.297943 ms)
    193.503693 MPix/s (20.671440 ms)
    On Mountain Lion 10.8.2, the three test results are:
    42.398032 MPix/s (94.344002 ms)
    81.539466 MPix/s (49.056000 ms)
    98.313914 MPix/s (40.686001 ms)

    I ran convolutionFFT2D several times. The results are quite stable on Win7 Boot Camp, but on Mountain Lion they are not: the first test result in particular ranges from 20 to 80 MPix/s. Either way, in the runs above Win7 is roughly two to three times as fast as Mountain Lion.
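
As a sanity check on the numbers above, here is a short Python sketch that compares the two sets of timings. The values are copied from this post; the variable names are only for illustration. Multiplying each MPix/s figure by its time suggests every run processed about 4 MPix, which is an inference from the numbers rather than something the tool printed.

```python
# (throughput in MPix/s, time in ms) for the three convolutionFFT2D tests.
win7_runs = [
    (142.358473, 28.098082),
    (158.115622, 25.297943),
    (193.503693, 20.671440),
]
osx_runs = [
    (42.398032, 94.344002),
    (81.539466, 49.056000),
    (98.313914, 40.686001),
]

# Implied problem size per run: MPix/s * seconds = MPix.
implied_mpix = [rate * ms / 1000.0 for rate, ms in win7_runs + osx_runs]

# How many times faster Win7 is on each test (ratio of elapsed times).
speedups = [osx_ms / win_ms
            for (_, win_ms), (_, osx_ms) in zip(win7_runs, osx_runs)]

if __name__ == "__main__":
    print("implied size (MPix):", [round(m, 3) for m in implied_mpix])
    print("Win7 speedup per test:", [round(s, 2) for s in speedups])
```

For these particular runs the first test is a bit over 3x faster on Win7 and the other two are just under 2x.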

    Can anyone run the same test with convolutionFFT2D?

    And does anyone know whether reinstalling CUDA could solve the performance issue?
    I am going to try reinstalling tonight, and if that does not work, my only option is to reinstall Mountain Lion. Apple is making my life really hard.
     
  9. macrumors newbie

    Joined:
    Sep 26, 2012
    #9
    Some update:

    Here are my memory bandwidth test results (using bandwidthTest in the Toolkit).
    You can see that Mountain Lion 10.8.2 messed up the host-to-device and device-to-device bandwidth.

    On Mountain Lion 10.8.2:
    Code:
    [bandwidthTest] starting...
    
    ./bandwidthTest Starting...
    
    Running on...
    
     Device 0: GeForce GT 650M
     Quick Mode
    
     Host to Device Bandwidth, 1 Device(s), Paged memory
       Transfer Size (Bytes)	Bandwidth(MB/s)
       33554432			1760.5
    
     Device to Host Bandwidth, 1 Device(s), Paged memory
       Transfer Size (Bytes)	Bandwidth(MB/s)
       33554432			2830.4
    
     Device to Device Bandwidth, 1 Device(s)
       Transfer Size (Bytes)	Bandwidth(MB/s)
       33554432			11348.9
    On Win7 Boot Camp:
    Code:
    [bandwidthTest.exe] starting...
    
    bandwidthTest.exe Starting...
    
    Running on...
    
     Device 0: GeForce GT 650M
     Quick Mode
    
     Host to Device Bandwidth, 1 Device(s), Paged memory
       Transfer Size (Bytes)        Bandwidth(MB/s)
       33554432                     3636.6
    
     Device to Host Bandwidth, 1 Device(s), Paged memory
       Transfer Size (Bytes)        Bandwidth(MB/s)
       33554432                     3533.4
    
     Device to Device Bandwidth, 1 Device(s)
       Transfer Size (Bytes)        Bandwidth(MB/s)
       33554432                     19768.2
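
A quick way to compare the two bandwidthTest runs is to express the Mountain Lion figures as a fraction of the Win7 ones. The Python sketch below uses the values from the two outputs above; the dictionary keys are shorthand labels of mine, not bandwidthTest output.

```python
# Bandwidth in MB/s for a 33554432-byte transfer, from the outputs above.
osx_10_8_2 = {
    "host_to_device": 1760.5,
    "device_to_host": 2830.4,
    "device_to_device": 11348.9,
}
win7_bootcamp = {
    "host_to_device": 3636.6,
    "device_to_host": 3533.4,
    "device_to_device": 19768.2,
}

# Fraction of the Win7 bandwidth that Mountain Lion reaches per direction.
fraction_of_win7 = {
    direction: osx_10_8_2[direction] / win7_bootcamp[direction]
    for direction in win7_bootcamp
}

if __name__ == "__main__":
    for direction, fraction in fraction_of_win7.items():
        print(f"{direction}: {fraction:.0%} of Win7")
```

On these numbers host-to-device is a little under half of the Win7 figure, device-to-device a bit under 60%, and device-to-host about 80%.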
     
  10. macrumors newbie

    Joined:
    Sep 26, 2012
    #10
    For the record.
    I uninstalled CUDA, then reinstalled CUDA 4.2, still the same.
    I uninstalled CUDA, then installed CUDA 5.0RC: bandwidth got a bit better (1602/3169/13246), but convolutionFFT2D got worse (the third result dropped to less than 50 MPix/s).
     
  11. macrumors newbie

    Joined:
    Sep 26, 2012
    #11
    Hi cornelius1, can you give me any hint on how your CUDA got back to normal?
    Did you do something, or did you do nothing and just let your MacBook sit idle all day?
    There are definitely many people experiencing this, e.g., this post:
    http://www.primegrid.com/forum_thread.php?id=4553&nowrap=true

    I just want to get my CUDA working properly again. :(
     
  12. thread starter macrumors member

    Joined:
    Jul 12, 2008
    #12
    Hi qianyizh. Sorry, I was the one who started the thread there as well (so I was the only one with this problem in that thread). And I wish I knew how it went back to normal. In fact, today it went back to being 3x as slow once again.
    I still have CUDA driver version 5.0.24, and I have not installed or removed anything that affects the GPU. I didn't change anything in my daily usage pattern either. So, I have no idea what's causing these changes in CUDA performance.
     
  13. cornelius1, Sep 28, 2012
    Last edited: Sep 28, 2012

    thread starter macrumors member

    Joined:
    Jul 12, 2008
    #13
    And it's back to running fast once again. Btw, while a CUDA task is running in the background, CUDA-Z seems to give even better numbers (for both "Memory Copy" and "GPU Core Performance"):
    Code:
    Performance Information
    -----------------------
    Memory Copy
    	Host Pinned to Device: 6030.91 MiB/s <<<<<<<<<
    	Host Pageable to Device: 4129.53 MiB/s
    	Device to Host Pinned: 5859.16 MiB/s
    	Device to Host Pageable: 5175.31 MiB/s
    	Device to Device: 20.0776 GiB/s
    GPU Core Performance
    	Single-precision Float: 400.925 Gflop/s <<<<<<<<<
    	Double-precision Float: 28.766 Gflop/s
    	32-bit Integer: 114.831 Giop/s
    	24-bit Integer: 114.584 Giop/s
    
    Generated: Fri Sep 28 11:51:11 2012
    So, OS X 10.8.2 might be doing some dynamic (frequency?) scaling (based on the GPU load?). I guess that sort of scaling but in the opposite direction (maybe due to high temperatures?) could be causing the huge drop in CUDA performance as well.
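
To see what the frequency-scaling guess would imply, here is a rough Python sketch. It rests on two assumptions that nothing in this thread confirms: that single-precision throughput scales linearly with the core clock, and that the 320.324 Gflop/s result from post #7 corresponds to the reported 900 MHz clock. The helper name `implied_clock_mhz` is made up for this example.

```python
# Reference point (assumption): post #7's result was measured at the
# 900 MHz clock that CUDA-Z reports for this GPU.
REFERENCE_GFLOPS = 320.324
REFERENCE_CLOCK_MHZ = 900.0


def implied_clock_mhz(gflops):
    """Clock that would explain `gflops` if throughput were linear in clock."""
    return REFERENCE_CLOCK_MHZ * gflops / REFERENCE_GFLOPS


slow_clock = implied_clock_mhz(95.3403)   # slow result from post #5
fast_clock = implied_clock_mhz(400.925)   # result in this post

if __name__ == "__main__":
    print(f"slow run implies about {slow_clock:.0f} MHz")
    print(f"fast run implies about {fast_clock:.0f} MHz")
```

Under those assumptions the slow run would correspond to a clock well below 300 MHz and the fast run to one above 1100 MHz. If such clocks are not plausible for this GPU, clock scaling alone would not explain the spread.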
     
  14. macrumors newbie

    Joined:
    Sep 26, 2012
    #14
    cornelius1, thanks for sharing your experience.

    Some update from my side:
    My CUDA started working properly again just now.
    Last night I did not shut down OS X; instead, I just closed the lid.
    This morning when I woke the MacBook up, CUDA was good again.
    (CUDA-Z now shows ~300 Gflop/s, and my program runs at normal speed.)

    My suspicion is that when you shut down 10.8.2 and then start it up (I have to do this a lot since I need to switch to Win7 Boot Camp frequently), it slows down CUDA. But after a sleep/wake cycle, it can recover.

    Can anyone confirm my suspicion?
     
  15. thread starter macrumors member

    Joined:
    Jul 12, 2008
    #15
    I tried shutting down and restarting, and CUDA did not slow down. So, if that happens, it's either not a consistent behavior, or it depends on some other factor (such as the GPU temperature being high).

    Also, when it slowed down yesterday, I had not restarted OS X, I had just closed and reopened the lid (that is, did a sleep-resume cycle). So, restarting OS X cannot be the only trigger (if it is a trigger at all).
     
