Geekbench METAL scores

Discussion in 'Mac Pro' started by AndreeOnline, Apr 3, 2017.

  1. Asgorath macrumors 65816

    Mar 30, 2012
    This is an incorrect assessment of the situation. TFLOPs is a measurement of raw computational power. If the RX 470 has more TFLOPs than the D700s, then it has more raw computational horsepower. However, this raw power is very rarely the bottleneck in benchmarks, for both OpenCL and OpenGL/Metal. Many OpenCL benchmarks have been written for or tuned for the AMD architecture, and thus run extremely inefficiently on the NVIDIA architecture (since they are fundamentally different). Most compute code written/tuned for NVIDIA uses CUDA, as it exposes more of the underlying architecture to the application. There are a few OpenCL examples like Oceanwave and a face recognition benchmark that run much faster on NVIDIA than AMD, but again, that's probably because they were written on NVIDIA and thus have an implicit bias for that architecture.

    As always, it really just boils down to the applications you want to run. If you care about LuxMark, then buy an AMD card. If you care about DaVinci Resolve, then buy an NVIDIA card.
  2. linuxcooldude macrumors 68020

    Mar 1, 2010
    Both are AMD cards other than the noted differences in models. Geekbench supports Metal/OpenCL/CUDA. But looking up stats for Geekbench 4 a staffer does say it only uses one GPU at a time for the compute benchmark. Interesting....

    So should my Metal score actually be 107,852 for 2 cards?
  3. H2SO4 macrumors 68040

    Nov 4, 2008
    Ok, cheers. Makes sense.
  4. linuxcooldude macrumors 68020

    Mar 1, 2010
  5. Yahooligan macrumors 6502a


    Aug 7, 2011
    If you go back to what the original discussion was, the complaint was that the RX470 (AMD card) is slower at Metal than the D700 (Also AMD) even though it has more TFLOPS. So, while the AMD vs Nvidia comparison is valid and benchmarks run better on what they've been optimized for, that doesn't explain why a "faster" gaming GPU is slower running the Metal benchmark than a "slower" D700. I still stand by it being gaming vs workstation GPU and what each is optimized for. Workstation/Computational GPUs with more power simply don't do as well gaming and gaming GPUs don't do as well with computations regardless of their TFLOPS.

    Just my $0.02...
  6. Synchro3, Apr 5, 2017
    Last edited: Apr 5, 2017

    Synchro3 macrumors 65816


    Jan 12, 2014
    ? No, in CUDA it scores 139735:

    Well, I could install my GTX 980 Ti as second GPU in the Mac Pro to achieve that score, but it is already in my Kaby Lake-PC. :D

  7. Asgorath macrumors 65816

    Mar 30, 2012
    I was specifically commenting on this:

    "My D700 and my wife's GTX 680 are both in the 3.1-3.5 tflop range, my D700 kills her GTX 680 in computational benchmarks like OpenCL, her GTX 680 wins when doing rendering benchmarks like Valley."

    which is an AMD vs NVIDIA comparison. Also, the OpenCL tests might make good use of 2 GPUs and thus the 2 D700s could beat a single RX 470 if the test isn't limited by raw TFLOPs.

    Edit: My main point is that we see a lot of posts along the lines of "GPU X has more TFLOPs than GPU Y but GPU Y runs application Z faster, what's up?". The simple answer is that most applications are not limited by raw GPU TFLOPs and the limiting factor is something else. As a result, you should always take the raw TFLOPs numbers with a huge grain of salt.
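    To make the "limited by something else" point concrete, here is a minimal roofline-style sketch. It assumes the only two limits are peak compute and memory bandwidth, which is a simplification; drivers, occupancy and the application itself matter too. The D700 figures are the ones quoted later in this thread, and the kernel names and arithmetic intensities are purely hypothetical.

```python
# Roofline-style estimate: a kernel is capped by whichever is lower,
# peak compute or (memory bandwidth x arithmetic intensity).
# This is a simplified model, not a description of any real benchmark.

def attainable_tflops(peak_tflops, bandwidth_gbs, flops_per_byte):
    """Upper bound on throughput (TFLOPs) for a given arithmetic intensity."""
    # GB/s * FLOPs/byte gives GFLOPs; divide by 1000 for TFLOPs.
    memory_bound_tflops = bandwidth_gbs * flops_per_byte / 1000.0
    return min(peak_tflops, memory_bound_tflops)

# D700 figures as quoted in this thread: 3.5 TFLOPs, 264 GB/s.
PEAK_TFLOPS = 3.5
BANDWIDTH_GBS = 264.0

# Hypothetical kernels; the intensities are illustrative assumptions.
kernels = [
    ("copy-like", 0.25),
    ("stencil-like", 2.0),
    ("dense math", 50.0),
]

for name, intensity in kernels:
    bound = attainable_tflops(PEAK_TFLOPS, BANDWIDTH_GBS, intensity)
    print(f"{name}: at most {bound:.3f} TFLOPs")
```

    In this model only the "dense math" case ever reaches the advertised peak; the other two are capped by memory bandwidth no matter how many TFLOPs the card has on paper.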
  8. Yahooligan macrumors 6502a


    Aug 7, 2011
    Emphasis added by me. You are incorrect regarding the Geekbench OpenCL benchmarks as well. These only run on a single GPU at a time. One of my D700 GPUs handily beats my wife's GTX 680. Yes, it's AMD vs Nvidia, but you seem to be commenting on things you don't fully understand, either. Which is fine. And yes, raw TFLOPS is just a number that doesn't translate well into real-world performance expectations.

    My point also stands. Gaming GPUs and computing GPUs are better at different things. My car may have 500HP but my 350HP truck will easily tow more, faster, and for longer periods.
  9. shaunoneil100 macrumors newbie


    May 4, 2017
    Just new to this and I was thinking of taking back the new 1080 Ti for 2 x R9 390X due to my heavy use of FCPX - however, with this ego boost and virtual forum whipping stick that Geekbench just gave me, I might just stick with it.

    All jokes aside - what are the thoughts on future support on the Nvidia side to push for better OpenCL capability? My work depends on it. Y'all think I should keep the 1080 Ti or switch to CF 390Xs?

    32gb ram
    msi 1080ti aero oc (can't overclock it in OS X as far as I know - any suggestions?)
    250gb 960 evo m.2

  10. AndreeOnline thread starter macrumors 6502


    Aug 15, 2014
    I actually took my 1080 Ti back yesterday. I had 14 days before I couldn't return it anymore and those days were up so...

    The 1080 Ti is great hardware in itself. I found performance in Resolve in CUDA mode to be great. I use Maxwell Render 4 with GPU support and saw nice performance there too, but not in all conditions.

    FCPX playback wasn't problematic per se, but BruceX could take a minute to export. F1 2016 hung every now and then. Luxmark Luxball worked, but the heavier scenes didn't. Geekbench OpenCL didn't work.

    As a test I put my RX 480 in again and ran the F1 2016 benchmark that made the 1080 Ti hang. It turned out that not only did the 480 not hang, it also beat the 1080 Ti in performance.

    So... ups and downs. In the end it came down to simply recognising the fact that the drivers aren't completely up to speed yet. They may, or may not, work better in the future. But I decided not to wait and find out and returned the card.

    I'll try to wait for Vega and see if that will work. I also think the Radeon Pro Duo looks very interesting with 11.5 TFlops for $995. I could even drop two Pro Duo in the Mac Pro for some sweet 23 TFlops. =)
  11. h9826790 macrumors 604


    Apr 3, 2014
    Hong Kong
    No, the RX470 is NOT stronger than 2x D700.

    Also, the driver for the D700 is very mature and highly optimised. On the other hand, there is no official support for the RX470. Just because you can make it work with a kext edit doesn't mean that the driver can release the card's full potential. In fact, macOS may not even be able to use all 32 CUs.
  12. H2SO4 macrumors 68040

    Nov 4, 2008
    My bad. I find it funny that they would post individual specs for memory but not for throughput, i.e. say 6GB VRAM each but not say 3.5 TFLOPS each.

    Dual AMD FirePro graphics processors

    Dual AMD FirePro D700 graphics processors with 6GB of GDDR5 VRAM each

    • 2,048 stream processors
    • 384-bit-wide memory bus
    • 264 GB/s memory bandwidth
    • 3.5 teraflops performance
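    For what it's worth, the last two bullets can be roughly reproduced from the first two. This is only a back-of-the-envelope sketch: the 850 MHz core clock, the 2 FLOPs per stream processor per cycle (one fused multiply-add), and the 5.5 Gbps per-pin memory data rate are my assumptions and are not part of the quoted spec.

```python
# Back-of-the-envelope check of the quoted D700 figures.
# Assumptions (not in the quoted spec): 850 MHz core clock,
# 2 FLOPs per stream processor per cycle, 5.5 Gbps per memory pin.

def peak_tflops(stream_processors, clock_mhz, flops_per_cycle=2):
    """Theoretical single-precision peak in TFLOPS."""
    return stream_processors * flops_per_cycle * clock_mhz * 1e6 / 1e12

def bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Theoretical memory bandwidth in GB/s."""
    return bus_width_bits / 8 * data_rate_gbps

print(f"Peak compute: {peak_tflops(2048, 850):.2f} TFLOPS")
print(f"Memory bandwidth: {bandwidth_gbs(384, 5.5):.0f} GB/s")
```

    Under those assumptions the numbers land close to the advertised 3.5 teraflops and 264 GB/s, and they are per GPU, which would fit the "each" reading.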
  13. PowerMike G5 macrumors 6502

    Oct 22, 2005
    New York, NY
    Yes, it really does come down to whether your intended usage gets the speed increase from the new hardware.

    I also bought the 1080 Ti the other day to see if it would accelerate my mostly 4K Adobe CC workflow, over the current Titan X Maxwell I am using. All the synthetic benchmarks were indeed showing a roughly 70% increase in CUDA and OpenCL performance, via Geekbench, LuxMark and Unigine.

    But then I tried out some real-world rendering tests, pertinent to my daily workload.

    In Adobe Premiere, I rendered out a DCI 4K ProRes (HQ) 30-sec clip with 4 effects applied: Lumetri Color with 2 LUTs applied, another Lumetri Color with optical mask tracking, a Colorista/Mojo filter, and NeatVideo noise reduction. The NeatVideo filter itself can be assigned the full resources of your hardware, so I assigned 11 of my physical CPU cores and 100% of the VRAM and computing from the GPUs to the filter applied to the footage.

    This is where I was a bit surprised with the results:

    Titan X - CUDA - 05:59
    Titan X - OpenCL - 06:04

    1080 Ti - CUDA - 05:53
    1080 Ti - OpenCL - 05:52

    Then I took a 02:15 DCI 4K ProRes (HQ) clip and exported it out in Adobe Media Encoder as a 2K H.264 master at 25mbps.

    Titan X - AME - CUDA - 01:29
    Titan X - AME - OpenCL - 01:29

    1080 Ti - AME - CUDA - 01:30
    1080 Ti - AME - OpenCL - 01:29

    Suffice to say, though the new GPU hardware itself was more powerful, the results were mostly the same as my older GPU in my workflow.
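    To put numbers on "mostly the same", here is a small sketch that converts the times above into speedup ratios. The times are exactly the ones posted; the helper names are just mine for illustration.

```python
# Convert the posted mm:ss render times into Titan X -> 1080 Ti speedups.

def to_seconds(mmss):
    """Parse a 'mm:ss' string into whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

def speedup(old, new):
    """Ratio of old time to new time; above 1.0 means the new card is faster."""
    return to_seconds(old) / to_seconds(new)

# (Titan X time, 1080 Ti time) as posted above.
results = {
    "Premiere CUDA": ("05:59", "05:53"),
    "Premiere OpenCL": ("06:04", "05:52"),
    "AME CUDA": ("01:29", "01:30"),
    "AME OpenCL": ("01:29", "01:29"),
}

for label, (titan_x, gtx_1080_ti) in results.items():
    print(f"{label}: {speedup(titan_x, gtx_1080_ti):.3f}x")
```

    That works out to roughly 2 to 3 percent in Premiere and essentially nothing in AME, against the roughly 70% the synthetic benchmarks suggested.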
  14. SoyCapitanSoyCapitan macrumors 68040


    Jul 4, 2015
    I did these same tests on this forum more than a year ago. Same AME results on OpenCL and CUDA. Then someone informed us that the GPU doesn't encode H.264. Sure enough, I turned on software rendering and the result was the same. There's no GPU rendering for some codecs on macOS.

    Then I rebooted into Bootcamp and the AME render result was exactly 4x faster than macOS. On software rendering! On the same machine!
  15. PowerMike G5 macrumors 6502

    Oct 22, 2005
    New York, NY
    Yes, it seems like AME is mostly using the GPU in the same way as Premiere Pro. So if someone is rendering out a standalone master clip to, say, H.264, the GPU is only handling the scaling, if there is any at all. Otherwise, it is mostly CPU in this case (my 12-core Mac Pro was using all cores in this instance).

    The GPU looks like it will come into play far more in AME when exporting from a Premiere timeline that hasn't been rendered out. In that case, the GPU will accelerate any effects, scaling, etc. that have been optimized to use the GPU.

    So GPU acceleration can still have quite an impact, but it depends on how one works on their machine.
  16. ActionableMango, May 5, 2017
    Last edited: May 5, 2017

    ActionableMango macrumors 604


    Sep 21, 2010
    The big problem with the theory you've picked (gaming vs workstation) is that the D700 is really just an HD7970. They are the same card, they use the same drivers, and they bench the same. Heck, they even have the exact same ID--AMD didn't bother to give the D700 a different one. There have been many discussions about this in the past. The D700 is not a workstation card except in branding.

    ATI did an article where they explained the difference between workstation and gaming cards. Workstation cards are not faster than (or slower than) their equivalent gaming cards. The exception is where there are highly optimized workstation-GPU-only drivers. But these are on Windows only, not OS X, and wouldn't help a D700 there anyway because the card reports itself identically to an HD7970.

    Your D700 vs GTX680 comparison is apples and oranges. Differences in benchmarks for that particular pair of cards can be explained many different ways, from having different architectures, to using different drivers, and to which brand the software is optimized for. A 7970 will perform just as well as a D700 against a GTX680 in OpenCL and just as poorly in Valley, so "workstation" vs "gaming" is not the explanation.
  17. whartung macrumors newbie

    Dec 29, 2014
    RX460 here. I'm having second thoughts about this card - I don't think it's as OOB as one might think - it seems to crash some programs after running for a while.

  18. linuxcooldude macrumors 68020

    Mar 1, 2010
    Again, AMD/Nvidia base a lot of their Quadro/FirePro cards on other Radeon/GTX offerings, apart from differences in drivers/support and perhaps slight chip differences. But I would not call them workstation cards just because of that distinction alone.
  19. mmomega macrumors demi-god


    Dec 30, 2009
    DFW, TX
  20. AidenShaw macrumors P6


    Feb 8, 2003
    The Peninsula
    So CUDA is more than three times faster than Metal.

    SAD! ^H^H^H^H^H That's rather disappointing.
  21. mmomega macrumors demi-god


    Dec 30, 2009
    DFW, TX
    This was my thought. This is on the same machine. Run one test, then turn around and run the second.
  22. linuxcooldude macrumors 68020

    Mar 1, 2010
    No surprise there. Nvidia is not optimized in Metal as much as CUDA. Might change with the new Mac Pro.
  23. koyoot macrumors 601


    Jun 5, 2012
    Ahem. There is an M395X from an iMac in there. And it still scores higher in OpenCL than the GTX 1080 Ti does in Metal ;).

    So it appears it's not a matter of Metal, but a matter of Nvidia's rubbish Metal/OpenCL drivers.

    P.S. The R9 M395X has 3.7 TFLOPs of compute power. The GTX 1080 Ti has 11.5 TFLOPs.

    So CUDA performance actually does not reflect the difference in performance that should be apparent between both GPUs.

    But this is MacOS.
  24. flowrider macrumors 601


    Nov 23, 2012
    It does now, with the release today of Geekbench 4.1.1. Geekbench 4.1.1 now works with 10.12.6 and the Nvidia Web Drivers on the OpenCL test.

  25. Asgorath macrumors 65816

    Mar 30, 2012
    Oh look, it was an application bug that was preventing it from running on NVIDIA:

