High performance rendering with 3 x GTX 980 Ti connected to a Classic Mac Pro

Machines

macrumors 6502
Original poster
Jan 23, 2015
426
89
Fox River Valley, Illinois

A word of caution before anyone tries this at home: this project is not the same as a retail product, and none of the device connections are hot-swappable. If the devices are not attached properly, or come loose during operation, damage could occur to your Mac, the attached devices, or both. Also, booster power to the GPU array must be active before the host computer is started up. Try this project at your own risk and enjoy the results when you do.


About a year ago, my flagship annual project involved installing and internally powering dual GTX 970 Maxwell graphics cards for rendering in a Nehalem Mac Pro. The benchmark results were respectable, especially in CUDA-optimized applications.

This year's project connects three even higher-end Maxwell GPUs to another Nehalem Mac Pro, though they required external mounting for various reasons. The results are nothing less than stellar, easily surpassing last year's benchmarks and reinvigorating these older Macs. This is especially true since the material cost of the entire project is less than the cost of an empty Cubix expansion chassis (before any graphics cards are purchased).



The project connects three externally mounted non-EFI GTX 980 Ti video cards to the host Mac Pro workstation's PCIe slots via heavily shielded PCIe extension cables. Each card has a discrete cable; no splitting is used. An external power supply unit with six eight-pin VGA power connectors was necessary to provide booster power to each card's two eight-pin connectors. An open-air frame / test bench (with no motherboard or other components installed) secures and properly positions the GPUs. No additional fans were added to cool the GPUs at load, although adding them might prove useful; it doesn't take long for the cards' own fans to spin up under load. Tests so far have been brief, lasting less than 10 minutes each.


Render tests were performed under OS X 10.10.5 Yosemite and 10.11.3 El Capitan. Results were essentially identical.


Results:


LuxMark v2.1 Sala (OpenCL) = 10,798


Octane "Trench" Render Target PT (CUDA) = 24 seconds


Blender BMW Blenchmark scene (CUDA) = 29.36 seconds



System configuration:


Mac Pro 4,1 > 5,1 (factory 2009)


Factory internal 980 W PSU


2.8 GHz Six Core X5660 CPU


16 GB 1333 MHz memory (4 x 4GB)


250 GB SSD (Samsung 850 EVO) or 1 TB HDD (WD Blue)


DVD-RW drive


OS X 10.10.5 or 10.11.3



3 x EVGA GTX 980 Ti FTW video cards ($630 each)


3 x 3M twin-axial PCIe x16 500 mm extension cables ($100 each)


External 1000 W EVGA PSU ($160)


DimasTech Test Bench frame ($135)


Total cost of the array (not including the host computer) is $2,485, which is less than the cheapest brand-new Cubix chassis without any cards.
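For anyone tallying along, the bill of materials above can be sanity-checked with a quick sketch (prices as quoted in the list, in US dollars):

```python
# Bill of materials for the external GPU array (prices quoted above, USD).
parts = {
    "EVGA GTX 980 Ti FTW": (3, 630),
    "3M twin-axial PCIe x16 500 mm cable": (3, 100),
    "EVGA 1000 W external PSU": (1, 160),
    "DimasTech test bench frame": (1, 135),
}
total = sum(qty * price for qty, price in parts.values())
print(f"Array total: ${total:,}")  # Array total: $2,485
```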





GTX 980 Ti cards were installed in the following slots:


Slot 1 (8 lanes electrical)


Slot 2 (16 lanes)


Slot 3 (4 lanes)


An optional GT 120 EFI UI card was occasionally installed in Slot 4 (4 lanes).


An attempt was made to install a GTX 980 Ti in each of the Nehalem Mac Pro's four PCIe slots via these cables. The attempt failed: the Mac never completed its POST, looping endlessly without crashing, freezing, or restarting. There appears to be a firmware lock that limits these Macs to three Maxwell graphics cards, which would also explain why certain external expansion chassis have trouble recognizing additional installed graphics cards. It's not a chassis limitation, per se; it's a host-computer limitation.

I have not performed a burn-in stress test yet, as only today did I find a utility (LuxMark v3) that recognizes multiple GPUs; Valley, Heaven, and FurMark cannot stress-test an array of cards under OS X. I will provide thermal and stability data as I collect it. Additional cooling might be necessary when the array is under load for extended periods.


Some additional last-minute notes: the rendering cards appear to require no more than four PCIe Rev 2 lanes each (electrical), so the extra lanes provided by some of the slots are not necessary. This might prove useful with slot-splitter cards in future configurations.
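A rough back-of-the-envelope supports that observation. This sketch assumes PCIe 2.0's published 5 GT/s per-lane signaling rate and 8b/10b encoding; it is an estimate, not a measurement from this build:

```python
# PCIe 2.0 usable bandwidth per direction: 5 GT/s raw per lane,
# and 8b/10b encoding leaves 80% of raw bits as payload.
RAW_GT_PER_S = 5.0
ENCODING_EFFICIENCY = 8 / 10
mb_per_lane = RAW_GT_PER_S * ENCODING_EFFICIENCY * 1000 / 8  # 500 MB/s

for lanes in (4, 8, 16):
    print(f"x{lanes}: {lanes * mb_per_lane / 1000:.0f} GB/s per direction")
```

Since a render job mostly uploads the scene once and then computes on-card, the roughly 2 GB/s of an x4 Rev 2 link is rarely the bottleneck, which fits the slot results above.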


Questions are welcome.


I am Creation Machines.


[Attachments: build photos; LuxMark Sala, Octane Trench, and Blender BMW score screenshots; System Profiler GPU listing]


 

AidenShaw

macrumors P6
Feb 8, 2003
18,598
4,604
The Peninsula
Doesn't it make you a bit sad that you could do all that inside a Z-series using the factory power supply - and have more RAM and much faster processors?

Meanwhile, I'm awaiting delivery of three systems each with 72 cores/144 threads, 1 TiB of RAM (up to 6 TiB max if I need more) and five Titan-X per system. Each has five 1.6 TB NVMe SSD drives (8 TB of NVMe per system) and 5 TB of SAS drives.

Apple just doesn't care anymore.
 

Machines

Doesn't it make you a bit sad that you could do all that inside a Z-series using the factory power supply?
I have an HP Z800 in service as my personal Windows 7 gaming machine. It's an awesome rig, and I might connect this GPU array to it and run a few tests.

If you need some advice on how to rebuild this series of workstations, shoot me a message. Mine desperately needed its IOH chipset re-pasted with thermal compound when it first arrived here, or it would have died by now, the way I push my gear.

But anyway, commercial technicians are not permitted to load OS X onto a PC, the last time I checked. So it's a non-issue.

Tests so far have pushed 800 W peak system-wide on the Mac Pro (factory internal plus auxiliary external PSUs).
 

Machines

Where did you buy the heavily shielded PCIe cables?
Digi-Key.

http://www.digikey.com/product-search/en?mpart=8KC3-0726-0500&v=19

You might not need the x16 version; x4 may work just as well as x16 for this project. You definitely want the longest cables available, as they are a bit too short as it is ...

These extension cables are stiff and hard to work with, and you'll think you are breaking them during installation. They will get a little roughed up from rubbing against the Mac's PCIe slot-area case shielding (thin, sharp metal edges). But so far, they have held up for me.

The shielded cables prevent signal crosstalk and resist EMI, which keeps the links reliable.
 

Machines

That page refers to Z-840 systems with E5-2600 v2 processors. The current models have E5-2600 v3 and better specs.


http://www8.hp.com/us/en/workstations/z840.html
The highest-end shipping BTO option for GPU rendering on the v3 HP Z840 models would be 2 x Quadro M6000, for a total of 6,144 CUDA cores. Retail system price is $12,000 minimum.

My GPU array configuration has a total of 8,448 CUDA cores. The array alone cost under $2,500.
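The core counts behind that comparison work out as follows (2,816 CUDA cores per GTX 980 Ti and 3,072 per Quadro M6000, both published NVIDIA specifications):

```python
# CUDA core totals: triple consumer GTX 980 Ti array vs. a dual
# Quadro M6000 BTO workstation configuration. Both cards use the
# GM200 Maxwell GPU.
CORES_980TI = 2816
CORES_M6000 = 3072

array_cores = 3 * CORES_980TI
z840_cores = 2 * CORES_M6000
print(array_cores, z840_cores)  # 8448 6144
```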

And yes, I know there is a difference between workstation and consumer NVIDIA cards.

But in the Mac community, creatives often use the consumer versions for rendering.

My array just passed the 19-hour mark of its burn-in stress test, running the LuxMark 3 stress-test option. All three GPUs are at load with no issues observed. A nice, stable, powerful array.
 

AidenShaw

Highest end shipping BTO option for GPU rendering for the V3 HP Z840 models would be 2 x Quadro M6000 , for a total of 6144 Cuda Cores . Retail System price is 12 grand minimum ...
:D Retail price? Who pays that? For HP?

We use consumer cards as well (machine learning and AI). We don't need FP64, and ECC isn't needed. The GTX 980 Ti is sweet, although I've ordered three systems with five Titan-X in each.
 

Machines

The array just successfully completed its 24-hour burn-in without incident, using the LuxBall stress-test function of LuxMark 3.x. All three cards performed admirably and benchmarked properly immediately after the stress test completed (while still at their hottest). GPU heatsink temperatures were 53 C (slot 3), 82 C (slot 2, the center card of the array), and 58 C (slot 1). I decided to add active cooling over the array, but the small USB fan didn't move enough air. System power consumption (internal factory PSU plus external GPU PSU) was 725 W.

[Attachment: 24-hour LuxMark 3 stress-test screenshot]
 

Machines

It appears the triple GTX 980 Ti array has achieved three record Mac OS X scores compared to the best Barefeats test results:

LuxMark v2.1 Sala OpenCL rendering score (higher is better):

3 x GTX 980 Ti = 10,798
BF's Mac Pro Nehalem with dual GTX 980 = 4,960
BF's Mac Pro Nehalem with dual R9 290x = 5,388
BF's Mac Pro Cylinder with dual D700 = 3,771



Blender BMW picture render (CUDA or OpenCL), in seconds to completion:

3 x GTX 980 Ti = 29.36
BF's Mac Pro Nehalem with dual GTX 980 = 45
BF's Mac Pro Nehalem with dual AMD 7950 = 124
BF's Mac Pro Cylinder with dual D700 = 87



Octane "Trench" RenderTarget PT test (CUDA), in seconds to completion:

3 x GTX 980 Ti = 24
BF's Mac Pro Nehalem with dual GTX 980 = 51
BF's Mac Pro Nehalem with 5 x GPU (Cubix) with one GTX 680 , Dual GTX 580 and Dual GTX 770 = 34
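Expressed as speedups over the dual GTX 980 Nehalem figures quoted above (scores divide one way, seconds-to-completion the other):

```python
# Speedup of the triple 980 Ti array over Barefeats' dual GTX 980
# Nehalem results quoted in the tables above.
sala = 10798 / 4960    # LuxMark Sala score: higher is better
bmw = 45 / 29.36       # Blender BMW: seconds, lower is better
trench = 51 / 24       # Octane Trench: seconds, lower is better

for name, x in (("Sala", sala), ("BMW", bmw), ("Trench", trench)):
    print(f"{name}: {x:.2f}x faster")
```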
 

Earl Urley

macrumors 6502
Nov 10, 2014
441
181
Thought I heard a prolonged scream from the general direction of Hawaii, must be Rob-Art realizing he no longer has the fastest Mac Pro in existence.

Great job on this mod, and thanks for posting here to tell us ... it can be done!
 

Machines

Thought I heard a prolonged scream from the general direction of Hawaii, must be Rob-Art realizing he no longer has the fastest Mac Pro in existence ...
I have the utmost respect for Rob-Art of Barefeats and his tireless support of Mac performance reporting. It's not a competition but an attempt to push our hardware to the very limit and think outside the box. It's also about careful documentation, so results can be replicated by a wider audience. And, to a lesser extent, it's a plea to Apple to improve its act with the Mac workstation product line. Apple is losing a lot of business to HP, Dell, Boxx, etc., and creatives are jumping ship to Windows and Linux. Apple forgot its roots, which folks like me still remember like it was yesterday.
 

Machines

Pretty sure he's not in Hawaii.
As long as you are here, MVC, do you have any comments on the upper limit on the number of Maxwell GPUs that can be installed concurrently in the classic Mac Pros? I think it's three, but I wonder what it would be if I split one of the host PCIe interfaces. In a wishful-thinking moment I hoped for ten GPUs system-wide, silly me.
 

AidenShaw

As long as you are here MVC , do you have any comments on the upper limit on the number of Maxwell GPUs that can be installed concurrently with the classic Mac Pros ? ...
If you need ten GPUs, I have a feeling that you should think about whether your needs are beyond what the Apple ecosystem can support.

If you want advice about Linux or Windows systems that support ten GPUs, just ask. (Although five Titan-X cards per system is as far as I've gone so far.)
 

Machines

Just wanted to say, that's awesome, and thanks for sharing.
Thanks. I tried to push the project as far as I could, since one of my local clients expressed a desire for even greater rendering performance, and I could no longer work my magic entirely inside the Mac.

Over the years I was also puzzled by reports of Mac Cubix users hitting severe limits on getting their GPUs recognized, for reasons unknown.

It now appears to be a host-computer firmware issue.
 