
Tutor

macrumors 65816
Original poster
Although I hadn't paid them much attention until recently, those $400-$500 E5-4650 ES QBEDs that I mentioned in earlier posts here are identified in Windows by CPU-Z as E5-2680s [V1], stepping C1 - which is one of the production steppings for the E5-2680 V1. The QBEDs have the very same base frequency and turbo stages and frequencies as the E5-2680 V1s:
1) Base frequency = 2700 MHz;
2) Turbo frequency =
3500 MHz (1 or 2 cores)
3400 MHz (3 cores)
3200 MHz (4, 5 or 6 cores)
3100 MHz (7 or 8 cores) [ http://www.cpu-world.com/CPUs/Xeon/Intel-Xeon E5-2680.html ]. Their Geekbench 3 multicore score is about 23,000, which places them just above the midpoint between the latest Ivy Bridge 6- and 8-core Xeons [ http://browser.primatelabs.com/mac-benchmarks ]. So they perform like a fast Ivy Bridge 7-core Xeon (if one existed).
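For quick reference, the turbo bins above can be put into a small lookup table; a minimal sketch in Python (the figures are just the ones quoted from cpu-world, and the function name is mine):

```python
# Turbo bins for the E5-2680 V1 (and, per CPU-Z, these QBED ES chips),
# indexed by the number of active cores.
E5_2680_TURBO_MHZ = {1: 3500, 2: 3500, 3: 3400, 4: 3200,
                     5: 3200, 6: 3200, 7: 3100, 8: 3100}
BASE_MHZ = 2700

def turbo_mhz(active_cores: int) -> int:
    """Return the max turbo frequency for a given number of active cores."""
    return E5_2680_TURBO_MHZ[active_cores]

if __name__ == "__main__":
    for n in range(1, 9):
        print(f"{n} core(s): {turbo_mhz(n)} MHz "
              f"(+{turbo_mhz(n) - BASE_MHZ} MHz over base)")
```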
 

Tutor

macrumors 65816
Original poster
Old Mac Pros can make respectable homes for up to 3 double wide GPUs (con't).

This post picks up from post 972, above, where I used GTX 680s.
72 seconds for Octane Render Benchmark - 1xGTX780Ti
36 seconds for Octane Render Benchmark - 2xGTX780Ti
24 seconds for Octane Render Benchmark - 3xGTX780Ti
Particulars -
1) Octane Render 1.20;
2) 2007 MacPro2,1 w/ 8 x 3 GHz cores - 4 on each of two CPUs - these lack HT; 32 GB of 667 MHz RAM;
3) Mavericks 10.9.2 (Thanks to Tiamo - https://forums.macrumors.com/threads/1598176/ );
4) To run the 780 Ti I had to upgrade both the GPU and the CUDA drivers - CUDA Driver Version: 5.5.47; Nvidia Web Driver Version: 8.25.9 (331.01.01f02) (Thanks to Lou a/k/a flowrider's notice of the update - https://forums.macrumors.com/showthread.php?p=18887540#post18887540 ). Significantly, I converted the pics from Grab's TIFF format into PNG format with Preview - there was no crash.
5) 3 EVGA GTX 780 Ti SC ACX, each with 3 GB of VRAM (Maximum Graphics Card Power 250 W - 1 6-pin and 1 8-pin); GPUs in slots 1 and 2 powered by FSP Booster + PCIe slots; GPU in slot 4 powered from two mobo 6-pin connectors (one with a 6-to-8 converter), with PCIe slots 1, 2 and 4 all set, by Expansion Slot Utility.app in System/Library/CoreServices, to x8; and
6) No HDs in bays 2-4, but two more large-capacity HDs in one of these [ http://www.maxupgrades.com/istore/in...&ParentCat=315 ] and an FSP BoosterX5 in the upper optical bay. A pic of the machine's internals would be the same as in post 972, above, except you'd see 3xGTX 780 Ti. Also of note is Mr. Dremel Tool's cut in the frame of the outer door for a duct-taped cable pass.

Regarding linearity in Octane Render, as I pointed out in earlier posts, it takes 2x whatever number of GPUs of the exact same type (all other parameters being the same) to cut the render time in half. If you have one GPU that renders a particular scene in 100 sec., it’ll take another identical one to cut the render time to 50 sec. [1x2]; but if you have two that together render the scene in 50 sec., it’ll take four of those GPUs to cut the 50 sec. render time in half (i.e., to 25 sec.) [2x2], and to cut that 25 sec. in half (i.e., to 12.5 sec.) it’ll take eight of them [4x2], etc. So, using the benchmarks my GPUs achieved, it’s clear that 72 sec. / 2 = 36 sec. If I wanted to cut that render time to 18 sec., I’d need four EVGA GTX 780 Ti SC ACXs participating in the render. However, I can put only three of them into my Mac Pro at any one time (I don’t have an external chassis).
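That doubling rule is just t(n) = t(1) / n for n identical GPUs, so hitting a target render time means ceiling-dividing. A quick sketch (function names are mine; the figures are the ones from this post):

```python
import math

def render_time(t1: float, n_gpus: int) -> float:
    """Render time with n identical GPUs, given the single-GPU time t1."""
    return t1 / n_gpus

def gpus_needed(t1: float, target: float) -> int:
    """Smallest count of identical GPUs that reaches the target time."""
    return math.ceil(t1 / target)

# Figures from this post: one 780 Ti does the Octane benchmark in 72 s.
print(render_time(72, 2))     # two cards halve it to 36 s
print(render_time(72, 3))     # three cards: 24 s
print(gpus_needed(72, 18))    # four cards would be needed to hit 18 s
```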
 

Attachments

  • AboutThisMacPro2,1w-1xGTX780Ti.png · 65.8 KB · Views: 167
  • AboutThisMacPro2,1w-3xGTX780Ti.png · 117 KB · Views: 275
  • Mp2,1-1GTX780TiOctaneRender1-20Benchmark.png · 1.3 MB · Views: 210
  • Mp2,1-2GTX780TiOctaneRender1-20Benchmark.png · 1.4 MB · Views: 195
  • Mp2,1-3GTX780TiOctaneRender1-20Benchmark.png · 1.4 MB · Views: 194

Tutor

macrumors 65816
Original poster
Old Mac Pros can make respectable homes for up to 3 double wide GPUs (con’t).

MIXING IT UP
141.61 seconds for Mike Pan’s BMW scene in Blender w/ MacPro 2007 CPUs only;
73.67 seconds for Mike Pan’s BMW scene in Blender w/ 1xGTX 680;
28.72 seconds for Mike Pan’s BMW scene in Blender w/ 3xGTX 680 (see post # 972, above);
26.14 seconds for Mike Pan’s BMW scene in Blender w/ 2xGTX 780 Ti SC ACX + 1xGTX 680;
25.33 seconds for Mike Pan’s BMW scene in Blender w/ 3xGTX 780 Ti SC ACX;

30 seconds for OctaneRender Benchmark Scene w/ 2xGTX 780 Ti SC ACX + 1xGTX 680

Particulars -
1) Blender 2.69 and Octane Render 1.20;
2) 2007 MacPro2,1 w/ 8 x 3 GHz cores - 4 on each of two CPUs - these lack HT; 32 GB of 667 MHz RAM;
3) Mavericks 10.9.2 (Thanks to Tiamo - https://forums.macrumors.com/showthre...1598176&page=2 );
4) CUDA Driver Version: 5.5.47; Nvidia Web Driver Version: 8.25.9 (331.01.01f02) (Thanks to Lou a/k/a flowrider's notice of update - https://forums.macrumors.com/showthre...0#post18887540 ). Significantly, I converted the pics from Grab’s tiff format into png format with Preview - there was no crash.
5) 3 EVGA GTX 780 Ti SC ACX, each with 3 GB of VRAM (Maximum Graphics Card Power 250 W - 1 6-pin and 1 8-pin); GPUs in slots 1 and 2 powered by FSP Booster + PCIe slots; GPU in slot 4 powered from two mobo 6-pin connectors (one with a 6-to-8 converter); and
6) No HDs in bays 2-4, but two more large-capacity HDs in one of these [ http://www.maxupgrades.com/istore/in...&ParentCat=315 ] and an FSP BoosterX5 in the upper optical bay. Also of note is Mr. Dremel Tool's cut in the frame of the outer door for a duct-taped pass of the FSP's PCIe power cables into the GPU/PCIe bay.


Observations, Particularly for OctaneRender Users

2xGTX 780 Ti SC ACX + 1xGTX 680 take 6 sec. more to render a frame of the OctaneRender Benchmark than do 3xGTX 780 Ti SC ACX. When rendering thousands or millions of frames, that 6 sec. per frame quickly becomes very noticeable. That’s not to say that the differential won’t grow or shrink depending on frame content. Importantly, Octane supports almost every 3d package worth mentioning. There are forum-member-provided free exporters for many 3d applications and, even better (but in somewhat fewer cases), there are application plugins. There are plugins for OctaneRender™ for...
ArchiCAD
Blender
Daz Studio
Lightwave
Poser
Rhino
MODO
3ds Max
AutoCAD
Cinema4D
Inventor
Maya
Revit
Softimage
and plugins for
SketchUp (in development)
Carrara (in development). [ http://render.otoy.com ]

Observations, Particularly for Blender Users

If you’re rendering using Cycles in Blender, there’s less than one additional second between rendering Mike Pan’s BMW scene using 2xGTX 780 Ti SC ACX + 1xGTX 680 vs. 3xGTX 780 Ti SC ACX. That’s not to say that in a very large project the <1 sec. differential won’t quickly add up, or that the differential won’t grow or shrink depending on frame content. The same applies to the additional 3.39 sec. that 3xGTX 680s take to render Mike Pan’s BMW scene vs. the lesser time taken by 3xGTX 780 Ti SC ACXs. But if you’re a Blender user and you’re not into rendering animations, but rather just rendering single-frame scenes, then 3xGTX 680s may make more economic sense than 3xGTX 780 Ti SC ACXs. Also of note is that there’s an OctaneRender plugin for Blender (which means the GTX 780 Tis might be preferable in that setting, particularly if you’re doing 3d animation). In any event, one GTX 680 will probably cut your render time significantly from what it would be if you own a 2007 Mac Pro and used only your CPUs for rendering. In the case of Blender Cycles rendering, 3xGTX 780 Tis will likely render in about 1/5 the time of the CPUs alone.
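For what it's worth, the Blender timings listed above work out to these speedups over CPU-only rendering (a throwaway calculation using only the figures in this post):

```python
# Mike Pan's BMW scene, times in seconds, from the list above.
cpu_only = 141.61
timings = {
    "1x GTX 680": 73.67,
    "3x GTX 680": 28.72,
    "2x 780 Ti + 1x 680": 26.14,
    "3x 780 Ti": 25.33,
}

for rig, t in timings.items():
    print(f"{rig}: {cpu_only / t:.2f}x faster than CPUs alone")
# The 3x 780 Ti case lands at ~5.6x, i.e. roughly 1/5 the CPU-only time.
```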
 

Attachments

  • MikePanBMWonMacPro2007w:3GTX780Ti.png · 1,014.2 KB · Views: 153
  • MikePanBMWonMacPro2007w:2GTX780Ti+680-2.png · 1.1 MB · Views: 168
  • MikePanBMWonMacPro2007w:1GTX680oc.png · 953.9 KB · Views: 158
  • MikePanBMWonMacPro2007w:CPUs Only.png · 1.1 MB · Views: 156
  • Mp2,1-1GTX780TiOctaneRender1-20Benchmark.png · 1.3 MB · Views: 130

Tutor

macrumors 65816
Original poster
Kennyman, here they are.

Luxmark v2.1, one GTX 780 Ti vs. one GTX 590, both in old 2007 Mac Pros [all with CUDA Driver Version: 5.5.47; Nvidia Web Driver Version: 8.25.9 (331.01.01f02)]. The two machines used for these tests, plus my other one, have been running like champs, with proper sleeping and without any hiccups, continuously for over 60 hours doing all manner of chores.


A. SALA

1) GTX 780 Ti - 1,784

2) GTX 590 - 1,416




B. Room

1) GTX 780 Ti - 891

2) GTX 590 - 739




C. Luxball HDR

1) GTX 780 Ti - 13,057

2) GTX 590 - 10,134




D. Luxball Sky - Render Sunset

1) GTX 780 Ti - 7,329

2) GTX 590 - 6,125



E. Luxball Test

1) GTX 780 Ti - 2,069

2) GTX 590 - 1,673



All of the OpenCL tests were run on the morning of March 17, 2014 on two of my 2007 MacPro2,1s, running OSX 10.9.2. There was no crash even with the GTX 780 Ti (it has the GK110B chip). Scores couldn’t be posted because the Luxmark site was then down to the public.

These tests were run by special request from Kennyman and are meant only for his eyes; so if you’re not Kennyman -
1) Don’t look at the scores. Keep your eyes shut - I can see that you’re peeping!
2) If you’ve already seen the scores because of my stealth placement of this caution, you must immediately forget what you’ve seen here. Beware - I have ESPN and can sense that your brain’s recall activity is being re-ignited!
3) If you've already looked at the scores and can’t forget them, you must forever keep your mouth shut and your fingers frozen about the scores and NOT OTHERWISE REVEAL TO ANYONE ELSE WHAT YOU’VE SEEN HERE. That includes no disclosures to your special friends, significant others and pets, such as, but not limited to, the dog “Snitch” and the cat “LooseLips” and especially the parrot “Gossip.”
 

Tutor

macrumors 65816
Original poster
New Dog on the corner - Xeon E7 4890 V2

A 4-CPU Xeon E7 4890 V2 (4x15-core) system [ http://www.cpu-world.com/CPUs/Xeon/Intel-Xeon E7-4890 v2.html ] makes my WolfPackPrime0's score of 3,791 on Cinebench 15 look puny. The new dog scores 5,818 [ http://www.cbscores.com ]. However, it looks like New Dog's proud owner may have paid almost as much for one of those CPUs as I paid in total for each of my 32-core systems. Moreover, I've got other dogs in the pen and they have special abilities - CUDA.

Further thoughts - What if NewDog's owner turns it into a Hackintosh? That'd be one fast Mac. But would he need a cheese grater or a very large cylinder to complete the transformation fully by appearance?
 

Tutor

macrumors 65816
Original poster
It's an eeking shame, for lust for big iron drives me insane.

Is there salve for my pain?

What about an 8-CPU Xeon E7 8890 v2 system? :eek:

At 6000$+ per CPU, that's a lot of nMP for the price....

fiatlux, I wanted salve, not more pain!

The barebones chassis alone for four of the 4800 E7s is $6,200+ ( https://www.wiredzone.com/Supermicr...Tower-4U-f--4x-Xeon-E7-4800-v2~10023263~0.htm ). That's just a few dollars shy of my full cost for one of my 32-core systems fully decked out. :eek: :eek: :eek: :eek: :eek: :eek: That's 6x eeks - 4 eeks for the CPUs, an eek for the barebones chassis and another eek for the RAM cost. It's an eeking shame. And if we headed into a full discussion of the costs (remember, we haven't even eeked out the price of the OS for the high-end 4800s, let alone that of a fully loaded E7-8890 v2 system), we'd surely be eeked out.

Second thoughts: Maybe I should have put a "g" before all of my eeks, for the performance potential does make it hard for me to stop salivating and to sleep. Rich and generous uncle, you haven't yet offered a peep? Come to think of it, there's another way for me to have that massive compute potential for way less pain and on the cheap. Major Pain - be gone, I've got exploration and destiny to meet.
 

Tutor

macrumors 65816
Original poster
First results in linking exploration and destiny for maximizing 3d rendering.

The pic below shows the most recent Cinebench 15 score for one of my 2007 MacPro2,1s (the same one used in post 978, above) that previously topped out at a score of 575. Can innovate my ***! The delta is CUDA GPU driven.

P.S. But it's not anywhere near the performance improvement multiple that I was expecting - I am expecting a Cinebench 15 score of greater than 17,000*/. If I can get this fully working on my Mac Pros, my 8-GPU Tyan server will be next. More to come.

*/ Here's why.

Just How Fast Are 2x780 Ti SC ACX + 1xGalaxy4gGTX680 in Cinebench 15 GPU Measure?

To get a good feel for the answer to this question I did the following:

1) I ran Cinebench 15 rendering benchmark on a 2007 MacPro2,1 loaded with 2xGTX780 Ti SC ACX + 1xGalaxy4gGTX680 and achieved a score of 575 - the render took 1 min 13 sec. or 73 seconds.

2) I fired up Cinema4d, then opened/loaded the cpu.c4d file inside the Applications/Cinebench_15/plugins/bench/cpu folder. I then added the textures folder beneath it to my Cinema 4d texture search paths (Edit/Preferences - Files - Texture Paths).

3) I then pressed the Render View button and 1 min. 30 sec. (90 seconds) later I was greeted with a rendering of the scene displayed by Cinebench 15 when it completes the benchmark test; but my rendering was bigger. I use a Sony 48-inch TV connected to my MacPro2,1 via HDMI, set to a scaled resolution of 1080p.

4) I highlighted/selected all of the textures in the material viewer and selected the function - Plugins- C4dOctane - Convert Materials - to convert the C4d textures to ones compatible to OctaneRender.

5) I selected the function - Plugins- C4dOctane - Live Viewer Window and in the Octane Render for Cinema 4d Final 1.01 API 1.2 panel I selected the button to send my scene to the Octane Live Viewer.

6) I had to drag a couple of textures to their appropriate destinations.

7) Then I resized the Octane Live Viewer to the same size as the Cinebench 15 rendering benchmark’s view window and clicked the Start Your Render button on the Octane Render for Cinema 4d Final 1.01 API 1.2 panel. 2 sec. later the render was done by Octane.

8) Next I resized the Octane Live Viewer to the same size as the Cinema 4d viewer window. 3 sec. later the render was done by Octane.

9) So to sum this up:

A) A render that my MacPro2,1 took 73 seconds to complete for a Cinebench 15 score of 575, OctaneRender achieved in 2 sec. [73 / 2 = 36.5 times faster].

B) A render that my MacPro2,1 took 90 seconds to complete, OctaneRender achieved in 3 sec. [90 / 3 = 30 times faster]. Even using the more conservative 30x figure, 30 x 575 = 17,250; and 17,250 is over 4.5 times [17,250 / 3,791 = 4.55] the 3,791 score achieved by my 32-core WolfPackPrime0.

BTW - that Cinebench benchmark scene contains even more objects that don't get seen when you render the benchmark.
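The summary above boils down to a few lines of arithmetic (variable names are mine; the projection uses the more conservative of the two speedup figures):

```python
# Times measured in this post, in seconds.
c4d_bench_secs, octane_bench_secs = 73, 2   # Cinebench 15 scene
c4d_view_secs, octane_view_secs = 90, 3     # same scene via C4D Render View

speedup_a = c4d_bench_secs / octane_bench_secs   # 36.5x
speedup_b = c4d_view_secs / octane_view_secs     # 30x (conservative)

cpu_score = 575          # the MacPro2,1's CPU-only Cinebench 15 score
projected = speedup_b * cpu_score
wolfpack = 3791          # 32-core WolfPackPrime0's score
print(f"Projected GPU-equivalent score: {projected:.0f} "
      f"({projected / wolfpack:.2f}x WolfPackPrime0)")
```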
 

Attachments

  • 2007MacPro2,1tweaked1.png · 1.2 MB · Views: 222

deppest

macrumors member
Oct 6, 2009
69
8
Very interesting comparison between Octane GPU and Cinema 4d CPU render capabilities. While one could probably debate at length the comparability of render settings and output quality, it provides an impressive ballpark figure.

As a regular user of Maxwell Render, which is another unbiased but CPU-based renderer, I am aware that these renders are never quite finished but progress, reducing the noise in the image, until they're told to stop. What kind of stopping point did you use in Octane (e.g., no. of samples/pixel), and was the choice of stopping point based on the achieved image quality? Would you mind posting the two final renders for a comparison?

I may try the same exercise in Maxwell when I find the time. But I already know it's gonna take longer than 3 sec...!
 

Tutor

macrumors 65816
Original poster
... . What kind of stopping point did you use in Octane (e.g. no of samples/pixel) and was the choice of stopping point based on the achieved image quality? Would you mind posting the two final renders for a comparison? ... .

I used 800 samples for the Cinebench render. I didn't save any pics from the tests mentioned in the P.S., if that is what you're referring to. When I run the tests again, I'll save pics for you to look at.
 

riggles

macrumors 6502
Dec 2, 2013
301
14
Tutor, have you encountered any performance issues running multiple ACX-cooler-design cards in your MPs? I'm interested in the new 780 6GB cards that are coming out, but they're ACX designs. I was under the impression that since the PCI compartment in the MP has only one fan in the front, the reference blower card design was a better idea. I'm worried about running two ACX cards and really increasing internal case temperatures.

Edit: I insist on running with the door closed, FYI.
 

sirio76

macrumors 6502a
Mar 28, 2013
571
404
I am expecting a Cinebench 15 score of greater than 17,000
Sorry my friend but these expectations are a bit unrealistic :)
Yes, GPU rendering is fast, but only for small scenes (like the one in Cinebench); there is a reason why every demo video of Octane (or other GPU renderers) shows a very simple setup like a car, a few objects, or a basic indoor/outdoor scene.
If you work on simple stuff like product shots etc., then it can be a good/fast option, but as soon as you begin to throw millions of polygons, lots of big textures, many lights, etc. into your scene, you will find that GPU is still very far behind a modern CPU renderer (biased or unbiased); for that reason nobody in the production industry is using it for final shots.
Just browse the Otoy Octane forum gallery to see real work (not simple benchmark scenes) with real render times (hardware specs in signatures), just a few examples:
https://render.otoy.com/forum/viewtopic.php?f=5&t=38660
https://render.otoy.com/forum/viewtopic.php?f=5&t=38648
https://render.otoy.com/forum/viewtopic.php?f=5&t=38505
https://render.otoy.com/forum/viewtopic.php?f=5&t=38641
https://render.otoy.com/forum/viewtopic.php?f=5&t=38400
Don't know about you, but I'm really not impressed; anyone with a decent modern Xeon machine can match or exceed that in terms of speed/quality.
Is there a chance to see some of your scenes?
 

echoout

macrumors 6502a
Aug 15, 2007
600
16
Austin, Texas
Just browse the Otoy Octane forum gallery to see real work (not simple benchmark scenes) with real render times (hardware specs in signatures)?

Scenes with fewer objects aren't "real work"? I have no interest in photoreal scenes like the ones in the examples. I use Octane for motion graphics elements. On my last project, Octane cut my frame renders down from 4:30 to :15. That makes the difference in the project getting done on time or not.
 

Tutor

macrumors 65816
Original poster
Tutor, have you encountered any performance issues running multiple ACX-cooler-design cards in your MPs? I'm interested in the new 780 6GB cards that are coming out, but they're ACX designs. I was under the impression that since the PCI compartment in the MP has only one fan in the front, the reference blower card design was a better idea. I'm worried about running two ACX cards and really increasing internal case temperatures.

Edit: I insist on running with the door closed, FYI.

I too run mine with the doors closed. Running three 780 Ti SC ACXs has not made the PCIe compartment significantly hotter. In fact, the overall internal case temperature is about the same as it was when I ran just one GTX 680 OC. However, I have used SMC Fan Control (SMCfc) throughout (since about 2009) to tame the internal temps of all of my Mac Pros. Like Brylcreem, just a little dab of SMCfc will do ya.
 

sirio76

macrumors 6502a
Mar 28, 2013
571
404
@Echoout
Of course even a scene with 10 polygons can be real work if that's what you are doing, and in that case a GPU renderer can be a great solution. I was just saying that as soon as you raise scene complexity, this solution will soon become slower than a classic CPU renderer; so if you are expecting a Cinebench score of 17,000 points just because you are on GPU, you will most likely be disappointed, unless you work with very simple setups :)
Let's say that GPU renderer speed strongly depends on what you are working on; claiming a general 34x speed increase is not very fair ;)
 

Tutor

macrumors 65816
Original poster
Tutor’s Premiere Tutor

Sorry my friend but these expectations are a bit unrealistic :)
If this expectation is a bit unrealistic, I promise you it is not my first one, nor shall it be my last one, for an old wise woman, who shall be my consummate tutor for so long as I inhabit this shell, said, more than once, to me while rearing me:
"Son, set your goals like this door frame [while pointing to it]. Set them to the top [from the floor] and should you get only half way there, your level of achievement would be just above the door handle and you’ll feel that you still have a long way to go to reach success, in your mind. But set them to the top of the baseboard [from the floor] and should you get only half way there, you will not have far to go to reach success, in your mind. But while you might not like how far you have to go if you've set your goals very high, even when you get only part of the way there, some will look, with amazement, at what you’ve done, though inside you might feel like a complete failure.”

Yes, GPU rendering is fast, but only for small scenes (like the one in Cinebench); there is a reason why every demo video of Octane (or other GPU renderers) shows a very simple setup like a car, a few objects, or a basic indoor/outdoor scene.
If you work on simple stuff like product shots etc., then it can be a good/fast option,

I agree that demos for OctaneRender, as well as those for other GPU and CPU renderers, tend to use small scenes. The reason that I refer to demo scenes in this thread is because others can more easily replicate testing, given the public availability and relatively small download times for such scenes.

... but as soon as you begin to throw millions of polygons, lots of big textures, many lights, etc. into your scene, you will find that GPU is still very far behind a modern CPU renderer (biased or unbiased); for that reason nobody in the production industry is using it for final shots.
Just browse the Otoy Octane forum gallery to see real work (not simple benchmark scenes) with real render times (hardware specs in signatures), just a few examples:
https://render.otoy.com/forum/viewtopic.php?f=5&t=38660
https://render.otoy.com/forum/viewtopic.php?f=5&t=38648
https://render.otoy.com/forum/viewtopic.php?f=5&t=38505
https://render.otoy.com/forum/viewtopic.php?f=5&t=38641
https://render.otoy.com/forum/viewtopic.php?f=5&t=38400

Video RAM size limitations on current GPUs [as compared with the large amounts of system memory that I have and can add to my clock-tweaked 4-, 6-, 8-, 12-, 16- and 32-core machines] do require an alteration in workflow to get the most from GPU rendering technology. The tool chest that I use for home repairs does not rely on just one screwdriver or one saw or one hammer, etc. I use the appropriate tool for the needed repair. Likewise, my tool chest for artistic work relies on many tools whose functions are similar, but specific ones are best positioned for the job at hand. Some jobs, or maybe just parts thereof, are best accomplished by CPU rendering, some by GPU rendering, and yet others by a combination of both technologies.

Don't know about you, but I'm really not impressed; anyone with a decent modern Xeon machine can match or exceed that in terms of speed/quality.

Whether one's impressed by GPU rendering varies. I, for one, am impressed by what it can do now with OctaneRender, Thea Render [ http://www.thearender.com/cms/ ] and Redshift [ https://www.redshift3d.com/products/redshift/ ], and even more so by potential uses for GPU rendering as GPUs are accompanied by greater amounts of ram. Also, I'm impressed by developers like the folks at Redshift who have developed:

"Out-of-Core Architecture - Redshift uses an out-of-core architecture for geometry and textures, allowing you to render massive scenes that would otherwise never fit in video memory. A common problem with GPU renderers is that they are limited by the available VRAM on the video card – that is they can only render scenes where the geometry and/or textures fit entirely in video memory. This poses a problem for rendering large scenes with many millions of polygons and gigabytes of textures. With Redshift, you can render scenes with tens of millions of polygons and a virtually unlimited number of textures with off-the-shelf hardware."

Maybe I'm more easily impressed than are most. Different strokes for different folks.

Is there a chance to see some of your scenes?

Very soon, as I consummate another phase of my worldly transformations, you may find that you can see more of my scenes than you'd like, but they most likely will not be posted in this thread. However, the one thing that living in this shell for more than 60 years has taught me is to never say "never." My aspirations here are about maximizing CPU/GPU-related performance, and thus I'd prefer to deal only with publicly available benchmark scenes that allow others to easily replicate what I do and make comparisons.

----------

@Echoout
Of course even a scene with 10 polygons can be real work if that's what you are doing, and in that case a GPU renderer can be a great solution. I was just saying that as soon as you raise scene complexity, this solution will soon become slower than a classic CPU renderer; so if you are expecting a Cinebench score of 17,000 points just because you are on GPU, you will most likely be disappointed, unless you work with very simple setups :)
Let's say that GPU renderer speed strongly depends on what you are working on; claiming a general 34x speed increase is not very fair ;)

This is a perfect example of why I prefer to use generally available benchmark scenes, although I apologize if I didn't make it crystal clear that what I was doing was done fully in the context of rendering a standard Cinebench 15 benchmark scene.
 

Tutor

macrumors 65816
Original poster
Markets/projects do count.

Scenes with fewer objects aren't "real work"? I have no interest in photoreal scenes like the ones in the examples. I use Octane for motion graphics elements. On my last project, Octane cut my frame renders down from 4:30 to :15. That makes the difference in the project getting done on time or not.

Great use of a tool. Like you, I also use GPU rendering for motion graphics elements for video applications. GPU rendering is not only sufficient, but is perfect for the vast majority of jobs that I see day to day.
 

sirio76

macrumors 6502a
Mar 28, 2013
571
404
All fair points my friend :)
You know I don't really like benchmarks; most of the time they won't tell the whole story.
I come from an old 2,1; its CPU performance is far from impressive.
Now let's say I'm looking for more speed and I read your post.
17,000 CB points???
I could think "hey! 34x speed increase, I'll buy a couple of Titans tomorrow!!!".
I pay 2k for my new shiny cards, I install the two Titans, CUDA drivers etc.
Then I start to work and I realize that most of my scenes struggle to fit in memory, and even when they fit they take long to render because I work on indoor shots with many lights, many millions of polygons, displacement etc.
Quality is nice but I'll end up with a render completed in 80h (like the ones I've seen on the Octane forum, which is quite the norm for me).
Then I think: hey Tutor! I want my money back!!! :D
Jokes aside, GPU rendering may work very well for somebody, but it's not going to replace CPU anytime soon for many tasks; too many limitations/missing features at this time (at least for complex stuff). In the future I'm sure it will be a strong contender and will eventually become the best option :)
 

riggles

macrumors 6502
Dec 2, 2013
301
14
I too run mine with the doors closed. Running three 780 Ti SC ACXs has not made the PCIe compartment significantly hotter. In fact, the overall internal case temperature is about the same as it was when I ran just one GTX 680 OC. However, I have used SMC Fan Control (SMCfc) throughout (since about 2009) to tame the internal temps of all of my Mac Pros. Like Brylcreem, just a little dab of SMCfc will do ya.

Ok, good to know. 2 x 780 6GB w/ ACX should be fine then. I've never used SMCfc, and I'm honestly not sure what the best way to set it up would be.
 

Tutor

macrumors 65816
Original poster
Tips for maintaining temperature status quo while increasing GPU compute performance.

Ok, good to know. 2 x 780 6GB w/ ACX should be fine then. I've never used SMCfc, and I'm honestly not sure what the best way to set it up would be.

As a small software utility, it installs very easily. It gives you temperatures for the important areas in your case, and you can just drag a slider to the right to raise fan speed per area. If you install SMC Fan Control first, i.e., before you install the second GPU, then you can use SMCfc to monitor what your temperatures are now. Write those values down. Then, after you've installed the second GPU (and maybe a third GPU like I did - see my earlier posts above about how to install three double-wide GPUs in an early Mac Pro), if the temps are higher than before, just slowly drag the slider to the right, a little at a time, until you achieve the same temperatures as those that you earlier recorded. I said "slowly and a little at a time" because it may take a minute or two for an increase in fan speed to achieve a temperature reduction. Lastly, name and save your settings. Voila! Your system temperatures are no higher than before.
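The record-then-nudge procedure above amounts to a simple control loop. Here's a toy sketch: read_temp() and set_fan_rpm() are hypothetical stand-ins (SMCfc itself is a GUI slider, not a scriptable API), wired to a made-up thermal model just so the loop can run:

```python
def make_toy_case(extra_heat_c: float):
    """Toy thermal model: each extra 100 rpm of fan speed removes ~1 C."""
    state = {"rpm": 1000}
    def read_temp() -> float:
        return 45.0 + extra_heat_c - (state["rpm"] - 1000) / 100.0
    def set_fan_rpm(rpm: int) -> None:
        state["rpm"] = rpm
    return read_temp, set_fan_rpm

def restore_baseline(read_temp, set_fan_rpm, baseline_c: float,
                     step_rpm: int = 100, max_rpm: int = 3000) -> int:
    """Raise fan speed a little at a time until temps return to baseline."""
    rpm = 1000
    while read_temp() > baseline_c and rpm < max_rpm:
        rpm += step_rpm          # "slowly and a little at a time"
        set_fan_rpm(rpm)
    return rpm

# Suppose the new GPU adds ~6 C over the recorded 45 C baseline.
read_temp, set_fan_rpm = make_toy_case(extra_heat_c=6.0)
final_rpm = restore_baseline(read_temp, set_fan_rpm, baseline_c=45.0)
print(final_rpm, read_temp())
```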
 

Tutor

macrumors 65816
Original poster
I want only 20 of them - 12 for 4 MacPros and 8 for a Tyan Server.

So how long before you start stuffing Titan Z's into your 2,1s?

http://blogs.nvidia.com/blog/2014/03/25/titan-z/

Leon, thanks for the URL - it's making me drool. So, your question is how long before I start stuffing the first GPU designed for 5K graphics (outfitted with two full Kepler GK110 processors, housing a total of 5,760 CUDA cores and 2x6 GB of VRAM, and yielding 8 TeraFlops of GPU compute performance, for a measly $2,999) into my 2,1s? Maybe late this summer. My hope is that they're only double, not triple, wide. That way I could get three of them into each of my three 2007 MacPro2,1s, three into my one 2009->2010 MacPro5,1 (self-upgraded) and eight into my 8-GPU Tyan Server. My other 15 systems will just have to be satisfied with GTX oTitans, 780 Ti SC ACXs, SC 690s, SC 680s, SC 590s, SC 580s, and SC 480s. Let's see, that would take my oTitan (RD) Octane Rendering Total Equivalency (TE) to >100, or the compute equivalency of more than 1,000 single-CPU E5-2687W V1 systems, but housed in 20 current, plus 2 more, cases. I'd like that.

P.S. Since I've given you until the end of summer to send to me the check for 60 grand (I'll cover taxes and shipping), aren't you happy that I can delay my upgrades until then?;)
 

leon771

macrumors regular
Sep 17, 2011
213
56
Australia
Late summer, not a bad timeline at all. I'm curious to see how performance scales and whether it's worth the $$$ investment.
 

Tutor

macrumors 65816
Original poster
Late summer, not a bad timeline at all. I'm curious to see how performance scales and whether it's worth the $$$ investment.

See my last edit, especially the P.S. That question we can answer as a team.

But joking aside, given the linearity of OctaneRender, the Z should be about as fast as two GTX 780 Ti SC ACXs. One GTX 780 Ti SC ACX renders 1.37 times faster than the (o)riginal reference-design Titan, so the Z will be the render equivalent of 2.74 oTitans. That, along with the space savings and other improvements in design, temps, power consumption, etc., explains most of the price difference. In sum, it's a fair price for the performance+.
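The equivalency math, spelled out (using only the 1.37x figure quoted above and Octane's linear scaling):

```python
# One 780 Ti SC ACX renders 1.37x faster than an original (o)Titan.
ti_vs_otitan = 1.37
# The Titan Z is two full GK110s on one board; with Octane's linear
# scaling, that's two 780-Ti-class GPUs' worth of render throughput.
titan_z_vs_otitan = 2 * ti_vs_otitan
print(f"Titan Z ~= {titan_z_vs_otitan:.2f} oTitans")
```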
 

riggles

macrumors 6502
Dec 2, 2013
301
14
Leon, thanks for the URL - it's making me drool. So, your question is how long before I start stuffing the first GPU designed for 5K graphics (outfitted with two full Kepler GK110 processors, housing a total of 5,760 CUDA cores and 2x6 GB of VRAM, and yielding 8 TeraFlops of GPU compute performance, for a measly $2,999) into my 2,1s? Maybe late this summer. My hope is that they're only double, not triple, wide. That way I could get three of them into each of my three 2007 MacPro2,1s, three into my one 2009->2010 MacPro5,1 (self-upgraded) and eight into my 8-GPU Tyan Server. My other 15 systems will just have to be satisfied with GTX oTitans, 780 Ti SC ACXs, SC 690s, SC 680s, SC 590s, SC 580s, and SC 480s. Let's see, that would take my oTitan (RD) Octane Rendering Total Equivalency (TE) to >100, or the compute equivalency of more than 1,000 single-CPU E5-2687W V1 systems, but housed in just 20 cases. I'd like that.

Judging by the video release, pretty sure it's a triple-wide card. So, two at the max with some of your HDD bays removed. Even if you could fit three, I'm not sure how you'd power them all without a second aux PSU hack.

Personally, I'm not that enamored with the Titan Z. You get 2x the Titan Black performance for 3x the price. For the same amount of money, you could actually fit 3 x Titan Blacks in your oMP and have 2,880 more CUDA cores working. Or get 3 x 780 6GB for half the price of a Titan Z and have 1,152 more CUDA cores. Seems like a better use of Octane rendering funds, no?
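Putting riggles' comparison into numbers. The prices are the approximate launch MSRPs implied by the post ($2,999 for the Titan Z, about $999 per Titan Black, about $499 per 780 6GB); treat those dollar figures as assumptions:

```python
# CUDA core counts: Titan Z = 2x 2880, Titan Black = 2880, GTX 780 = 2304.
cards = {
    "1x Titan Z":     {"cores": 5760,     "price": 2999},
    "3x Titan Black": {"cores": 3 * 2880, "price": 3 * 999},
    "3x 780 6GB":     {"cores": 3 * 2304, "price": 3 * 499},
}

for name, c in cards.items():
    print(f"{name}: {c['cores']} cores, ${c['price']}, "
          f"{c['cores'] / c['price']:.2f} cores per dollar")
```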
 