
Tutor

macrumors 65816
Original poster
EVGA SR-2

CPU activation is jumper-controlled on the EVGA SR-2 so that you can test and adjust/set your CPUs separately. Remember to always reactivate both CPUs when you are done.
 

Attachments

  • CPU Disable Jumper.jpg

DJenkins

macrumors 6502
Apr 22, 2012
274
9
Sydney, Australia
To DJenkins:

Comment - Tutor, just throwing a question out there regarding the SR-2 before I hit the EVGA forums. I am having increasing trouble with BIOS resets again, very similar to when I first started and the battery needed replacing.

Q - DJenkins, which BIOS version are you using? Some are flaky overall, and some are flaky unless you are running only Windows. Also, which OS version are you running? I'm running SL and it's been rock solid for over two years.

Hey Tutor, thanks for your reply. I'm running the A56 BIOS because it's the only one where my Revodrive X2 PCIe SSD with the Windows OS is recognised.

I have read that the A49 BIOS is the most stable, but with it I wouldn't be able to get into Windows for the torture test :(

It was always quite ambitious of me to try to make my machine a 'do everything' box; I think I'm finding there's got to be a compromise somewhere.

I currently have everything working on A56 and OS X 10.8 except the BIOS resets. Sleep, FireWire, RAID card etc. are all good.

I honestly sort of gave up on a full-time overclock because it was frustrating to be tweaking settings and have them reset all the time.

The reason I asked about 6-core machines is that, from what I have read, they are more stable, and most things work directly on the motherboard without needing PCIe cards to compensate. I was thinking about getting a lower-power 6-core machine for everyday use and stripping the SR-2 back to OS X 10.6.x, then really going for a decent overclock for the heavy lifting.

Also, as you posted above, I never used the CPU jumpers, but the PCIe jumpers have had a serious workout in my machine! I really wanted to wire up switches in the front drive bay to make it easier than fiddling with tweezers all the time... but I'm not sure where to get connectors/headers that would suit, and I'm scared of wrecking something by direct soldering!
 

PunkNugget

macrumors regular
Apr 6, 2010
213
11
Hey Guys,

Just wanted to post something here that I posted on TonyMacX86.com.

You got that right. CS6 is truly the way to go to get what's needed for dual CPUs. It's amazing that Apple would allow this with FCP & FCPX - just lame. Also, to add to that list, AVID is just as bad. I went back and forth on a few forums to find this out the hard way. Plus, if you really want true rendering power, you're wasting your time putting all the pressure on your CPU & RAM. Most of the pressure should really be going to the GPU, as it supports CUDA and Mercury Playback.

The GTX 480, 580 and Nvidia Tesla C2070 are a video renderer's DREAM COME TRUE !!! I was able to test render using

CS6 11.1 RAYTRACE BENCHMARK (google the name and download the file)

Well, here are my results:

EVGA 580 = 6:24+ min (I think)
EVGA 580 + C2070 = 4:54 min
EVGA 680 + C2070 = 5:42 min
EVGA 580 + EVGA 580 + C2070 = 3:11 min

Check out this page (it's already translated; if not then google translate it):

http://translate.googleusercontent....u.html&usg=ALkJrhiRFwHYCWfiEHtgq6N8IumPd9rqRQ

Funny thing is, someone already did the VERY SAME TESTS I DID using the SAME SETUP, with an SR-2 Mobo w/ 2 x 5690's and 48GB of RAM, and the render times were identical. So I wasn't off on my times (I went by memory, with the exception of the last 2 tests, which I wrote down).

So in the end, just use your CPU, RAM and a 480 or 580 (by the way, don't throw your money out the window buying a Quadro 4000 or 5000 when a 480 or 580 performs the same or better). This is old-hat information anyway; many forum threads have compared these cards and come up with the same result (that the 480 & 580 outclass the Quadro 4000 & 5000 series). Now, the Tesla C2070 or C2075 works in tandem WITH the 480 and 580 MUCH BETTER. Using the 680, though, doesn't work so well, because Adobe didn't code that into the equation. Believe me, RampageDev and I worked on that, adjusting the DSDT all night long, only to come up short each and every time. The Tesla just wasn't showing up with the 680. So in the end I went back to using the 580, and that's fine, because look at the difference between using the 580 by itself and using the 580 & C2070. But then using 2 x 580's & the C2070 - that (to me) is just amazing. Plus, you can move the cursor in AE in real time and it actually plays live right there. That Tesla C2070 is one sweet card!

PS - I got the EK waterblock for the C2070 as well. Might as well take advantage of the water cooling setup I have anyway... hee - hee... :)

I've included one pic, but I'll be loading up my new pics of my new and improved HACKINBEAST on InsanelyMac soon. Later all...
 

Attachments

  • ULTRABEAST.jpg

Tutor

macrumors 65816
Original poster
Hey Guys,

Just wanted to post something here that I posted on TonyMacX86.com.

You got that right. CS6 is truly the way to go to get what's needed for dual CPUs. It's amazing that Apple would allow this with FCP & FCPX - just lame. Also, to add to that list, AVID is just as bad. I went back and forth on a few forums to find this out the hard way. Plus, if you really want true rendering power, you're wasting your time putting all the pressure on your CPU & RAM. Most of the pressure should really be going to the GPU, as it supports CUDA and Mercury Playback.

The GTX 480, 580 and Nvidia Tesla C2070 are a video renderer's DREAM COME TRUE !!! ...

PunkNugget,

Sweet! Thanks for the update. I'm still deep into studying the ins and outs of CUDA so that I can take maximum advantage of it.

BTW - Given that you got:
EVGA 580 = 6:24+ min (I think)
EVGA 580 + C2070 = 4:54 min
EVGA 680 + C2070 = 5:42 min
EVGA 580 + EVGA 580 + C2070 = 3:11 min

That means that if 6:24 is accurate, and assuming linearity, the addition of the C2070 shaved (6:24 - 4:54 =) 90 seconds off that longest time, and the addition of another 580 shaved (4:54 - 3:11 =) 103 seconds off that 2nd-fastest time. So the real winners are those GTX 580s, which is what I would have expected, given that the EVGA versions of the GTX 580 tend to have a greater CUDA core frequency, a higher floating-point peak, higher memory speed and greater memory bandwidth than do the Tesla cards.
 

PunkNugget

macrumors regular
Apr 6, 2010
213
11
PunkNugget,

Sweet! Thanks for the update. I'm still deep into studying the ins and outs of CUDA so that I can take maximum advantage of it.

BTW - Given that you got:
EVGA 580 = 6:24+ min (I think)
EVGA 580 + C2070 = 4:54 min
EVGA 680 + C2070 = 5:42 min
EVGA 580 + EVGA 580 + C2070 = 3:11 min

That means that if 6:24 is accurate, and assuming linearity, the addition of the C2070 shaved (6:24 - 4:54 =) 90 seconds off that longest time, and the addition of another 580 shaved (4:54 - 3:11 =) 103 seconds off that 2nd-fastest time. So the real winners are those GTX 580s, which is what I would have expected, given that the EVGA versions of the GTX 580 tend to have a greater CUDA core frequency, a higher floating-point peak, higher memory speed and greater memory bandwidth than do the Tesla cards.

You've always got a way about ya', don't you Tutor - LOL !!! You love stirring stuff up and then playin' innocent. But that's what I love about you - LOL !!! :D

Of course the 580's are going to be the better card (for now), but I wouldn't have been able to get the rendering time that I did WITHOUT the C2070. Remember they work in TANDEM with each other... I'll try the 2 x 580's without the C2070 tomorrow to see what happens...

I'm still laughing at your comment - LOL !!! I like you, Tutor; you're cool people, always handing out those "quiet" challenges to be that much more thorough in my testing. Who knows, you could be right... :D
 

Tutor

macrumors 65816
Original poster
... I'll try the 2 x 580's without the C2070 tomorrow to see what happens...

Assuming the linearity that Otoy says applies to Octane when GTX cards are used is universal: if one EVGA GTX 580 performs the rendering test in 6:24 min, one EVGA 580 + one C2070 performs the test in 4:54 min, and one EVGA 580 + another EVGA 580 + one C2070 performs the test in 3:11 min, then the addition of the C2070 shaved (6:24 - 4:54 =) 90 seconds off that longest time and the addition of another 580 shaved (4:54 - 3:11 =) 103 seconds off that 2nd-fastest time.

Two or more CUDA cards (whether GTX and/or Tesla) should work in tandem if the CUDA code is written properly.

Therefore, a second EVGA 580 should perform the render test in about 4:41 min [one GTX 580 takes 6 min * 60 sec/min + 24 sec = 384 sec; a second GTX 580 should cut the render time by another 103 sec, i.e., 384 - 103 = 281 sec ≈ 4 min 41 sec using 2 GTX 580s].

Using a third GTX 580 should result in a time of under 3 minutes [384 - (2 x 103 = 206) = 178 sec ≈ 2 min 58 sec using 3 GTX 580s].

If that 6:24 test result is accurate and your future test results using 2 or 3 GTX 580s differ significantly, then the linearity that Otoy says applies to Octane when using GTX cards may not be universal. That CUDA lacks, or fails to maintain, linearity universally would be an important finding and would add to our body of knowledge about GPU-assisted compute performance.
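
For anyone who wants to replay the arithmetic, here is a minimal Python sketch of the same fixed-savings estimate (the times are PunkNugget's reported results; the per-card saving, and the linearity itself, are exactly the assumptions under test):

```python
def to_sec(t):
    """Convert an 'M:SS' time string to seconds."""
    m, s = t.split(":")
    return int(m) * 60 + int(s)

def fmt(sec):
    """Format seconds back to 'M:SS'."""
    return f"{sec // 60}:{sec % 60:02d}"

# PunkNugget's reported CS6 11.1 raytrace benchmark times.
one_580       = to_sec("6:24")  # one GTX 580
one_580_c2070 = to_sec("4:54")  # GTX 580 + Tesla C2070
two_580_c2070 = to_sec("3:11")  # 2 x GTX 580 + Tesla C2070

saving_c2070 = one_580 - one_580_c2070        # 90 sec from the C2070
saving_580   = one_580_c2070 - two_580_c2070  # 103 sec from the 2nd 580

# Extrapolate: assume each additional GTX 580 saves the same 103 sec.
for n in (2, 3):
    predicted = one_580 - (n - 1) * saving_580
    print(f"{n} x GTX 580, no Tesla: ~{fmt(predicted)} min")
# -> 2 x GTX 580: ~4:41; 3 x GTX 580: ~2:58
```

Note that strictly linear multi-GPU scaling in Otoy's sense would mean render time falls roughly in proportion to 1/N identical cards, which predicts even shorter times than the fixed-savings model above; either model can be checked against actual 2- and 3-card results.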
 

PunkNugget

macrumors regular
Apr 6, 2010
213
11
I get nothing except the other site you posted this on. How about a link?

You know, I can't find the site now. The only site that pops up is some Asian site, but when I try to Google Translate it, it only partially converts the language. The static images (JPEG buttons) won't translate, so I don't know which button to press, and when I do, I think it's asking for your sign-in info; since I'm not a member of that forum site and can't read Chinese, oh well... DARN YOU, GOOGLE !!!

Assuming the linearity that Otoy says applies to Octane when GTX cards are used is universal: if one EVGA GTX 580 performs the rendering test in 6:24 min, one EVGA 580 + one C2070 performs the test in 4:54 min, and one EVGA 580 + another EVGA 580 + one C2070 performs the test in 3:11 min, then the addition of the C2070 shaved (6:24 - 4:54 =) 90 seconds off that longest time and the addition of another 580 shaved (4:54 - 3:11 =) 103 seconds off that 2nd-fastest time.

Two or more CUDA cards (whether GTX and/or Tesla) should work in tandem if the CUDA code is written properly.

Therefore, a second EVGA 580 should perform the render test in about 4:41 min [one GTX 580 takes 6 min * 60 sec/min + 24 sec = 384 sec; a second GTX 580 should cut the render time by another 103 sec, i.e., 384 - 103 = 281 sec ≈ 4 min 41 sec using 2 GTX 580s].

Using a third GTX 580 should result in a time of under 3 minutes [384 - (2 x 103 = 206) = 178 sec ≈ 2 min 58 sec using 3 GTX 580s].

If that 6:24 test result is accurate and your future test results using 2 or 3 GTX 580s differ significantly, then the linearity that Otoy says applies to Octane when using GTX cards may not be universal. That CUDA lacks, or fails to maintain, linearity universally would be an important finding and would add to our body of knowledge about GPU-assisted compute performance.

Now to address Tutor (and anyone else reading this): while I was able to disconnect the C2070 before to test the other cards out, I am not able to do that now, as the machine won't get past the Apple icon when I boot it. This is mainly because of the custom DSDT file that Rampage set up for me. Even though I have the other DSDT files that would enable me to make that happen (as I keep everything anyway), I'm not going to mess with that. So with that said, you're just going to have to be good with the thorough testing that I've done (with Mac OS X), as well as what the other Russian guy performed in his testing (in Windows 7), as they were almost identical in their rendering times.

On the surface of your calculations, I hear you on what you estimated, but from working with Rampage on the C2070 (he has dedicated servers using stacks of these very same cards with CS6 AE & PR for video rendering), as well as our personal tests using the 580 and 680, I'm going to stick with his knowledge base on that one for now when it comes to CUDA and LIVE Mercury Playback.

I do have to say something about Rampage. What I appreciate is that he'll block off the time and we'll Skype each other live to bang out the work until we're done. He knows my talents with the programming end of things are really limited to just the building part of PCs, while his talent is the programming part. So I'm grateful for that LIVE help when we're able to do it. We both really got satisfaction out of working together, and I feel we were (and still are) a great team in making that happen. I'm happy that FINALLY someone was able to give all the extended help that was needed (and then some). Again, I'm VERY GRATEFUL to him. I could not have done it without his direct help.

He really is amazing, though, for a young guy: not only does he have his own site that shows you (with step-by-step instructions and pics included) how to create your own DSDT and like-kind files, he'll actually go the extra mile and give you premium technical support (for a small donation) to get you up and running in no time flat. Had I met him 18 months ago, I would have been up and running MUCH FASTER, instead of the fumbling and stumbling (and arguing with a few people to get their assistance). I'm glad he offers both the opportunity to learn it yourself AND having it done for you (unlike someone else that we know, whom I bantered back and forth with on InsanelyMac, who wasn't very helpful). Tutor knows the guy I'm talking about.

Which brings me back to my original point with that guy (on InsanelyMac): not everyone can have the same talents. Take Tutor, for example. He's great when it comes to his underclocking, some programming, and whatever calculative skills he likes dropping on us (like what he just did in the reply above - LOL !!!). Rampage is great when it comes to getting your install in order with his customized DSDT and other important key files, as well as streamlining your Extra folder to get you the best possible running system. Me, on the other hand - well, I'd like to think that I'm a good, clean and tight professional PC builder, as well as a good networker of sorts, connecting people on various forums so their needs can be met. At least that's the feedback I've been receiving from others who have told me this directly.

Again, not everyone can do EVERYTHING, and quite honestly I'm okay with that (as I have been for quite some time). As a result, I've had quite a few guys with the same (or similar) SR-2 build who have needed some help to get their systems either up and running or tweaked to max out their performance, where they too don't have the programming wherewithal or understanding that a guy like RampageDev or Tutor does. I'm just glad that I get to be used as a "vessel" (so to speak) so they don't have to go through the same MONTHS of headaches I did to get their systems running at their best.

Now, the first thing I do when someone reaches out to me for help is tell them to go to RampageDev FIRST, over anyone else (as well as voluntarily tossing him some duckets to help him out; I think that's a very fair thing to do and he's worth EVERY PENNY). Then I have them go to other sites, like this one (which I call "Tutor's Way"), and posts like his, to go about maxing out their performance by way of either underclocking or overclocking.

Personally, I still like underclocking better, because it confuses the heck out of people: it does the opposite of what we've been taught and actually PERFORMS FAR BETTER (and I know Tutor gets a kick out of that, being the guy who discovered it). But since I have 10.8.3 up and running now, not 10.6.8 anymore, and most of my performance comes from the GPUs anyway, I'm going to stick with what I have for the time being. I'm sure, if the time comes when UC'ing is achievable on 10.8.x or above, Tutor will be the man who brings it to light so we can ALL benefit from it.

I'm still surprised that the EVGA SR-2 Mobo setup has been the KING of Hackintosh performance for an OC'able dual-CPU setup for 3+ years and still (I believe) has another 3 years to go. Tutor, BrainDeadMac, Peconi, mainHybrid, mike d, myself and a few others that I know of have benefitted from this amazing SR-2 Mobo and still are. There is so much expandability in this Mobo, it's just SICK !!! I'm just glad that I got to be a part of something this great, and I'm even more grateful for being the reason why others wanted to build the same system. It's an awesome feeling !!! So thanks for the compliments, and I hope to see more builds like this one... :)

Lastly, I know I went off on some tangents here and there, and if I've confused anyone or given TMI, then please forgive me… Later... :cool:
 

Tesselator

macrumors 601
Jan 9, 2008
4,601
6
Japan
You know, I can't find the site now. The only site that pops up is some Asian site, but when I try to Google Translate it, it only partially converts the language. The static images (JPEG buttons) won't translate, so I don't know which button to press, and when I do, I think it's asking for your sign-in info; since I'm not a member of that forum site and can't read Chinese, oh well... DARN YOU, GOOGLE !!!

Damn! Well, thanks for trying bro.


Anyone know or wanna temporarily host it themselves?
 

DJenkins

macrumors 6502
Apr 22, 2012
274
9
Sydney, Australia
I'm just glad that I got to be a part of something this great, and I'm even more grateful for being the reason why others wanted to build the same system. It's an awesome feeling !!! So thanks for the compliments, and I hope to see more builds like this one...

Yep, it's quite a tight-knit community in which the boundaries have been pushed largely due to the generosity of several key players on here.

If I didn't have you guys to rant to about my problems this machine would have been out the window long ago :D

So I have switched back to A49 BIOS to see if I get any more BIOS resets. Is that what you guys are all still using?

My thought that it might be the battery again seems unlikely, since Tutor says they can last several years or more. Unless my board is faulty and somehow chewing through batteries?! It just seems strange that A56 was stable for a few months and then all of a sudden started resetting again.

One problem though... as I posted before, I have Windows on a PCIe SSD that doesn't show up when using A49, meaning that if I go in and have a crack at OCing again, I can't get into Windows for the torture test :(

Any way around this that you can think of? Should I temporarily boot a clone of Windows from a different drive?

I also noticed that my usual Geekbench scores at stock settings (3.2GHz), around 24,XXX under A56, went up by about 2,000 points to mid-26,XXX under A49. No other change except flicking the BIOS switch... :confused:

All this GPU info from PunkNugget is great as well! It really shows that if you know your goals, you can build a machine that strictly caters to them. For example, you may not need to push the limits of overclocking if CUDA is your tool; it seems better to install more GTX 580 GPUs instead. If I spent more time on strictly 3D work, then these paired with the Octane renderer would be very tempting.

However, it doesn't sound like these CUDA features will make regular After Effects or Cinema 4D renders any faster, meaning that for me CPU power still has a strong foothold in the game. I don't use the raytracing features at all in After Effects, so an overclocked 12-core render still sounds like the way to go for now :)
 

DJenkins

macrumors 6502
Apr 22, 2012
274
9
Sydney, Australia
Damn! Well, thanks for trying bro.

Anyone know or wanna temporarily host it themselves?

This link PunkNugget posted does go to a translated site, but I didn't see the project file...
http://translate.googleusercontent.c...q6N8IumPd9rqRQ

I do recall these tests or similar being conducted here over on creative cow:
http://forums.creativecow.net/thread/2/1019643#1019659

Which link to here as well:
https://docs.google.com/a/juansalvo...UawCfDQudEhQYWJsbUtRSVktZlFIR2xsSzFXMUE#gid=1

Hope it helps!
 

Tutor

macrumors 65816
Original poster
...
However, it doesn't sound like these CUDA features will make regular After Effects or Cinema 4D renders any faster, meaning that for me CPU power still has a strong foothold in the game. I don't use the raytracing features at all in After Effects, so an overclocked 12-core render still sounds like the way to go for now :)

If you don't use rays, then you'll still have to rely on multiple CPUs. But for those who do use rays:
Octane Standalone works with Cinema 4D and has been doing so for quite some time. Currently, you have to use a community-developed free scene exporter and then import the scene into Octane. What is in beta is a plugin that allows rendering to occur within Cinema 4D (and this is so for some other apps as well) without having to export and then import, i.e., it integrates better and gives the appearance of being one app. These plugins are in various stages of development for almost every 3D app with any significant following. E.g., Maya, 3ds Max, LightWave 3D, Poser and ArchiCAD plugins have been released; Revit and Blender previews have been done; Cinema 4D and Softimage beta testers have been sought.

As for Adobe CS6 AE, Nvidia's guide to popular GPU-accelerated applications (Oct 2012 version - http://www.nvidia.com/docs/IO/123576/nv-applications-catalog-lowres.pdf ) states that CUDA currently speeds up the 3D ray-tracing engine in AE by 27 times. Since AE is multi-GPU aware, that 27x figure isn't a cap. Also, keep in mind that many AE video plugins are being infused with CUDA awareness and power - see the guide mentioned above.

Which cards to buy?
The way Octane has been coded to handle CUDA is exemplified by the following statement from one of the FAQs on their website: "If you are interested in purchasing a new graphics card to use with Octane Render, the Geforce GTX570 or GTX 580 currently have the best Performance to Price ratio. The latest generation of Nvidia GPUs (Kepler) is supported, but currently works slower than their Fermi equivalents. We are still optimizing the performance of Octane on the Kepler GPUs. The GeForce line is higher clocked and renders faster than Quadro and Tesla GPUs, but the latter GPUs often have more memory. A powerful multi-core CPU is not required as Octane does not use the CPU for rendering, but a faster CPU will improve the scene voxelizing speed." Moreover, Octane's manual states:
OctaneRender™ runs best on Fermi (e.g. GTX 480, GTX 580, GTX 590) and Kepler (e.g. GTX 680, GTX 690) GPUs, but also supports older CUDA enabled GPU models. GeForce cards are fast and cost effective, but have less VRAM than Quadro and Tesla cards. OctaneRender scales perfectly in a multi GPU configuration and can use different types of Nvidia cards at once e.g. a GeForce GTX 260 combined with a Quadro 6000. The official list of NVIDIA CUDA enabled products is located at https://developer.nvidia.com/object/cuda-gpus.
 

Attachments

  • OctaneDevStatus.jpg

Tutor

macrumors 65816
Original poster
You know, I can't find the site now. The only site that pops up is some Asian site, but when I try to Google Translate it, it only partially converts the language. The static images (JPEG buttons) won't translate, so I don't know which button to press, and when I do, I think it's asking for your sign-in info; since I'm not a member of that forum site and can't read Chinese, oh well... DARN YOU, GOOGLE !!!



Now to address Tutor (and anyone else reading this): while I was able to disconnect the C2070 before to test the other cards out, I am not able to do that now, as the machine won't get past the Apple icon when I boot it. This is mainly because of the custom DSDT file that Rampage set up for me. Even though I have the other DSDT files that would enable me to make that happen (as I keep everything anyway), I'm not going to mess with that. So with that said, you're just going to have to be good with the thorough testing that I've done (with Mac OS X), as well as what the other Russian guy performed in his testing (in Windows 7), as they were almost identical in their rendering times.

On the surface of your calculations, I hear you on what you estimated, but from working with Rampage on the C2070 (he has dedicated servers using stacks of these very same cards with CS6 AE & PR for video rendering), as well as our personal tests using the 580 and 680, I'm going to stick with his knowledge base on that one for now when it comes to CUDA and LIVE Mercury Playback.

… Later... :cool:

Thanks for your thorough testing. I regret (1) that you wouldn't have been able to get the rendering time that you did WITHOUT the C2070, and (2) that you were unable to disconnect the C2070 to test the other cards out, as you had earlier stated you would: "I'll try the 2 x 580's without the C2070 tomorrow to see what happens... ."

Luckily, I was finally able to get that result page translated [ http://translate.google.com/transla....efxi.ru/more/after_effects_cs6_test_gpu.html ]. I'm good with what the result page shows. It confirms what I thought (although my ratio-analysis calculations were far off the mark), namely that fast GTX 580s are the fastest combo: just two of them take 3 min 5 sec vs. your [as well as that Russian guy's] 3 min 11 sec for two GTX 580s and the Tesla C2070 - compare both pics below, as well as the relative speeds of the GTX 580s in the first pic. What does the Tesla C2070, working in TANDEM with the two GTX 580s, add to the mix?
 

Attachments

  • 2x580.jpg
  • PNs3CardCombo.jpg

Tutor

macrumors 65816
Original poster
Some more conjecture - maybe a little insight: Do programmers exercise common sense?

Just as Otoy has stated, "We are still optimizing the performance of Octane on the Kepler GPUs," I suspect that CUDA apps written to take advantage of the particular biases or features of a particular GPGPU line will be faster on that line than on others. E.g., with the Tesla cards (where the double-precision float performance peak = 1/2 of the single-precision peak), apps can be more double-precision biased and will perform much better on Tesla cards than on the GTX line. For the GTX 500 Fermi lineup, the double-precision float performance peak = 1/8 of the single-precision peak, so I'd expect apps targeted to this audience to be more single-precision biased (the GTX 500's single-precision float performance is about 150% of that of the Tesla C series), and even more so for an app that has been or will be optimized for the GTX 600 Kepler GPUs, where the double-precision ratio is much lower but the single-precision float peak is the highest it's been in an enthusiast line. So should there be any surprise when an app, or even a test for an app, ends up not behaving how we'd expect it to behave in a performance-sterile environment? Should programmers be allowed to make their companies' apps run better on the GPUs that their companies project their user bases have, or have easy access to? Can greed and common sense marry? The ultimate question we might ask ourselves when choosing a CUDA card is: which CUDA card is my software designed to favor? Currently, for Octane, it's the GTX line, and currently the 500 series, but who knows what some math/programming whiz at Otoy can do to shift the tables from Fermi toward Kepler. See also [ http://docs.nvidia.com/cuda/kepler-tuning-guide/index.html ].

BTW - For the GTX 600 Kepler lineup, the double-precision float performance peak = 1/24 of the single-precision peak, so its double-precision performance leaves a whole lot to be desired. Even the GTX 480 is faster than the GTX 680 at double-precision floating-point tasks.
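
To put rough numbers on those ratios, here is a small Python sketch using approximate single-precision peaks from the public spec sheets (the GFLOPS figures are ballpark board specs, so treat them as illustrative rather than authoritative):

```python
# Approximate SP peaks (GFLOPS) from public spec sheets, paired with
# each line's DP:SP ratio as discussed above. Figures are ballpark.
cards = {
    "GTX 480":     (1345, 1 / 8),   # Fermi GeForce
    "GTX 580":     (1581, 1 / 8),   # Fermi GeForce
    "Tesla C2070": (1030, 1 / 2),   # Fermi Tesla
    "GTX 680":     (3090, 1 / 24),  # Kepler GeForce
}

for name, (sp, ratio) in cards.items():
    print(f"{name:>12}: SP ~{sp} GFLOPS, DP ~{sp * ratio:.0f} GFLOPS")

# -> The Tesla C2070 leads in double precision (~515 GFLOPS) despite
#    having the lowest SP peak, and the GTX 480 (~168 GFLOPS DP)
#    indeed beats the GTX 680 (~129 GFLOPS DP).
```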
 

Rampage Dev

macrumors member
Dec 23, 2012
62
0
Reviewing what you said about rendering times: you would have to take a test file that was a 20-min video and render it. It would take several hours, but you would be able to get more precise calculations of how much performance each added card contributes. On short benchmarks you will be unable to see the true time saved by one card or another when using several cards.
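
A toy model makes the point concrete: measured time is roughly fixed overhead (scene load, voxelizing, driver setup) plus GPU-parallel work, and on a short benchmark the overhead masks the per-card gain. The overhead and work figures below are invented purely for illustration:

```python
# Toy model: time = fixed overhead + GPU-parallel work / number of cards.
# All numbers are invented for illustration only.
OVERHEAD = 60.0              # sec of setup that no extra GPU can shrink
SHORT_WORK = 300.0           # GPU-sec of rendering in a short benchmark
LONG_WORK = SHORT_WORK * 40  # a 20-min-video sized job

for cards in (1, 2, 3):
    short = OVERHEAD + SHORT_WORK / cards
    long_ = OVERHEAD + LONG_WORK / cards
    print(f"{cards} card(s): short {short:5.0f}s "
          f"({(OVERHEAD + SHORT_WORK) / short:.2f}x), "
          f"long {long_:7.0f}s "
          f"({(OVERHEAD + LONG_WORK) / long_:.2f}x)")

# The short test tops out well below 2x/3x speedup, while the long
# render approaches the card count as the overhead becomes negligible.
```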
 

Tutor

macrumors 65816
Original poster
Reviewing what you said about rendering times: you would have to take a test file that was a 20-min video and render it. It would take several hours, but you would be able to get more precise calculations of how much performance each added card contributes. On short benchmarks you will be unable to see the true time saved by one card or another when using several cards.

Thanks, Rampage. That sounds completely reasonable. Have you done or seen enough evaluations of Adobe CS6 After Effects' and/or Premiere's CUDA performance in real-world settings to offer us an opinion on whether either or both of them are coded to take better advantage of (a) the Tesla C2070/C2075's superior double-precision floating-point performance and greater RAM, (b) the GTX 580's superior memory speed, bandwidth and single-precision floating-point performance over those two Tesla cards, (c) the GTX 680's or Titan's phenomenal single-precision floating-point peak performance, CUDA core clock and memory speeds, and massive core counts - in the case of the Titan rivaling those of the Tesla K20X, with more abundant memory than other GTXs - or (d) a specific combination of the above? An opinion based on real-world applications would surely help us make wiser purchase decisions if our work regularly involves using either or both of those two applications.
 

PunkNugget

macrumors regular
Apr 6, 2010
213
11
Thanks, Rampage. That sounds completely reasonable. Have you done or seen enough evaluations of Adobe CS6 After Effects' and/or Premiere's CUDA performance in real-world settings to offer us an opinion on whether either or both of them are coded to take better advantage of (a) the Tesla C2070/C2075's superior double-precision floating-point performance and greater RAM, (b) the GTX 580's superior memory speed, bandwidth and single-precision floating-point performance over those two Tesla cards, (c) the GTX 680's or Titan's phenomenal single-precision floating-point peak performance, CUDA core clock and memory speeds, and massive core counts - in the case of the Titan rivaling those of the Tesla K20X, with more abundant memory than other GTXs - or (d) a specific combination of the above? An opinion based on real-world applications would surely help us make wiser purchase decisions if our work regularly involves using either or both of those two applications.

I tend to agree with Rampage on this one, as I believe longer render times can change the equation. But if you want to be sure on this one, you can get 3 x GTX 580 3GB GPUs and provide the results for us using CS6 AE and PR on a sample render file that I'm sure you can create and test with.

Lastly, I just found the CS6 11.1 RAYTRACE BENCHMARK for download here:

http://ppbm6.com/Instructions.html

You can also go through the rest of the site to get more info.
 

Tutor

macrumors 65816
Original poster
I tend to agree with Rampage on this one, as I believe longer render times can change the equation. But if you want to be sure on this one, you can get 3 x GTX 580 3GB GPUs and provide the results for us using CS6 AE and PR on a sample render file that I'm sure you can create and test with.

Lastly, I just found the CS6 11.1 RAYTRACE BENCHMARK for download here:

http://ppbm6.com/Instructions.html

You can also go through the rest of the site to get more info.

PunkNugget, thanks for the download information. Although I too tend to agree with Rampage that a real-world project evaluation would be better, and although I have more than enough GTX 580s to render a large file, I do not have any Teslas for comparison. That's why I asked Rampage for his opinion, should he have done or seen any evaluations using large files that he considers to meet the definition of real-world project size.
 

PunkNugget

macrumors regular
Apr 6, 2010
213
11
PunkNugget, thanks for the download information. Although I too tend to agree with Rampage that a real-world project evaluation would be better, and although I have more than enough GTX 580s to render a large file, I do not have any Teslas for comparison. That's why I asked Rampage for his opinion, should he have done or seen any evaluations using large files that he considers to meet the definition of real-world project size.

How about this: since we both have a similar SR-2 setup already, all you have to do is take the 3 x GTX 580 3GB GPUs that you have, and let's find a more REALISTIC, longer test render for AE and/or PR and do the test. My system is already set up with my C2070 & 2 x GTX 580 3GB GPUs. I think there's a multi-layered CS6 PR file that I downloaded that we can both test with. Let's make this thing happen - what do you say? :cool:
 

Tutor

macrumors 65816
Original poster
How about this: since we both have a similar SR-2 setup already, all you have to do is take the 3 x GTX 580 3GB GPUs that you have, and let's find a more REALISTIC, longer test render for AE and/or PR and do the test. My system is already set up with my C2070 & 2 x GTX 580 3GB GPUs. I think there's a multi-layered CS6 PR file that I downloaded that we can both test with. Let's make this thing happen - what do you say? :cool:

PunkNugget, I say, "Where can I download the file?" We should also publish the specs of our respective cards and systems. I will not be able to run the test until this weekend because my SR-2s are working on a project as we speak.

Rampage, this still doesn't get you off the hook if you have valuable, evaluative information to share with us.
 

PunkNugget

macrumors regular
Apr 6, 2010
213
11
I say, "Where can I download the file?" We should also publish the specs of our respective cards and systems. I will not be able to run the test until this weekend because my SR-2s are working on a project as we speak.

That's fine. I think all you have to download is the file that I have linked already. I think it's specifically for CS6 PR, whereas I thought it was for AE. I'll make the correction on that...

----------

I tend to agree with Rampage on this one, as I believe longer render times can change the equation. But if you want to be sure on this one, you can get 3 x GTX 580 3GB GPUs and provide the results for us using CS6 AE and PR on a sample render file that I'm sure you can create and test with.

Lastly, I just found the CS6 11.1 RAYTRACE BENCHMARK for download here:

http://ppbm6.com/Instructions.html

You can also go through the rest of the site to get more info.

My bad - the file that I linked was actually for CS6 PR and NOT for AE. But it's STILL a great file to have for test rendering...

Lastly, we should really do everything at stock speeds for a more comparable test. BTW, I do have another GTX 580 3GB Hydro GPU, but the way I have everything set up (as you can see from the pic), there is NO WAY I can take out these cards without a LOT OF WORK.
 

Tutor

macrumors 65816
Original poster
That's fine. I think all you have to download is the file that I have linked already. I think it's specifically for CS6 PR, whereas I thought it was for AE. I'll make the correction on that...

----------



My bad - the file that I linked was actually for CS6 PR and NOT for AE. But it's STILL a great file to have for test rendering...

Lastly, we should really do everything at stock speeds for a more comparable test. BTW, I do have another GTX 580 3GB Hydro GPU, but the way I have everything set up (as you can see from the pic), there is NO WAY I can take out these cards without a LOT OF WORK.

I've seen the pic and would agree.

I'm holding off on getting more GTX 580s until this is completed and Rampage responds to my RFI.

You are aware that we can improve our GPUs' CUDA performance on the SR-2 by increasing the PCI-E parameters:
i) Under Frequency/Voltage Control, you can raise:
PCIE Frequency Setting - 101-105 (if you go much higher, your system will begin to act unstable, just like when trying to overclock a Sandy Bridge Xeon);
ii) Under Signal Tweaks, you can increase:
PCIE Signals 1 and 2, which should usually be set to "Auto."

Please tell me the specific name of that file to run so I'll be sure to run the right one.
 

PunkNugget

macrumors regular
Apr 6, 2010
213
11
I've seen the pic and would agree.

I'm holding off on getting more GTX 580s until this is completed and Rampage responds to my RFI.

You are aware that we can improve our GPUs' CUDA performance on the SR-2 by increasing the PCI-E parameters:
i) Under Frequency/Voltage Control, you can raise:
PCIE Frequency Setting - 101-105 (if you go much higher, your system will begin to act unstable, just like when trying to overclock a Sandy Bridge Xeon);
ii) Under Signal Tweaks, you can increase:
PCIE Signals 1 and 2, which should usually be set to "Auto."

Please tell me the specific name of that file to run so I'll be sure to run the right one.

That's great to know. I really don't want to heat up my office any more than I have to. By the way, here's that CS6 11.1 RAYTRACE BENCHMARK file, just in case anyone needs it. What I'll do is figure out a way to duplicate this file into a lengthier one, as I will with the CS6 PR file at the link I provided you.

BTW, that 800+MB file for PR, when you unzip it, is called:

PPBM6.prproj

If you have any difficulty opening it, as I did (there was one missing file in there that I had to locate and select to get it open), just let me know and I'll help. Plus, that multi-layered file is only about 52 sec long. What we can do is copy/paste/copy/paste until we get it to 60 min long, and then we can do the test. It's a terrible-looking file (you'll know what I'm talking about when you view it), but it's great for testing in CS6 PR.

PS - I thought you mentioned that you had more than enough GTX 580's? Are they in another machine and can't be used until you're done with your other project?
 

Attachments

  • CS6 11.1 RAYTRACE BENCHMARK.aep.zip

DJenkins

macrumors 6502
Apr 22, 2012
274
9
Sydney, Australia
Hi PunkNugget, be careful if doing a copy/paste within the timeline. I have noticed (...but not thoroughly tested) that if the new Disk Cache features are enabled in AE, it will locate duplicate frames and speed up the render time.

For a truly accurate test, add in some fractal noise or something else that will be totally random on every frame. Better still, disable the disk cache altogether to remove any possibility of artificial decreases in render time.

------
Edit: sorry, I just noticed that you are focusing on Premiere Pro; my statement above may not apply, as I haven't tested my findings in PP.
 