GTX Titan So where do we stand...?

Discussion in 'Mac Pro' started by registudio, Aug 20, 2013.

  1. registudio macrumors member

    registudio

    Joined:
    Aug 8, 2013
    Location:
    Montreal, QC
    #1
    Hi guys,


    Sorry again to re-open such topics, but from what I've read here and there, for instance this very interesting discussion (http://forums.macrumors.com/showthread.php?t=1565650&page=5) and others currently on the forum, I have to say I'm quite confused, so my question would be: where do we stand for now?

    Do some of the users who installed Titans with (or without) an external PSU on Mac OS X really get the benefit of the full power of the cards without any restrictions? Hmmm, not sure (boot screen issue, etc...).

    In my case, I plan to purchase a Cubix expander (plugged into slot 2 of the Mac Pro, like the test done at Barefeats.com) and put in 4 high-end graphics cards such as Titans, if and only if their full power can be unleashed under Mac OS X.
    My use will mainly be for Octane render, which I ordered just a couple of days ago.

    Anyone have a thought on it?
    :)

    Cheers
    Regis
     
  2. flowrider macrumors 601

    flowrider

    Joined:
    Nov 23, 2012
    #2
    MacVidCards has got the GTX770 flashed and working, but not the Titan, at least NOT YET!

    Lou
     
  3. registudio thread starter macrumors member

    registudio

    Joined:
    Aug 8, 2013
    Location:
    Montreal, QC
    #3
    Yeah I know, I've just purchased one to put in slot 1 ;o)
     
  4. Tutor, Aug 22, 2013
    Last edited: Aug 23, 2013

    Tutor macrumors 65816

    Tutor

    Joined:
    Jun 25, 2009
    Location:
    Home of the Birmingham Civil Rights Institute
    #4
    To get the full power of CUDA in a Titan, you'll have to run it in Windows to activate all of the double precision floating point peak performance for Octane render [see, e.g., post #564 here: http://forums.macrumors.com/showthread.php?t=1333421&page=23 ]. Double precision peak floating point performance can be increased by over 600%, and single precision peak floating point performance by over 40%, with the Nvidia Control Panel and Precision X in Windows. There is no software utility to do so in OSX, even ignoring the other issues.
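    For context on where that 600%+ figure can come from: a GTX Titan's DP units run at 1/24 of the SP rate under the default GeForce driver setting, and at the chip's native 1/3 rate once "Double precision" is ticked in the Nvidia Control Panel. A minimal sketch of that arithmetic, using the commonly cited GK110 ratios rather than anything measured in this thread:

    ```python
    # Back-of-the-envelope check on the "over 600%" double-precision gain.
    # The 1/24 (throttled) and 1/3 (unlocked) DP:SP ratios are the commonly
    # cited GK110/GTX Titan figures, not measurements from this thread.
    SP_TFLOPS = 4.5                   # approximate stock GTX Titan SP peak

    dp_throttled = SP_TFLOPS / 24     # default GeForce driver setting
    dp_unlocked = SP_TFLOPS / 3       # with "Double precision" enabled

    gain_pct = (dp_unlocked / dp_throttled - 1) * 100
    print(f"DP gain from unlocking: {gain_pct:.0f}%")  # 700%, i.e. "over 600%"
    ```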

    Make sure that the Cubix has adequate power for the number of Titans (and other cards) that you intend to install - each Titan card needs at least 250 watts and at least 325 watts when fully unleashed. My Tyan server (AlphaCanisLupus0) has 8 Titans for Octane render, utilizing three 1000 watt power supplies. AlphaCanisLupus0 has over 16 TFLOPS of double precision peak floating point performance and 50.1 TFLOPS of single precision peak floating point performance (vs. the next top of the line Mac Pro, which is advertised to have about 7 TFLOPS of single precision peak floating point performance).

    BTW: I got my barebones Tyan server for about $3600 from here: http://www.superbiiz.com/detail.php?name=TS-B715V2R . N.B. - 2400W [(2+1) 2400W @200-240V], Max. 12Vdc@ 199.6A / 3000W [3 x1000W @100-127V], Max. 12Vdc@ 249.6A; Note: Only one AC inlet allowed per circuit breaker per http://www.tyan.com/Barebones_FT72B7015_B7015F72V2R .
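    The supply sizing in that build can be sanity-checked with the per-card wattage figures above (the 250 W / 325 W numbers are from this post; treating the budget as GPU-only is my simplification):

    ```python
    # GPU-only power-budget arithmetic for the 8-Titan AlphaCanisLupus0 build.
    # Per-card wattages are the figures quoted in the post; everything else
    # (CPUs, fans, drives) is ignored in this simplified sketch.
    TITAN_TYPICAL_W = 250     # watts per Titan, at least, per the post
    TITAN_UNLEASHED_W = 325   # watts per Titan when "fully unleashed"
    NUM_TITANS = 8
    PSU_W, NUM_PSUS = 1000, 3

    worst_case_w = NUM_TITANS * TITAN_UNLEASHED_W   # 2600 W
    capacity_w = NUM_PSUS * PSU_W                   # 3000 W
    print(f"GPU draw {worst_case_w} W of {capacity_w} W available "
          f"({capacity_w - worst_case_w} W left for the rest of the system)")
    ```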
     


  5. ticotoo macrumors member

    Joined:
    Jul 8, 2010
    #5
    Use for Folding

    Tutor,
    It would be interesting to see what kind of numbers you would get from AlphaCanisLupus0 when running the Folding client. Unfortunately, v7 GPU support only works in Windows, and it's in beta for Linux. You would really boost TeamOSX's numbers.
    :):):)
     
  7. flowrider macrumors 601

    flowrider

    Joined:
    Nov 23, 2012
    #7
    This article may be of interest to readers of this thread:

    http://www.barefeats.com/promax1.html

    MacVidCards and ProMax loaned BareFeats some equipment and the results proved to be wild, IMO :eek:

    Lou
     
  8. registudio thread starter macrumors member

    registudio

    Joined:
    Aug 8, 2013
    Location:
    Montreal, QC
    #8
    Sorry super late reply from me...;o(
    @ Tutor thank you very much for your input.
    @ Flowrider thank you.

    I ordered the Cubix (1500W) and it's on its way.
    With regards to the GTX Titans, I'm in touch with Rob (Barefeats.com) and MacVidCards to see how I should proceed without using Windows (even if I'm not activating the double precision "overspeed"), with either a modified/flashed card or an original one...
    Octane render should recognize them in both cases (I think, not sure yet).
     
  9. Tutor, Sep 20, 2013
    Last edited: Sep 20, 2013

    Tutor macrumors 65816

    Tutor

    Joined:
    Jun 25, 2009
    Location:
    Home of the Birmingham Civil Rights Institute
    #9
  10. registudio thread starter macrumors member

    registudio

    Joined:
    Aug 8, 2013
    Location:
    Montreal, QC
    #10
  11. MacVidCards Suspended

    Joined:
    Nov 17, 2008
    Location:
    Hollywood, CA
    #11
    Tutor,

    Quick question.

    Is there an OSX app or benchmark that benefits greatly from Double Precision? I know that CUDA-Z can show this but I am curious if Octane Render or anything else can gain power from having this enabled.

    Will shortly have some info on this.
     
  12. ashurao macrumors member

    ashurao

    Joined:
    May 13, 2012
    Location:
    Paris, France
    #12
    Hello,
    I'm not sure, but I believe Octane render uses single precision. This is what I read all over Octane's forum.
    But since I do not speak English very well, I could be wrong.
    Have you ever compared the performance of the Titan, with and without the double precision?
     
  13. Tutor, Nov 9, 2013
    Last edited: Nov 9, 2013

    Tutor macrumors 65816

    Tutor

    Joined:
    Jun 25, 2009
    Location:
    Home of the Birmingham Civil Rights Institute
    #13
    The only OSX benchmark that I'm aware of that benefits greatly from Double Precision (DP) is OctaneRender V1.0 and earlier. Those early versions relied more on DP - that was when the folks at Otoy recommended Fermi over Kepler cards for OctaneRender. Otoy did, however, read the writing on the wall and made later versions more Single Precision (SP) oriented, so that with the current version (1.2, demo and actual application) Keplers, which are SP biased, K***A**. I have found that increasing GPU memory speed with Precision X has a much greater impact on rendering speed than increasing core speed. For these two reasons - the stronger SP bias and higher memory speed - I suspect that the Titanator (aka the GTX 780 Ti) will easily surpass the Titan in OctaneRender performance.

    I haven't tried to see whether there's a difference in other content creation applications such as Premiere and After Effects running under OSX because I now use the Windows versions of most visual content creation applications and have always kept my DP setting wide open with Nvidia's Control Panel and boosted core speed (more recently by just a little, if any) and memory speed (more recently by a larger amount) on all of my GTX cards with EVGA's Precision X.

    Because of the close relationship between the GTX lineup and their Tesla mates, I suggest that you might be able to find more easily what you're looking for by considering the following - it's the areas in which Nvidia says you'd need a K10 (high SP) vs a K20 (high DP):
    1) where K20 shines - Seismic processing, CFD, CAE, Financial computing, Computational chemistry and Physics, Data analytics, Satellite imaging, Weather modeling - See more at: http://www.nvidia.com/object/tesla-servers.html#sthash.JoDNZUjf.dpuf .
    2) where K10 shines - Seismic processing, signal and image processing, video analytics - See more at: http://www.nvidia.com/object/tesla-servers.html#sthash.JoDNZUjf.dpuf . I'm sure that you'll note the slight overlap. Here too, I suspect that a K10 would be preferable for use in Octanerender to a K20 for the reasons stated above.

    Pushing/painting pixels doesn't require the same degree of accuracy as setting coordinates for docking with the space station or tracking a solar flare or meteor for impact analysis. Where pinpoint accuracy is required, DP shines brightest.
     
  14. Tutor, Nov 9, 2013
    Last edited: Nov 9, 2013

    Tutor macrumors 65816

    Tutor

    Joined:
    Jun 25, 2009
    Location:
    Home of the Birmingham Civil Rights Institute
    #14
    I don't believe that it's an all or nothing matter, but I do find that with the latest version of OctaneRender, higher SP is generally faster, and when complemented by higher memory speed, much faster. See also my posts #s 863, 866 and 868 here: http://forums.macrumors.com/showthread.php?t=1333421&page=35 . Think of the difference as DP vs SP biasing. No Nvidia CUDA card completely lacks either DP or SP functions, and I cannot turn DP functionality completely off; even if I could, I don't think I'd like what I'd get. Also:

    a regular */GTX Titan has 2688 CUDA cores clocked at 875 MHz,
    a regular GTX 780 has 2304 CUDA cores clocked at 863 MHz,
    a regular GTX 680 has 1536 CUDA cores clocked at 1006 MHz, and
    a regular GTX 580 has only 512 CUDA cores (but) clocked at 1544 MHz.

    [ http://en.wikipedia.org/wiki/Comparison_of_Nvidia_graphics_processing_units ]. Barefeats has tested the performance of a number of CUDA cards in performing the OctaneRender Benchmark Scene [ http://www.barefeats.com/gputitan.html ]. Among the cards that he tested that are mentioned above here's how they landed:
    Titan (then the fastest - 95 sec.),
    GTX 780 (then 2nd fastest - 110 sec.),
    GTX 580C (C=Classified - it's clocked higher than a regular GTX 580, but they each have the same number of cores; note that he did not test a regular GTX 580) (then 3rd fastest - 160 sec.),
    a GTX 680C (overclocked - then fifth fastest - 176 sec.) and
    a GTX 680 (regular - then sixth fastest - 189 sec.).

    Prior to November 7, 2013 but after the GTX Titan's arrival, the Titan was the gold standard for CUDA capable cards in OctaneRender. I have coined the term "TE", which stands for Titan Equivalency [ see post # 865 here: http://forums.macrumors.com/showthread.php?t=1333421&page=35 ]. Basically, it's how fast a CUDA GPU (or GPUs) renders that OctaneRender Benchmark Scene in relation to a Titan. A regular Titan has a TE = 1 [95/95 = 1]; a regular GTX 780 earned a TE = .86 [ 95/110 = .86 ], a GTX 580C earned a TE = .59 [ 95/160 = .59 ], a GTX 680C earned a TE = .54 [ 95/176 = .54 ] and a regular GTX 680 earned a TE = .50 [ 95/189 = .50 ]. This is useful to me in evaluating the performance that I can expect from my CUDA rigs because OctaneRender scales perfectly linearly with GPU additions. For example, if you have two regular GTX 680s in the same system and allocate both of them to rendering that OctaneRender Benchmark Scene, you'll get the performance that I get with one regular Titan [.5x2 = 1]. So if you use four regular 680s, you'll get the performance of two regular Titans, and if you have two regular GTX 780s in the same system and allocate both of them to rendering that Scene, you'll get the performance that I get with 1.72 regular Titans [ .86x2 = 1.72 - I know it's not good to cut up your GTXs ].

    */ Regular = a reference design w/o any user applied clock tweaking.
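    The TE arithmetic above reduces to a one-line ratio; here's a small sketch using the Barefeats times quoted in the post (the perfect linear multi-GPU scaling is the post's own claim, assumed here):

    ```python
    # "Titan Equivalency" (TE): a card's OctaneRender benchmark speed relative
    # to a GTX Titan's, computed from the Barefeats times quoted in the post.
    TITAN_SECONDS = 95  # Titan's time on the OctaneRender Benchmark Scene

    def te(card_seconds, num_cards=1):
        """TE for num_cards identical cards, assuming perfect linear scaling."""
        return round(TITAN_SECONDS / card_seconds, 2) * num_cards

    times = {"Titan": 95, "GTX 780": 110, "GTX 580C": 160,
             "GTX 680C": 176, "GTX 680": 189}
    for card, secs in times.items():
        print(f"{card}: TE = {te(secs)}")   # 1.0, 0.86, 0.59, 0.54, 0.5

    # Two GTX 780s render like 1.72 Titans; two GTX 680s like one Titan.
    print(te(times["GTX 780"], num_cards=2))  # 1.72
    print(te(times["GTX 680"], num_cards=2))  # 1.0
    ```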

    Sorry for this little diversion, but back to your point: I suggest that core speed matters (but Nvidia faces the same type of constraints as Intel - more cores require more power, generate more heat, and necessitate more cooling), memory speed matters, the effectiveness of core and memory cooling matters, the number of cores and amount of memory matter, and the number/ratio of DP and SP functions matter. But the balance of things does change, and even their current state may change.

    Here's a part of what I posted on March 13, 2013 here [ post # 512 http://forums.macrumors.com/showthread.php?t=1333421&page=21 ]:

    Which cards to buy?
    The way Octane has been coded to handle CUDA is exemplified by the following statement from one of the FAQs on their website: "If you are interested in purchasing a new graphics card to use with Octane Render, the Geforce GTX570 or GTX 580 currently have the best Performance to Price ratio. The latest generation of Nvidia GPUs (Kepler) is supported, but currently works slower than their Fermi equivalents. We are still optimizing the performance of Octane on the Kepler GPUs. The GeForce line is higher clocked and renders faster than Quadro and Tesla GPUs, but the latter GPUs often have more memory. A powerful multi-core CPU is not required as Octane does not use the CPU for rendering, but a faster CPU will improve the scene voxelizing speed." Moreover, Octane's manual states:
    OctaneRender™ runs best on Fermi (e.g. GTX 480, GTX 580, GTX 590) and Kepler (e.g. GTX 680, GTX 690) GPUs, but also supports older CUDA enabled GPU models. GeForce cards are fast and cost effective, but have less VRAM than Quadro and Tesla cards. OctaneRender scales perfectly in a multi GPU configuration and can use different types of Nvidia cards at once e.g. a GeForce GTX 260 combined with a Quadro 6000. The official list of NVIDIA CUDA enabled products is located at https://developer.nvidia.com/object/cuda-gpus.
    {Emphasis added.}

    Then later on May 10, 2013, I posted here [ post # 629 http://forums.macrumors.com/showthread.php?t=1333421&page=26 ]:

    It now seems that Otoy has rewritten the Octane renderer to take better advantage of the higher single precision floating point peak performance of the Kepler cards, for here is what the user manual now says: "We recommend to use GPUs based on the Kepler architecture as these cards have more memory and consume less power than Fermi GPUs, but are just as fast with OctaneRender™." (Compare to post # 512, above.) So the scramble to find top-end Fermi cards for Octane should now end, and those who own GTX 600 series cards and want to use Octane can now rejoice.

    So that's why I suggest you consider that it's the biasing of the software to better utilize SP core functions; DP core functions still matter - just not nearly as much as they used to, unless you're rendering with older versions of the software. But we may see the balance shift as the software matures and other functions get added, changed or shelved.

    Finally, don't forget the Titanator, aka the GTX 780 Ti, that will wear many robes affecting its power display [ See, e.g., posts #s nos. 863, 866, 868 here: http://forums.macrumors.com/showthread.php?t=1333421&page=35 ]. That looks to be the best card for now for OctaneRender.
     
  15. Gav Mack macrumors 68020

    Gav Mack

    Joined:
    Jun 15, 2008
    Location:
    Sagittarius A*
    #15
    Speechless, I can only think of one word. Wow :D
     
  16. Tutor, Nov 9, 2013
    Last edited: Nov 13, 2013

    Tutor macrumors 65816

    Tutor

    Joined:
    Jun 25, 2009
    Location:
    Home of the Birmingham Civil Rights Institute
    #16
    Also, can you bump up the memory speed by about 125 to 200 MHz on a Titan that you're customizing for a Mac? That would really improve performance in OctaneRender. That's why I can't wait to get my hands on some GTX 780 Ti's, for their memory clock is over 7,000 MHz vs. 6,008 MHz for the Titan. A true matchup would be one between the EVGA GTX Titan SuperClocked, the EVGA GeForce GTX 780 Ti Superclocked w/ EVGA ACX Cooler, and the iChill GeForce GTX 780 Ti HerculeZ X3 Ultra. The X3 Ultra's memory is clocked at 7,200 MHz out of the box.
     
