Tried a Titan Black Vid Card....

Discussion in 'Mac Pro' started by Stairway, Jul 25, 2014.

  1. Stairway macrumors newbie

    Joined:
    Aug 6, 2010
    #1
    I've been using a Radeon R9 280X, but I figured I'd give a Titan Black a try.
    Using the Nvidia web drivers, everything is running fine in 10.9.4.

    I tried running a couple of benchmarks, and so far I get worse numbers than my Radeon got. Luxmark (Sala): 2400 vs 1800. Cinebench OpenGL: 64 vs 53.

    So I'm now wondering if I am missing something ---- exactly where would the Titan Black excel? I know the extra VRAM is great in some applications, but what is there beyond that?

    I'm not looking to put down the Titan, but I want to know where it would be better for me (while I still have a return window on the Titan)....
     
  2. simsaladimbamba

    Joined:
    Nov 28, 2010
    Location:
    located
    #2
    LuxMark is an OpenCL benchmark tool. Nvidia GPUs are aimed at CUDA applications, while AMD GPUs are aimed at OpenCL applications.
     
  3. benjobe2513 macrumors member

    Joined:
    Sep 10, 2008
    Location:
    Humboldt County, California
    #3
    Do you have access to Final Cut Pro X or Adobe Premiere Pro? Supposedly the Adobe stuff works better with CUDA while the FCPX works better with OpenCL, but I'd be curious to see what the differences were.

    If not those programs, you might try using Handbrake to encode something to see any difference between the cards.
     
  4. ActionableMango, Jul 25, 2014
    Last edited: Jul 25, 2014

    ActionableMango macrumors 604

    Joined:
    Sep 21, 2010
    #4
    I could be wrong, but last I checked into this, Handbrake on OS X has no GPU acceleration whatsoever and is completely CPU bound.

    I think Cinebench is a great predictor of how well a system will render in Cinema4D, but not a great benchmark for anything else. I'd prefer the Heaven and Valley benchmarks for general OpenGL testing.

    But really the best test for you specifically is to try out whatever actual software applications you were trying to speed up in the first place.
     
  5. Stairway thread starter macrumors newbie

    Joined:
    Aug 6, 2010
    #5
    I guess it is just that considering the price difference between the cards, I was expecting a significant "all-around" superiority with the Titan.

    Still, I have no complaints -- the card is smaller than my 280X, is whisper quiet, and the 6 gigs of VRAM could be very useful in certain applications.
     
  6. goMac macrumors 603

    Joined:
    Apr 15, 2004
    #6
    CUDA doesn't typically run better than OpenCL on NVidia cards. The 700 series of NVidia cards just plain sucks for compute, period, regardless of OpenCL or CUDA.

    In a way, CUDA is part of the problem. You don't know that the 780 or Titan is a horrible performer for compute tasks if you have a CUDA app, because you can't benchmark that same app on AMD hardware.
     
  7. MacVidCards Suspended

    Joined:
    Nov 17, 2008
    Location:
    Hollywood, CA
    #7
    To be fair, there is a HUGE divide between GK104 (GTX680/770) and GK110 (GTX780 & Titan).

    Nvidia made Kepler stellar in games; it is a pity that they dropped their superiority in compute to do so.

    http://www.barefeats.com/gpuss.html
     
  8. goMac macrumors 603

    Joined:
    Apr 15, 2004
    #8
    I think the proprietary nature of CUDA made them complacent. What are Nvidia users going to do? Take their CUDA apps to AMD? One advantage of CUDA is that Nvidia is still able to wield quite a bit of monopoly-ish power.
     
  9. Tutor, Jul 26, 2014
    Last edited: Jul 26, 2014

    Tutor macrumors 65816

    Joined:
    Jun 25, 2009
    Location:
    Home of the Birmingham Civil Rights Institute
    #9

    Cinebench, as packaged, doesn't measure what the Titan (Black or original) does best and that is CUDA rendering. Currently, there is no CPU that can match a GPU in rendering because that task relies mainly on parallelism. I've estimated that the Titan Black is 1.34x faster than the original Titan [see footnote below]. I have, e.g., tied the Cinebench scene file into my CUDA chain and rendered it in Cinema 4d using Octane and other CUDA based renderers such as TheaRender.*/ I don't doubt that AMD cards have superior compute capability. I have a Hackie system with three PowerColor Radeon HD 5970s [each card has single precision FPP performance of 4,640 GFLOPS (13,920 GFLOPS total) and double precision FPP performance of 928 GFLOPS (2,784 GFLOPS total)] and at OpenCL tasks it's very fast. The rub is that there aren't that many content creation applications that I use that can tap OpenCL for rendering. On the other hand, there are many more content creation apps that I use that take advantage of CUDA [See, e.g., http://www.nvidia.com/content/PDF/gpu-accelerated-applications.pdf ]. That's why I have over 65 CUDA cards.


    */ The CUDArized Cinebench 15 performance of one of my 2007 MacPro2,1s is shown in the pict below. Note the CUDA card that I used has modest CUDA performance (I had in my system, at that time, 4x GT640s overclocked about 20%). Yet, the GPU Cinebench scores are close to the CPU scores of current high end Mac Pros. According to Nvidia, the original Titan [actually they used the Tesla K20 in the statement, but the original Titan renders faster than the K20 because of the Titan's greater clock speeds] has the compute performance of up to 10 computer systems each with a single E5-2687W v1 [ http://www.nvidia.com/content/tesla/pdf/NVIDIA-Tesla-Kepler-Family-Datasheet.pdf ]. Here's how the CUDA cards that I own and have tested stack up in relation to the original Titan -

    GPU Performance Review

    I. My CUDA GPUs’ Titan Equivalency*/ (TE) from highest to lowest (fastest OctaneRender**/ Benchmark V1.20 score to lowest):

    1) EVGA GTX 780 Ti Superclock (SC) ACX / 3gig (G) = TE of 1.319 (The current Titan Black should perform a little better since it has slightly higher base, boost and memory speeds, but unfortunately for me I don’t own one, so I can’t test it);
    2) EVGA GTX 690 / 4G = TE of 1.202;
    3) EVGA GTX Titan SC / 6G = TE of 1.185;
    4) EVGA GTX 590C = TE of 1.13;

    The original Reference Design (oRD) Titan that Bare Feats tested = TE of 1.0 (95 secs);

    5) EVGA GTX 480 SC / 1.5G = TE of .613;
    6) EVGA GTX 580 Classified (C) / 3G = TE of .594; and
    7) Galaxy 680 / 4G = TE of .593.

    Estimating TE Differential of GTX 780 Ti vs. GTX Titan Black

    I. Factors considered are:

    1) Pixel (GP/s) differential */: 42.7 (GTX Titan Black) / 42 (GTX 780 Ti) = 1.01666666666667

    2) (a) Core count, (b) memory speed (MHz) and (c) bandwidth are the same on both GPUs

    3) Base clock rate differential: 889 (GTX Titan Black) / 875 (GTX 780 Ti) = 1.016


    II. TE of EVGA GTX 780 Ti Superclock (SC) ACX / 3gig (G) = 1.319

    1.319 (780 Ti TE) * 1.016 (Titan Black important differentials) = 1.340104

    95 (time in secs taken by oRD Titan to render 1.20 Benchmark scene) / 1.340104 = ~70.89 secs (estimate of time in secs one Titan Black will take to render 1.20 Benchmark scene); therefore,
    two Titan Blacks should render the scene in ~ 35 to 36 secs, assuming the render scales linearly across both cards. Also, one Titan Black would be expected to have a TE of 1.34 (1.340104 rounded) and two Titan Blacks would be expected to have a TE of 2.68 (1.34 * 2 = 2.68). Accordingly, it appears that it would take 2.68 of the original reference design Titans to yield the same rendering performance as two Titan Blacks. **/
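
    The estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in the post: the numbers are the ones quoted there, the function and variable names are made up for illustration, and it assumes (as the post does) that TE scales with base clock and that two cards scale perfectly linearly.

```python
# Figures quoted in the post; the names below are invented for this sketch.
TE_780TI_SC = 1.319            # measured Titan Equivalency of the GTX 780 Ti SC
BASE_CLOCK_TITAN_BLACK = 889   # MHz
BASE_CLOCK_780TI = 875         # MHz
REF_TITAN_SECS = 95            # reference Titan time for the 1.20 benchmark scene


def scale_te(measured_te, new_clock_mhz, measured_clock_mhz):
    """Scale a measured TE by a base clock ratio (assumes all else is equal)."""
    return measured_te * (new_clock_mhz / measured_clock_mhz)


def render_secs(te, cards=1, ref_secs=REF_TITAN_SECS):
    """Estimated render time, assuming perfectly linear scaling across cards."""
    return ref_secs / (te * cards)


te_titan_black = scale_te(TE_780TI_SC, BASE_CLOCK_TITAN_BLACK, BASE_CLOCK_780TI)
one_card_secs = render_secs(te_titan_black)
two_card_secs = render_secs(te_titan_black, cards=2)

print(f"Estimated TE of one Titan Black: {te_titan_black:.3f}")
print(f"One card: {one_card_secs:.1f} s, two cards: {two_card_secs:.1f} s")
```

    Swapping in a measured TE and the clocks of a different card pair gives the same kind of rough estimate; it is no substitute for actually timing the benchmark scene.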



    */ Info source: http://en.wikipedia.org/wiki/List_of_Nv ... sing_units

    **/ In Windows, one can use EVGA Precision X to affect how a GTX GPU performs Octane renders. The most important parameter is the memory speed, second is the core and third is the boost clock. Don’t mess with the voltage unless you’re a tightrope walker. Remember that when tweaking, less is more. So tweak relevant values in baby steps and test render the Octane benchmark scene at least five times (noting the render time changes and especially system stability) between each stride (subsequent change in memory or core or boost clock) and always keep those temps as low as possible. If your GPU(s) already run(s) hot, then use Precision X to underclock these same parameters. But remember, you’re always tweaking solely at your own peril.
     

    Attached Files:

  10. Tutor macrumors 65816

    Joined:
    Jun 25, 2009
    Location:
    Home of the Birmingham Civil Rights Institute
    #10
    The GK110s do render much faster than GK104s. Also, all of my testing confirms that it takes 3 Kepler cores [GT(X)700s & 600s] to equal 1 Fermi core [GT(X)500s & 400s] in rendering. The only disadvantages of the Fermi cards (other than their meager OpenCL performance) are that they tend to require more power, run hotter, and have less memory than the Kepler cards. But as is indicated in my last post, a GTX 590 renders faster than the original Titan (and now a used GTX 590 costs 1/3 of the original price of that Titan).
     
  11. Tutor, Jul 26, 2014
    Last edited: Jul 26, 2014

    Tutor macrumors 65816

    Joined:
    Jun 25, 2009
    Location:
    Home of the Birmingham Civil Rights Institute
    #11
    As a business owner, I have to be governed by applications' here and now.

    I don't believe it's complacency. I believe that the high level driving force is Nvidia's belief that there are more gamers than content creators. Nvidia is spot on there. Cash is the ultimate/low level driving force.

    Keep using Nvidia cards for games, content creation and other CUDA assisted uses until OpenCL developers juice up the apps that those users employ so that AMD has the extant applied advantage. OpenCL's problem is that the OpenCL developers haven't been prolific producers. Even in the case of LuxRender, there now appears to be incipient malaise. Computing history has numerous examples of superior technology that died, never fully achieving its full potential because of no, untimely or sloppy implementation.

    So true. For example, $3k for the Titan Z - the ultimate example of such power.
     
  12. fuchsdh macrumors 6502a

    Joined:
    Jun 19, 2014
    #12
    Well, I think the Titan Z is more a prestige card rather than something most people are going to buy.

    (That said we've got a comp Titan Z sitting in my office, just waiting for a computer to slot it into. I'll certainly enjoy benchmarking it when I can.)
     
  13. Tutor macrumors 65816

    Joined:
    Jun 25, 2009
    Location:
    Home of the Birmingham Civil Rights Institute
    #13
    Congratulations on your having a Titan Z.

    It probably is a "prestige card," but I've never let prestige or the lack of prestige affect my purchase decisions. For what I do, two Titan Blacks would perform just as well and cost $1k less. All of the systems that I now build (for rendering) have at least four or eight double wide PCIe slots. Since the $3k Titan Z is a three single PCIe slot solution, I think it tends to have limited appeal, except to someone who (1) has a lot of space (and exhaust opening(s)) below the last PCI-e slot of their motherboard (so that the Z blocks the least amount of PCIe slots by sitting as the bottom most card), and/or (2) has a single slot PCI-e card(s) that they absolutely need to accommodate [the Titan Z consumes 2 PCIe x16 slots on most motherboards, but {assuming there is an adjoining x8/x4 slot} leaves open the x8/x4 slot usually paired with the adjacent x16 slot], and/or (3) has a PCI-e slot limited system at the point of acquisition and (4) has a lot of money.


    Let me know how it benchmarks. If you use the Octane render benchmark, it'll see the Titan Z as two cards. So if you don't mind, let me know how it performs using one and both GPUs. To that extent, it's like my GTX 590s and 690s. Also, that's why I think there's some price inflation. Nvidia didn't charge triple the retail price of the GPUs which the 590s and 690s contain like they have with the Titan Z.

    That said, if you were to give me your Titan Z, I wouldn't hesitate to accept it.
     
  14. sirio76, Jul 28, 2014
    Last edited: Jul 28, 2014

    sirio76 macrumors regular

    Joined:
    Mar 28, 2013
    #14
    Is this a joke?
    How can it be possible that:
    -a computer that typically scores around 500 reaches almost 1400 points in the multithreaded test?
    -a 7-year-old 8-core runs faster than the fastest 2014 8-core Xeon?
    -a 7-year-old computer gets much faster single-thread performance than the fastest 2014 overclocked i7?
    -the GPU result is so low, considering that in the Cinebench OpenGL test almost half of the performance comes from the single-threaded performance of the CPU?
    -(if those numbers are true) this thing can run all day computing long renders without exploding?
    Honestly the only reasonable result seems to be the one from the GPU.
     
  15. MacVidCards, Jul 28, 2014
    Last edited: Jul 28, 2014

    MacVidCards Suspended

    Joined:
    Nov 17, 2008
    Location:
    Hollywood, CA
    #15
    Cinebench is a joke

    Always has been, at least GPU part

    I have seen an 8800GT score the same as a GTX580

    I sent word and the clowns don't care

    completely 100% inaccurate...all based on CPU
     
  16. sirio76 macrumors regular

    Joined:
    Mar 28, 2013
    #16
    This does not answer my questions.
    I know the CB OpenGL test also depends on CPU speed, but anyway it will give you a much better indication than a gaming benchmark of how a GPU will perform inside C4D.
     
  17. imashination macrumors newbie

    Joined:
    Jul 28, 2014
    #17
    @Tutor
    What do you mean with "The CUDArized Cinebench 15 performance of one of my 2007 MacPro2,1s is shown in the pict below. Note the CUDA card that I used has modest CUDA performance"

    Cinebench doesn't use CUDA in any way, shape or form. It is not possible for that computer you list to achieve the score you show.

    Cinebench is a benchmark to show how well your system will perform with c4d; it has never been billed as anything other than that. The CPU scores are accurate and useful for comparing chips for any task. The OpenGL test, though, only indicates what performance you can expect whilst editing in c4d. Given that virtually every task you do in pretty much any 3d app must first be calculated by the CPU and then the results of that sent to the gfx card for display, the single threaded speed of a machine will play very heavily into the score.

    This means it isn't a great test for looking at how fast a gfx card is, but is a good test to show how a system performs overall. 3D graphics display can often be hampered by a poor low power CPU sending it data slowly, and the test will show this. The only place you will see pure OpenGL scores is in a benchmark where everything that happens is pre-baked and processed almost solely on the gfx card, i.e. games or OpenCL/CUDA code running isolated on the card.
     
  18. Tutor, Jul 28, 2014
    Last edited: Jul 28, 2014

    Tutor macrumors 65816

    Joined:
    Jun 25, 2009
    Location:
    Home of the Birmingham Civil Rights Institute
    #18
    This is just another example of experiments free time can make me think of doing, just as with my unfinished 32 core OS X Geekbench experiment [ http://browser.primatelabs.com/geekbench3/compare/558552?baseline=300738 ]. So my answer to your first question is, "Yes and no." See, below.

    By my getting my GPUs experimentally to do primarily what the benchmark expects the CPUs to be doing, the results shown were produced. That's what I mean by the term "CUDArized." There're still a few kinks that I haven't been able to work out. For example, I haven't figured out yet how to get the GPU core numbers to display correctly. So, I used a brute force allocation formula for the single core result to match 1/8 [1,536 cores / 8 = 192 cores] of the score for all GPU cores provided by the four GT640 cards [384 cores per card x 4 = 1,536 cores total]; so 192 GPU cores were used to produce the single core result. I used eighths because that system is an eight core w/o hyper threading and is seen as such by Cinebench (for the time being).
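
    The brute force allocation described above is plain division, and can be sketched as follows. The function names are invented for illustration, and the all-core score passed in at the end is a hypothetical placeholder, not a measured result.

```python
def gpu_cores_per_cpu_slot(cores_per_card, cards, cpu_cores):
    """Split the pooled GPU cores evenly across the CPU cores Cinebench reports."""
    total = cores_per_card * cards
    if total % cpu_cores:
        raise ValueError("GPU cores do not divide evenly across CPU cores")
    return total // cpu_cores


def single_core_score(all_core_score, cpu_cores):
    """Brute-force single-core figure: an equal share of the all-core score."""
    return all_core_score / cpu_cores


# Four GT640s at 384 cores each, shared across the eight cores Cinebench sees.
share = gpu_cores_per_cpu_slot(cores_per_card=384, cards=4, cpu_cores=8)
print(share)

# 1400 is a hypothetical all-core score used only to show the division.
print(single_core_score(1400, 8))
```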

    I didn't expect much OpenGL performance from a ~$100 GT 640 and true to my expectation it didn't produce much.

    The core numbers aren't accurate [see, above]. It'll certainly run CUDA for weeks on end without exploding, using the 4 GT 640s that I used for this experiment, since it usually runs 2x GTX 590s and one GT 640 for rendering for days on end and displays many multiples of rendering prowess in its usual configuration.

    Admittedly, this experiment is just a work in progress because I would expect for this system, as usually configured and working correctly with Octane, to produce a Cinebench score of at least 15-20 times that of a pristine MacPro2,1 without any CUDA assist.

    See, above.

    It's completely true that Cinebench wasn't designed to use CUDA in any way, shape or form. But, it's also true (1) that Cinema 4d wasn't designed in any way, shape or form to use CUDA and (2) that Cinema4d and Cinebench are about as close of a match between a benchmark and an application as one could ever conceive. I use CUDA via Octane render with Cinema 4d. Octane render was designed to fit into the Cinema 4d workflow (first solely as an external renderer and later as an integrated plug-in). My pipe dreams are that both Cinebench and Cinema 4d will be modified officially to use CUDA directly, without Octane render. GPUs excel over CPUs in parallel tasks such as rendering. Moreover, users are beginning to either use or explore GPU rendering. Just call me a dreamer who when he awakens tries somewhat clumsily to replicate his dreams.

    I agree that were it not for my tinkering to cause Octane to assist in rendering that benchmark, that a 2007 MacPro2,1 couldn't achieve the scores shown.

    I have no reason to disagree with you on that score.
     
  19. imashination macrumors newbie

    Joined:
    Jul 28, 2014
    #19
    Ok, just so we're all clear, you've essentially loaded up the cinebench rendering project in c4d, changed the render engine over to octane and resaved the project file back into the cinebench folder, so when cinebench runs, it is in fact triggering a different render engine.

    Whilst it can have a use as a benchmark, too much has changed for the numbers to really mean anything by themselves. At the very least you would need to run the standard CB render on the CPU once to gain a 'real' score which can be compared with other results (e.g. we only know 150+ is good because we have seen other systems scoring an average of 80).

    Then you would need to run the octane render once on the CPU to get a base system score to compare with the regular render engine. Otherwise too many variables will have changed, and you don't know whether to attribute the speed increase to the gfx cards or to the fact that it's a different render engine.

    Then finally run the octane test on the GPUs, only with all three scores can you have any meaningful results.

    ie. a render score of 188 could be absolutely terrible for octane, we don't know as there's nothing else to compare it against.
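
    The three-run comparison described above can be sketched like this. It is only an illustration: the function name and the scores are hypothetical, and it assumes the alternate render engine can actually be run on the CPU to produce the middle score.

```python
def decompose_speedup(standard_cpu, alt_engine_cpu, alt_engine_gpu):
    """Split the overall gain into a render-engine factor and a hardware factor."""
    if min(standard_cpu, alt_engine_cpu, alt_engine_gpu) <= 0:
        raise ValueError("scores must be positive")
    engine_factor = alt_engine_cpu / standard_cpu       # same hardware, new engine
    hardware_factor = alt_engine_gpu / alt_engine_cpu   # same engine, new hardware
    return engine_factor, hardware_factor, engine_factor * hardware_factor


# Hypothetical scores: standard render on CPU, alternate engine on CPU,
# alternate engine on GPUs.
engine, hardware, overall = decompose_speedup(80, 60, 180)
print(f"engine x{engine:.2f}, hardware x{hardware:.2f}, overall x{overall:.2f}")
```

    With only the first and last scores, the engine and hardware factors cannot be told apart, which is the point being made in the post.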
     
  20. sirio76 macrumors regular

    Joined:
    Mar 28, 2013
    #20
    Sorry but I really don't understand what you are talking about. How is the internal Cinebench (C4D) CPU render engine supposed to report a score from a GPU?

    Well, (for now) this is true only for a very small percentage of 3D work; normally CPU renderers are as fast or much faster in many more tasks, and without any of the limitations of GPU renderers. That's why you don't see any 3D movie rendered in Octane (or any other similar product) :)
     
  21. Tutor, Jul 29, 2014
    Last edited: Jul 29, 2014

    Tutor macrumors 65816

    Joined:
    Jun 25, 2009
    Location:
    Home of the Birmingham Civil Rights Institute
    #21
    That’s, in essence, what I was shooting to accomplish.

    I may try your suggestion, for it seems promising, when I next have adequate free time.

    Sorry, but it appears to me that you really do understand the problem, but, like me, you probably do not now know how to get the conversion to work accurately.

    I like caramel icing on my cake. To me, it’s the best tasting icing of them all. Caramel is just a small percentage of the kinds of icing that can go on a cake and, admittedly, some don’t like caramel icing on their cake; some don’t like any icing on their cake; some may like cake and/or caramel icing on it, but can’t consume it for medical reasons; and there’re some who don’t like cake at all. So while it’s true that GPU rendering, like caramel cake, isn’t for everyone, it’s also true that 3d rendering isn’t for everyone. The OP asked “So I'm now wondering if I am missing something ---- exactly where would the Titan Black excel? I know the extra VRAM is great in some applications, but what is there beyond that? … I’m not looking to put down the Titan, but I want to know where it would be better for me (while I still have a return window on the Titan)….” I now realize that what I should have asked first is: "What is/are his/her interest(s)/application(s)?" But I began by stating, in sum, that the Titan Black excels above all other CUDA GPUs {I'd forgotten about the pricey Titan Z} at 3d rendering because that is what I use my GPUs for. And I added, in sum, (1) that although AMD cards’ OpenCL prowess is greater at compute than Nvidia's OpenCL implementation, OpenCL doesn’t currently have many content creation applications and (2) a resource listing other applications which CUDA aids [ http://www.nvidia.com/content/PDF/gpu-accelerated-applications.pdf ].

    There are likely many reasons why we haven’t yet seen any box office 3d movies fully rendered in Octane (or any other similar product), such as (1) risk avoidance, (2) GPU rendering is relatively new technology, (3) prior investment in CPU rendering technology might have been heavy, (4) familiarity, etc. This is especially relevant if one considers how much money 3d movie studios have invested in CPU based rendering technology [where one soon needs another system to increase (or maintain, for heavier and heavier scenes) render speeds] and the fact that Intel now changes (as of late) CPU socket requirements every couple of (or three) years. Thus, there’re many reasons generally (and others, specifically) why one’s latest and favorite 3d box office movie wasn’t made in final form by GPU rendering (but we may never know if/where/how it played a role). A significant limitation of GPU rendering, as it relates to all 3d movies, has been that, because GPU rendering is relatively new, a few of the features offered by CPU rendering might just now (recently or soon) be implemented on a particular GPU rendering engine, such as hair rendering only being recently added to Octane, although stylized polygonal hair has not been completely abandoned in all genres. As it relates to box office 3d movies generally, because GPU rendering began by placing total reliance on the Vram amount on the GPU card, that has been a limitation for GPU rendering in that industry generally. However, few of us are currently rendering for box office films. And as with almost any limitation, there are ways we can get around it/them for the vast majority of the projects that we’ve had or we currently get. Additionally, just a few of the approaches that appear to hold great promise for GPU rendering (and applicable to all industries) for overcoming the limited Vram issue, besides GPUs now being released with larger and larger Vram amounts, include, but are not limited to, the following:

    1) now found in Redshift3d v1.0 (“ … out-of-core design allows it to efficiently render very large scenes with geometry and textures that far exceed available VRAM.”) [ See https://www.redshift3d.com ];

    2) now found in Cinema4d’s Team Rendering (permitting bucket rendering of blocks in a single frame by multiple computers {I’ve found that it even works with Octane v1 & v2 Plugins}) [ See http://www.maxon.net/en/products/new-in-cinema-4d-r15/rendering.html & http://www.maxon.net/fileadmin/maxon/products/r15_new/rendering/faq/teamrender_faq_en.pdf ]; and

    3) soon to be added to Octane v2 Standalone [Region and Network rendering - See http://render.otoy.com/features.php ].

    That is not to say that GPU rendering doesn’t have any other issues to overcome. But if you look closely at the current state of GPU rendering by comparing the specific features offered by each engine, you’ll find that the issues are infrequently monolithic - what one GPU rendering engine may lack completely, another one may currently excel at. So a lot depends on the specific 3d software that you use and the specific GPU rendering engine that you choose. Octane currently covers more 3d applications than any other GPU rendering solution [ http://render.otoy.com/index.php ] (and will soon aid certain After Effects renders). But depending on what particular 3d application you choose to use, there may be another, better GPU rendering solution for your particular application that hasn't been mentioned above; these include, but are not limited to, Chaos V-Ray RT, cebas finalRender and Furry Ball. The Titan Black renders faster than all other CUDA cards, excluding the pricey Titan Z, but if you currently have a particular need for Vram supporting error correction on the fastest CUDA card, then get the Tesla K40.
     
