Thunderbolt GPUs: technical barriers

Discussion in 'Mac Pro' started by Cubemmal, Jan 2, 2014.

  1. Cubemmal macrumors 6502a

    Joined:
    Jun 13, 2013
    #1
    Let's discuss the technical issues with Thunderbolt GPU support.

    The most promising external GPU work is at SilverStone, who recently demonstrated a work in progress.


    MSI also has something they're working on, but inexplicably they didn't include power.

    Does this case make sense?

    The Anandtech Mac Pro review states that a PCIe 3.0 x16 link has 15.7 GB/s of bandwidth. Further on in the review he shows TB 2.0 as having 2.5 GB/s of bandwidth - quite a difference. However, consider that with TB 1.0 half of the bandwidth could be used for video output, namely 1.25 GB/s. That's the theoretical max, but we can assume that driving a 27" 2560x1440 screen at 30 Hz or 60 Hz takes no more than 1.25 GB/s.
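    Quick arithmetic behind that assumption (a rough sketch; it assumes 32-bit pixels and ignores blanking overhead):

```python
# Rough bandwidth arithmetic for the numbers discussed above.
# Assumes 4 bytes/pixel (32-bit color) and ignores blanking intervals.

def display_bw_gbps(width, height, hz, bytes_per_px=4):
    """Raw pixel bandwidth in GB/s for a given display mode."""
    return width * height * bytes_per_px * hz / 1e9

pcie3_x16 = 15.7   # GB/s, per the Anandtech review
tb2       = 2.5    # GB/s, ditto

for hz in (30, 60):
    bw = display_bw_gbps(2560, 1440, hz)
    print(f"2560x1440 @ {hz} Hz: {bw:.2f} GB/s")

# How much of a PCIe 3.0 x16 link TB2 actually offers:
print(f"TB2 / PCIe3 x16 ratio: {tb2 / pcie3_x16:.2f}")
```

    Even at 60 Hz an uncompressed 2560x1440 stream comes in under 0.9 GB/s, so the 1.25 GB/s assumption holds with room to spare.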

    So the question is, does that really require 15.7 GB/s on the input? What are all those PCIe x16 lanes for anyhow? Loading the VRAM (RAM->VRAM DMA), servicing interrupts (CPU->GPU), and inter-card communication (SLI/CrossFire), AFAIK. We can eliminate the last use case, as this would be a single-card-only solution, though theoretically somebody could make a two-slot external TB 2.0->PCIe GPU box with two x16 slots. Are 16 lanes really needed for loading VRAM and servicing interrupts?

    This is where my knowledge and searching run out. My question is whether this analysis is accurate, and whether this kind of device can work.

    The remaining pieces for SilverStone are Intel certification and Thunderbolt-aware OS X drivers.
     
  2. paulrbeers macrumors 68040

    Joined:
    Dec 17, 2009
    #2
    1st off: Most GPUs aren't maxing out the full x16 lanes. In fact, in many consumer-grade computers the GPU is most likely running on an x8 link anyway. Now, x8 PCIe 3.0 vs x4 PCIe 2.0 is a difference of about 4x, so there still is a data penalty, obviously. I'm going to point you to an article Barefeats did a while back comparing a Radeon 5770 and 5870 running on various Mac Pros. The Mac Pro 1,1 uses PCIe 1.0 vs 2.0 on all the others, and while there was a penalty in overall performance, what is interesting is how often the penalty was minimal. It also isn't a perfect comparison, because the Mac Pro 1,1 has significantly less CPU power, so some of the penalty seen may not even have been due to the lack of PCIe bandwidth, but rather to a lack of CPU grunt to feed data to the GPU.

    http://www.barefeats.com/wst10g7.html
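    That "about 4x" figure checks out with rough per-lane arithmetic (the per-lane rates below are approximate, with PCIe 3.0's leaner 128b/130b encoding factored in):

```python
# Approximate usable bandwidth per PCIe lane, by generation.
# Gens 1.x/2.0 use 8b/10b encoding; 3.0 uses 128b/130b, so its
# 8 GT/s signaling yields ~0.985 GB/s per lane rather than 1.0.
LANE_GBPS = {1: 0.25, 2: 0.5, 3: 0.985}  # GB/s per lane

def link_bw(gen, lanes):
    """Total link bandwidth in GB/s for a given generation and width."""
    return LANE_GBPS[gen] * lanes

gen3_x8 = link_bw(3, 8)   # ~7.9 GB/s
gen2_x4 = link_bw(2, 4)   # 2.0 GB/s
print(f"PCIe 3.0 x8: {gen3_x8:.1f} GB/s")
print(f"PCIe 2.0 x4: {gen2_x4:.1f} GB/s")
print(f"ratio: {gen3_x8 / gen2_x4:.1f}x")  # roughly the 4x gap mentioned
```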

    Anyway, so my point is, yes, there will be a penalty using an external GPU. Most who have done it already are doing it because they have MacBook Airs or Mac Minis with iGPUs, and just about any discrete GPU (even a bandwidth-starved one) is worth it.

    Whether it would be worth doing for the Mac Pro is another question, especially when you can buy dual D700s for not a whole lot more than what the external GPU boxes are going for (the Sonnet Tech box alone is $800), and with the D700s you don't have to worry about starving your GPU for data.
     
  3. sirio76 macrumors regular

    Joined:
    Mar 28, 2013
  4. deconstruct60 macrumors 604

    Joined:
    Mar 10, 2009
    #4
    Like that hasn't been exhaustively done before....


    The Anandtech Mac Pro review states that PCIe 3.0 x16 channels is 15.7 GB/s bandwidth. Going further in the review he shows TB 2.0 as having 2.5 GB/s of bandwidth - quite a difference.

    There is a difference between Thunderbolt bandwidth and the transported PCI-e data bandwidth. What travels over TB cables is TB data. What eventually gets to the cards is PCI-e data.

    Pragmatically the limit is 3x PCI-e v2.0. It is good that the Mac Pro allocates slightly more than that to the individual TB controllers. That will help with latency and traffic-aggregation issues ( http://en.wikipedia.org/wiki/Fat_tree ), but that isn't what the external GPU card is going to see.
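    In rough numbers (back-of-envelope; attributing the whole gap between the 2.5 GB/s wire rate and the "3x PCI-e v2.0" payload figure to protocol overhead and other TB traffic is an assumption on my part):

```python
# The TB-vs-PCI-e distinction above, in approximate numbers:
# what the TB2 wire carries vs. the PCI-e payload a card sees.
tb2_wire   = 2.5               # GB/s: 20 Gbit/s of bonded TB2 channels
pcie2_lane = 0.5               # GB/s per PCIe 2.0 lane
practical  = 3 * pcie2_lane    # ~1.5 GB/s, the "3x PCI-e v2.0" figure
overhead   = 1 - practical / tb2_wire

print(f"wire rate:       {tb2_wire} GB/s")
print(f"usable PCI-e:   ~{practical} GB/s")
print(f"lost to protocol/other traffic: {overhead:.0%}")
```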





    This is a goofy tangent to the topic. The bandwidth of what comes out of the GPU subsystem is highly decoupled from what goes into it. In the context of an external GPU, the output is external to the host and the TB network. It could be TB/s. It doesn't make a difference at all to the stuff upstream.






    You keep pointing at hardware when the usage is determined by software. As long as you're groping around on hardware you aren't going to find an answer.




    Over most of recent history, inter-card communication has been outside of PCI-e traffic, so it has little impact. In fact, folks have used that back-channel network to skirt around PCI-e lane-starved systems ( ship smaller chunks of data down two different, smaller paths and then merge post-computation results over a proprietary back-channel network ).

    Whether there are loads of data to load into VRAM largely depends upon what the software is doing. There are also several workarounds that drivers and applications have developed to load compressed data into VRAM and then decompress it on the other side, to deal with limited PCI-e bandwidth.
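    A minimal sketch of that compress-then-transfer idea (illustrative only: real drivers use GPU-friendly texture codecs rather than zlib, and the link rate here is an assumed x4 Gen2-class figure):

```python
# Sketch of the compress-before-transfer workaround: when the link is
# the bottleneck, ship compressed buffers and decompress on the far
# side, trading compute cycles for scarce link bandwidth.
import zlib

# A hypothetical texture: highly repetitive data compresses well.
texture = bytes(range(256)) * 4096           # 1 MiB of patterned data
compressed = zlib.compress(texture, level=1)

LINK_GBPS = 2.0                              # assumed x4 PCIe 2.0-class link

def transfer_ms(nbytes, gbps=LINK_GBPS):
    """Time in milliseconds to push nbytes over the assumed link."""
    return nbytes / (gbps * 1e9) * 1e3

print(f"raw:        {len(texture)} B, {transfer_ms(len(texture)):.3f} ms")
print(f"compressed: {len(compressed)} B, {transfer_ms(len(compressed)):.3f} ms")

# Receiver side: decompress after the (simulated) transfer.
assert zlib.decompress(compressed) == texture
```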

    Actually, probably not. What you would more likely see is two x4 electrical slots with x16 physical connectors ( of which 3/4 of the pins are dead ).

    Typically, PCI-e switches dilute a same-size or greater bandwidth into smaller amounts. A switch with a local x16 crossbar and a relatively minuscule upstream connection of just x4 would be strange. In TB PCI-e enclosures you are going to find a switch that splits x4 into more x4's ( 2 or 3 ) or into smaller bundles ( 4-6 x1 ). If both x16 connections started pumping data in parallel through an x4 choke point, it is doubtful you could economically put in a buffer large enough to keep up, even if you could get around the quite substantial delays that would incur. That is exactly BACKWARDS of how to design a network that deals well with congestion and latency.

    That enclosure isn't only going to do GPUs, so getting the TB certification has nothing to do with external GPUs working or not.

    Likewise, the cards are what need the drivers, not the enclosure. Again you are coupling issues that aren't coupled.

    ----------

     
  5. Cubemmal thread starter macrumors 6502a

    Joined:
    Jun 13, 2013
    #5
    Having a bad day?

    Obviously

    my point was to look at bandwidths and compare them.

    ...

    OK, I'm having a hard time seeing what you're trying to say here. Thanks for contributing though.
     
  6. ytoyoda macrumors member

    Joined:
    Dec 14, 2013
    Location:
    Tokyo
    #6
    Recently I read this interesting article: Can OpenGL And OpenCL Overhaul Your Photo Editing Experience?
    It's a very long article, so if you just want the key point, read from the end of section 8 to the middle of section 9 ("Q&A Under Hood of Adobe").

    Russell Williams from Adobe stated clearly that for GPU computation, the PCIe bus which connects GPU and CPU is the bottleneck, because of copying data back and forth. That means even a PCIe v3 x16 bus becomes a bottleneck.

    I think an external GPU does not suit GPU computation.
     
  7. goMac macrumors 604

    Joined:
    Apr 15, 2004
    #7
    Like a lot of things, it depends.

    Games use very little PCI-E bandwidth. They basically stream over as much as they can when the level loads, cache it in VRAM, and then during the game very little bandwidth is used.

    In Adobe's case, the bandwidth usage is very high because they have to send over 30-ish frames a second, not even counting whatever else is part of the composition. Because a lot of the content in every frame is different, the possibilities for caching are much smaller. So for apps like Adobe's, bandwidth is a lot more important.
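    Rough numbers for that (assuming uncompressed 32-bit RGBA frames and nothing else on the link):

```python
# Streaming whole frames every second vs. a game's load-once pattern.
def stream_bw_gbps(width, height, fps, bytes_per_px=4):
    """Bandwidth in GB/s to stream uncompressed frames continuously."""
    return width * height * bytes_per_px * fps / 1e9

TB2 = 2.5  # GB/s

for name, (w, h) in {"1080p": (1920, 1080), "UHD 4K": (3840, 2160)}.items():
    bw = stream_bw_gbps(w, h, 30)
    print(f"{name} @ 30 fps uncompressed: {bw:.2f} GB/s "
          f"({bw / TB2:.0%} of TB2's {TB2} GB/s)")
```

    And that's one direction only; round-tripping results back to the CPU doubles the traffic.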

    So if you want Thunderbolt GPUs for games, it seems pretty workable. If you want it for video editing, it really depends on how much data you're throwing around.

    Thunderbolt also supposedly supports channel bonding, so there is a chance you might be able to use multiple Thunderbolt ports to add bandwidth.
     
  8. jav6454 macrumors P6

    jav6454

    Joined:
    Nov 14, 2007
    Location:
    1 Geostationary Tower Plaza
    #8
    Yes, it requires all 15.7GB/s in the PCIe lanes. Why? Well, images are not crunched on the CPU but on the GPUs, so what is being sent is a slew of raw data from the CPU concerning geometric, trigonometric, and other graphical data. This data is very dense and has to carry the information for the picture to be displayed (and since it's per second, assume a minimum of 60 frames to be processed and displayed). All that data can clog a nice data pipeline like PCIe. That said, back in the PCIe 2.0 days several GPUs did not yet saturate the lanes. However, as nVidia starts to move SLI and other data-intensive tasks over PCIe, I can see 15.7GB/s being very helpful to have.

    All x16 lanes carry data. Period. Each PCIe x1 lane has both a sending and a return pair.

    A Thunderbolt GPU will be nice to have but won't get you the best quality.
     

Share This Page