Thunderbolt bandwidth potential

wiz329 · Jun 6, 2013

goMac said:
Actually, that's a good example. The RAM riser boards and CPU daughter cards are all risky components. They work because there is basically only one connector. A stackable machine would have three or four connectors. It's not really doable.

This has been mentioned in other threads. Some machines are getting rid of CPU daughter cards and sockets due to high performance issues with them.

One socket is usually ok. The more you add, the more risky it gets. A machine that uses three or four connectors between components? Probably not do-able.

I'm in a little above my head here, and I definitely don't pretend to know the ins and the outs of computer engineering.

What type of connection is required to pair up multiple CPUs? For example, on the current Mac Pro, what is required to have a dual CPU configuration? What makes them "talk" with each other and decide how to divide the load (forgive the layman speak).

Or, for supercomputers (or perhaps even servers -- I don't know how they work that well), or really any computer that's built on a scalable design, how are all of the separate motherboards/CPUs/RAM configurations connected together?

ScottishCaptain · Jun 6, 2013

wiz329 said:
I'm in a little above my head here, and I definitely don't pretend to know the ins and the outs of computer engineering.

What type of connection is required to pair up multiple CPUs? For example, on the current Mac Pro, what is required to have a dual CPU configuration? What makes them "talk" with each other and decide how to divide the load (forgive the layman speak).

Or, for supercomputers (or perhaps even servers -- I don't know how they work that well), or really any computer that's built on a scalable design, how are all of the separate motherboards/CPUs/RAM configurations connected together?

Processors inside a Mac Pro are linked using something called QPI:

http://en.wikipedia.org/wiki/Intel_QuickPath_Interconnect

Scheduling ("dividing the load") is handled by software. The CPUs don't really do that, the software does. The CPU just does whatever you tell it to do, they don't have a private conversation amongst themselves and determine who should run what. That is the job of the operating system kernel.

Supercomputers are a completely different ball park. Once again, they heavily rely on software to distribute a series of tasks across a large number of nodes. Supercomputers do not operate like "one big computer". They operate like a whole bunch of isolated nodes that can send messages to each other really, really fast. Your software has to figure out how to send those messages and what to do with them, and that means building your software to specifically run on the platform that supercomputer provides.

The only exception to this is a few super high-end and rare architectures, like SGI's Numa interlink present on the Origin and Altix rack mounted systems. Those computers could effectively take N different nodes and make them appear as a single computer, booting a single operating system image. Your software still had to be massively parallel in order to take full advantage of those kinds of resources, as it's one thing to have 4 CPUs inside your desktop and a completely different thing to have 512 CPUs.

-SC

symber · Jun 7, 2013

iMac Pro?

What interests me is the possibility of buying the modular Mac Pro GPU and plugging it into my iMac via Thunderbolt.

The graphics card in my iMac is the only piece of hardware I want to upgrade, and if this Mac Pro starts bringing TB GPUs to the market, I'm all for it.

What do you think? Feasible?

robbieduncan · Jun 7, 2013

symber said:
What interests me is the possibility of buying the modular Mac Pro GPU and plugging it into my iMac via Thunderbolt.

The graphics card in my iMac is the only piece of hardware I want to upgrade, and if this Mac Pro starts bringing TB GPUs to the market, I'm all for it.

What do you think? Feasible?

If you mean "and then plug an external monitor into the output of the GPU" then yes, I believe you can buy a TB box with a slot for a GPU and do that now. If you mean "the external GPU will drive the iMac screen" then no, no chance.

symber · Jun 7, 2013

robbieduncan said:
If you mean "and then plug an external monitor into the output of the GPU" then yes, I believe you can buy a TB box with a slot for a GPU and do that now. If you mean "the external GPU will drive the iMac screen" then no, no chance.

Bum.

wiz329 · Jun 7, 2013

symber said:
Bum.

http://www.tonymacx86.com/214-external-thunderbolt-graphics-cards-next-big-thing-gamers.html

According to this article, Lucid's product (which was still under development at the time of the article, so things could have changed) is able to port the signal back into the laptop's display.

deconstruct60 · Jun 7, 2013

wiz329 said:
According to this article, Lucid's product (which was still under development at the time of the article, so things could have changed) is able to port the signal back into the laptop's display.

Lucid's product only works under Windows. It is highly specialized Windows graphics driver. It won't fly on OS X ( or Linux if I recall correctly).

I'm a bit skeptical this is really going to be make gamers happy as screen sizes grow. Portions of the frame buffer are copies out of the external GPU's VRAM and back into the host computer's RAM and then sent to the screen. Frankly it is kind of goofy to go through all of those gyrations if the external GPU card has a monitor socket on its edge. Connect the monitor directly to that and run all of the graphics on the external GPU and ta-da less overhead and higher frame rates. If the gamer is a frame-rate junikie that is the route they will go.

The buffer-copy-merge-and-then-ship-out has less overhead when the GPU is stilling on a full x16 or x8 PCI-e v3.0 link to the iGPU. It will be 2-4 times as fast a transfer.

It is certainly a nice hack. Most PC motherboard vendors shipping Thunderbolt enabled boards are shipping with this virtual GPU driver bundled. Not sure if that is more "because it is a cool hack" or "users find very high value in it".

deconstruct60 · Jun 7, 2013

ScottishCaptain said:
... Let's assume that it was designed properly (realistically) and you had a 200 pin edge connector on each PCB that mated with the module to the top and bottom, plus another 20 pins for power. That's 220 individual connections that need to be perfect.

Add four modules and you've got 3*220 = 660 individual points that need to be electrically operable, otherwise your system will crash or fail to boot. Why the hell do I want that?

The diagram has the modules mated by Thunderbolt. Side stepping a bit for the moment that the socket isn't going to pass TB certification as imaged, Thunderbolt only has 4 wires. The count is no where near that high. Likewise the Thunderbolt requires transcievers on both sides of the connection. It is a lot more expensive, but if boost signals along the way they go farther. It happens every day with international message connectivity.

Yes a passive, snap-together backplane is very very very bad for modern high speed internal PC interconnect. What the design here does though is use an external transport mechanism to go very short distances. That's very feasible. Economically it is whacked because all that "long distance" infrastructure overhead has to be paid for, but the signals will actually move without a lot of loss or noise.

The disconnected from reality aspect ( of which there are many in the drawing) is that connectors imaged don't really account for the required transceiver infrastructure that TB requires. [ Frankly the other TB socket placements at the edges of the box are also whacked but is only related to the same root cause dissconnect from TB placement requirements. ]

But by using TB (or a hacked customization of TB ... good luck getting that past Intel though. They aren't approving or supplying TB controllers for hacked variants of the their standard. ) the number of wires goes way down. The scrafice is that bandwidth is substantially choked off. The more modules add the more congested the data path will get.

goMac said:
This has been mentioned in other threads. Some machines are getting rid of CPU daughter cards and sockets due to high performance issues with them.

Yeah, PCI-e v3.0 ( and if there is a PCI-e v4.0 in mainstream PCs ) is pushing toward where Apple's Mac Pro daughthercard set up may not work so hot. Would move to just straight socket and just one "edge turn" to the physical PCI-e socket. However, that would suggest that they do need to make a shift to selling different sized boxes since one single physical logic board isn't going to supply all Mac Pros (presuming want to keep selling both E5 1600 and 2600 solutions). So something like

|P| -- PCI-e slot
[C] -- CPU socket
[R] -- RAM DIMMs
-- I/O Hub
[G] -- Embedded GPU

Single socket

|P| |P| |P|
|P| |P| |P| [C] [G]
--- ---- |P| [R]

Dual socket variant

|P| |P| |P| |P|
|P| |P| |P| |P| [C] [C] [G]
|P| --- ---- |P| [R] [R]

Essentially, it is the first board with some more attached to both sides to support additional CPU and another x16 slot. One is going to be shorter than the other. That means the first box doesn't have to be as tall (since vertically mounted ) as the second.

But looping back to Thunderbolt. It absolutely does not reasonably and economically address doing the differentiation between those two boards. Huge difference between fabricating two boards that share the a large overlap in requirements for very high PCI-e throughput and the kneedcapping tradeoff by trying to chop this into even something pieces and "glue it" back together with Thunderbolt. TB sucks at that.

goMac · Jun 7, 2013

deconstruct60 said:
Lucid's product only works under Windows. It is highly specialized Windows graphics driver. It won't fly on OS X ( or Linux if I recall correctly).

Eh. I could see Apple going this route if they have support for PCIe cards.

The OpenGL stack in OS X already allows you to share resources between two GPUs. So implementing this under OS X would be really simple. Just have one card draw to a texture, and the other card draws that texture out to display. Apple already has sample code that does that in a roughly "we're going to hack SLI together with software" app. One card draws a bunch of stuff, saves it to a texture, and then the other card draws the rest of the scene. Apple demoed it with a 4870 and a 285 GTX running in tandem IIRC.

I don't think main RAM is hit at any time. I think the resource sharing is directly between cards, but I could be wrong. But it should be that slow because that's basically what the window server is already doing all the time. (Correction, I suppose if one card is integrated on the CPU, you'd have to hit main RAM.)

What Apple would need to write is all the OS magic to do this automatically. But at least on the driver side it's all there.

deconstruct60 · Jun 7, 2013

goMac said:
Eh. I could see Apple going this route if they have support for PCIe cards.

It is relatively easy when both cards are hard plugged in the same box. There is a bigger jump though if the cards can disappear at random (hot plugging).

For an external TB solution this whole stack has to dynamically change on the fly.

For hard-plugged cards sure it may be "mostly" there, but I'd be willing to be there are some fundamental assumptions built to partial solution that do not take hot-plugging into account.

I don't think main RAM is hit at any time. I think the resource sharing is directly between cards, but I could be wrong.

The card-to-card direct DMA would likely be driver work, hence it wouldn't just be Apple in the loop. I think the lucid solution does a pull/push with the CPU (or they do custom low level themselves). RAM may not looped in if there is a both read+write indirect but it is definitely in the "multiple chefs in the kitchen" zone.

ElderBrE · Jun 7, 2013

wiz329 said:
http://www.tonymacx86.com/214-external-thunderbolt-graphics-cards-next-big-thing-gamers.html

According to this article, Lucid's product (which was still under development at the time of the article, so things could have changed) is able to port the signal back into the laptop's display.

Check this out

http://www.anandtech.com/show/7040/...tone?utm_medium=referral&utm_source=pulsenews

deconstruct60 · Jun 7, 2013

ElderBrE said:
Check this out

http://www.anandtech.com/show/7040/...tone?utm_medium=referral&utm_source=pulsenews

Likely will roll to market in a fashion similar to the Belkin docking station. "Coming real soon now" that turns into over 12-18 months because the driver issues and unknowns that pop up along the way.

Like one of the comments on the article points out it seems to be a solution in search of a problem. There seems to be more action on the Windows PC side of pumping out DIY mainboards for towers that have TB connectors on them than a deep uptake by large system vendors to put TB on a wide breadth of their laptop portfolios.

As long as a large fraction in the Win PC market are these desktop users this is pretty lame as those boxes have PCI-e slots. The motherboard makers dutifily mutated Intel's reference design, but not sure where that is going in this specific context.

For the grossly underpowered GPU laptop yeah there is a "solution" here. But where is that market? One of the big pushes in 2013 is going to be rolling out HD Iris Pro / HD Iris ( HD5200 / HD5100 ) and AMD solutions equally boosting GPU speeds. $200-400 extra spent on one of those is faster all of the time; including when mobile.

This is extremely likely why the Thunderbolt certification test pragmatically require that the GPU providing the DisplayPort input be embedded on the motherboard. If relatively permanently seated then will not generation any wink-in/wink-out of the signals involved. That is on the "other side" of the controller from these boxes but it does make things simpler on both sides.

goMac · Jun 7, 2013

deconstruct60 said:
It is relatively easy when both cards are hard plugged in the same box. There is a bigger jump though if the cards can disappear at random (hot plugging).

Yeah. I was more thinking about a Mac Pro with PCIe cards + integrated graphics to drive the Thunderbolt display (which is one thing Lucid is used for.)

It might be workable for an external GPU too, but like you said, pulling the card would be an issue. All that texture data disappearing wouldn't go over very well.

You might be able to write all that data to RAM in parallel. It would be a giant PITA to have to have 2 gigs of memory to mirror your card, but you could at least not bog down the card's performance with RAM writes. Writing that software stack would not be fun though.

I don't think Apple is going to have an external card solution in the near future. I can see that being a problem they'd love to solve for Macbook Pros though.

ElderBrE · Jun 7, 2013

goMac said:
Yeah. I was more thinking about a Mac Pro with PCIe cards + integrated graphics to drive the Thunderbolt display (which is one thing Lucid is used for.)

It might be workable for an external GPU too, but like you said, pulling the card would be an issue. All that texture data disappearing wouldn't go over very well.

You might be able to write all that data to RAM in parallel. It would be a giant PITA to have to have 2 gigs of memory to mirror your card, but you could at least not bog down the card's performance with RAM writes. Writing that software stack would not be fun though.

I don't think Apple is going to have an external card solution in the near future. I can see that being a problem they'd love to solve for Macbook Pros though.

Yep. Be mobile with a laptop that can be carried easily, get to the office, plug the cables in and be connected to your 10Gbit ethernet, heavy duty GPU and massive storage. I can see that happening sooner than a modular Mac Pro, but still a while away.

Tesselator · Jun 8, 2013

ScottishCaptain said:
Have you seen the CPU connector on the daughter card? There's over a hundred contact points on that sucker, plus several large ones for high current DC power. A single PCI-e 16x slot has over 82 conductors.

That "prototype" I keep seeing is so horribly thought out I don't even know where to begin with it. I can't stand industrial designers who throw out garbage like that and forget about all the physical implications of an idea just because it's not convenient for the design. Seriously, two connectors for both a high-speed data bus and high current power supply? He doesn't even address the latching system that would be required to solidify removable modules into a stable monolithic configuration. All it shows is a bunch of tiny latches that give you the impression things are supposed to hook together, without actually detailing how such a system would operate- the mechanics behind that kind of thing are not trivial to get right.

Apple excels at hardware design precisely because they know what their ideas entail as a whole while they're designing them. The Mac Pro's case latch doubles up and secures the ODD and disk drives as well as holding on the side panel. A design decision like that requires foresight and planning. I see none of that in the prototype posted above.

All true. Ya, the posted concept art is just a "simplified illustration" or "artist concept" rendering. I think he doesn't have the modeling chops to detail it further - or doesn't want to invest the time.

The more connectors you add to a system, the more unstable it becomes. You've gone from a solid configuration of a Mac Pro tower to a whole bunch of stacked modules. Let's assume that it was designed properly (realistically) and you had a 200 pin edge connector on each PCB that mated with the module to the top and bottom, plus another 20 pins for power. That's 220 individual connections that need to be perfect.

Naw, the only power that needs to transfer over is 120VAC (if USA for example). There would be a PSU in each module built to spec - in my vision of such a product. We need to keep things sane and increase the advantages.

Add four modules and you've got 3*220 = 660 individual points that need to be electrically operable, otherwise your system will crash or fail to boot. Why the hell do I want that? People seem to forget that systems like the SNES and N64 used edge connectors for their game cartridges, and occasionally needed to be reseated because the system wouldn't startup properly. Do you really want to have to dismantle your modular tower on a monthly or weekly basis because something shifted a bit (vibrations from a DC fan or hard disk drive), causing one of those 660 connections to become intermittent?

-SC

Again, that depends what the module is. PCIe needs one set, SATA needs one set, and power from the mains needs another. And that's about it. Look, these already exist. Both PCIe expansion housings and external drive enclosures. We're not adding anything new here really just applying/implying a design concept change.

Keep another thing in mind as well. If they did do something like this it would vastly increase the Mac Market I think. These same expansion units could be used with iMacs and MacMini's with just a single alternate subprocess in the production line. Right, how many folks would opt for some CUDA cores and GPU upgradability for their MINI or iMacs?

I'm not trying to say they will or won't go modular but it does make a whole bunch of sense when you think about actual designs, line tooling, and marketing. The only disadvantages are what we're seeing here already - people not grasping or understanding the concept. Which is no wonder... as we're all just speculating in general without any details and of course no announcements.

Tesselator · Jun 8, 2013

goMac said:
Actually, that's a good example. The RAM riser boards and CPU daughter cards are all risky components. They work because there is basically only one connector. A stackable machine would have three or four connectors. It's not really doable.

No, still just one connection for the PCIe box and the SATA connections are not unstable across even 3 or 4 extensions (but only one extension is needed). Make it so that if there's a PCIe expansion unit added (for more than the internal two already present) it has to be the one physically attached to the main unit and wala, only one connection for 4, 6, or 8 more PCIe slots. That one layer would contain the one SATA extension unless you were adding more than one SATA expansion layer.

Again, as I mentioned above, all (100%) of the connection layout and functionality already exist. So you can't really say it's not doable when it's already been done and has proven itself. It's just a change in the design concept - only. And one which helps to ensure that Apple products are selected and used much more often than what's available via 3rd party vendor.

This has been mentioned in other threads. Some machines are getting rid of CPU daughter cards and sockets due to high performance issues with them.

One socket is usually ok. The more you add, the more risky it gets. A machine that uses three or four connectors between components? Probably not do-able.

Right, of course. That's to do with the internal design of the main unit and nothing to do with the expansion units being talked about. I brought that up as an example showing that Apple already likes modularity and has more complexity in their designs than most other Workstation vendors - nothing more than that.

deconstruct60 · Jun 9, 2013

Tesselator said:
Again, as I mentioned above, all (100%) of the connection layout and functionality already exist.

The disconnect is that this 100% functionality is not reflected in the diagram. Those are not Thunderbolt connectors (although labeled TB ).
That is why there is drift here. It is something else / something new and that 'new' thing is being looked at as different by different people.

As stated before, there is a litany of things that are mechanically and electrically wrong (completely disconnected with real design constraints for the parts involved) with the design as imaged. That is in part why it produces follow on conversation like this one.

slughead · Jun 9, 2013

Another problem perhaps not mentioned yet is that there's latency with conversion between one type of signal and another. I tried explaining this in another thread. Even though TBolt has all this possible bandwidth, it's still plagued by the fact that there are no TBolt native hard drives (or GPU for that matter). PCIe, SATA, etc. would have to be converted to TBolt and then back again. Therefore, there's going to be some latency just in that conversion. I also can see goMac's point about connectors just multiplying the chance for error and failure.

Personally, I don't see what's so difficult about swapping out a processor once every 4 years, or a GPU every 2. Yes, it could be easier to swap the middle pancake out of a stack of flapjacks, but it's not that hard... Definitely not worth the larger volume of the case and expense, even if they do resolve the other issues mentioned in the thread.

As far as even having external Tbolt drives as a replacement for internal SATA ports, it doesn't even compare. SAS is even superior, even when it's converting SATA to SAS.

Thunderbolt RAID 0 with 6Gb/s SSDs appears to run into a bottleneck when you compare it to the SAS RAID 0 with the same 6Gb/s SSDs. I guess the 1000+MB/s theoretical bandwidth is... theoretical.

http://www.barefeats.com/tbolt01.html (test performed with a $1,000 Pegasus Thunderbolt enclosure)

TBolt may be better for storage than FW and USB, but it's nowhere near the level of SAS or (e)SATA.

Tesselator · Jun 9, 2013

deconstruct60 said:
The disconnect is that this 100% functionality is not reflected in the diagram. Those are not Thunderbolt connectors (although labeled TB ).
That is why there is drift here. It is something else / something new and that 'new' thing is being looked at as different by different people.

As stated before, there is a litany of things that are mechanically and electrically wrong (completely disconnected with real design constraints for the parts involved) with the design as imaged. That is in part why it produces follow on conversation like this one.

Yup, that's how people are using those almost block-diagram-ish illustrations. If we say that is exactly what it will look like then of course the illustrations are erred. If we say however that it is indeed modeled just as a conceptual block-diagram and not meant to be anything more - which is what the artist himself claims, then it all fits as we fill in the gaps with known actual connectors when we redraw it in our minds. I'm seeing some folks having trouble producing this mental drawing but I can see not only one but multiple ways in which it could work and work well.

You apply some magical undisclosed "real design constraints" in order to prohibit yourself from realizing the design. This is the exact opposite of good engineering. A good engineer is a problem solver and not a wimp who becomes frightened when a challenge is presented. Interestingly the ones you're claiming to be so challenging have already been solved. An example of one such solution to a similar engineering challenge resulted in Apple's own slide-in fan assembly cage used to cool the CPUs and expel hot air from the system unit. Other examples you might be able to relate to if you were to stop and think about it are the very SATA connections and sled system already in use in all SATA equipped MacPro systems not to mention pretty much any hop swap coupling designs out there.

It's easy-peezy pie and cake to design an interlocking system with all the male and female fitted/guided socket components required for data and power. Why is it so easy? Because all the components and engineering required to do it already exist and are in wide usage today. I can created it myself with off-the-shelf parts and bailing wire for goodness sakes. Replace the bailing wire with some plastic molding for use as stabilizers and guides and you'll have an Apple-like solution which imbues idiot-proof consumer-ready construction with all of the precision needed to obtain superior connection integrity.

The fact that you can't solve this very simple engineering problem in just minutes of thinking tells me only that there is a very high probability you won't be employed by Apple as design engineer should they decide to make such a system. I mean if you can't even imagine what already exists how could possibly add the few minor tweaks needed to make it production and consumer ready? Think about it.

slughead said:
Another problem ...

Yeah, the solution of course is not to use thunderbolt - at least not in the way being proposed above.

ThisIsNotMe · Jun 9, 2013

The (flawed) assumption is that "modular" means no PCIe16x3 slots.
Also if I can net render over GbE there is no reason I can't throw tasks to external co-processors over Thunderbolt.

crjackson2134 · Jun 9, 2013

KBS756 said:
What keeps apple from developing their own interface between these modular sections, or using something like the connector between the processor board and the motherboard in the current Mac Pro to connect modular parts?

I dont see why if you do go modular it would have to be thunderbolt

Agreed, this is what I would think makes more sense if the entire footprint is to be diminished and still retain expandability & upgradeability.

wiz329 · Jun 10, 2013

On another note, can you have a workstation with more than 2 CPUs? If so, what would such a configuration look like?

Erasmus · Jun 10, 2013

Just thought I would add, if you really need lots of Mac Pros to do your number crunching, you should be optimising your code for GPGPU. Not just because GPUs are really fast, but it forces you to optimise the amount of data you are sending between processing units, vs the amount of processing that is performed on that data.

If the processing time vs data transfer ratio is high enough, using the thunderbolt connection to link two or more Mac Pros should provide easily enough bandwidth for cluster computing. It just comes down to how you code it.

Thunderbolt bandwidth potential

macrumors 6502a

macrumors 6502a

macrumors member

Moderator emeritus

macrumors member

macrumors 6502a

macrumors G5

macrumors G5

macrumors 604

macrumors G5

macrumors regular

macrumors G5

macrumors 604

macrumors regular

macrumors 601

macrumors 601

macrumors G5

macrumors 68040

macrumors 601

Suspended

macrumors 601

macrumors 6502a

macrumors 68030

Our Staff