I didn't know that. If I'm understanding correctly, all the hardware used will be the iMac's (CPU, GPU, RAM, ports, SuperDrive, iSight, speakers), but it will boot from the MBA's SSD and recognize the iMac's HDD as a secondary drive?

And everything (video, sound, data) will go through the thunderbolt cable? That's pretty awesome.

How fast would data transfer from the MBA's SSD be considering that part of the Thunderbolt bandwidth would be used just for the monitor part?

Thanks.

I had no idea this worked either - that's pretty neat. Does it only work over TB, or is there a simple way to boot from an MBA even with the older iMacs (Ethernet cable maybe)?

Anyway, about your questions - if this works, you're running everything on the iMac and just booting from the MBA's hard drive. Everything is running on the iMac hardware, so only the files you read are transferred (no sound, monitor image or similar needs to be sent). Since the speed of the MBA's SSD is about 250 MB/s and Thunderbolt easily handles many times that speed, it shouldn't be a problem.
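Rough numbers to back that up (treating 10 Gb/s as the usable rate of a single Thunderbolt channel):

# Rough numbers: the MBA's SSD vs. one Thunderbolt channel
ssd_mb_s = 250                     # MB/s, roughly what the MBA's SSD can read
ssd_gb_s = ssd_mb_s * 8 / 1000.0   # = 2.0 Gb/s on the wire
channel_gb_s = 10                  # Gb/s usable per Thunderbolt channel

print(ssd_gb_s)                    # 2.0 Gb/s
print(channel_gb_s / ssd_gb_s)     # the channel has ~5x the headroom the SSD needs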

It will work for any two Macs with FireWire or any two Macs with Thunderbolt as long as the version of Mac OS on the target disk you're using will run on the Mac you're trying to boot it from (and PPC Macs can't boot from GPT volumes, only APM volumes will work.) MBAs lack FireWire, so only the Thunderbolt versions can pull this off. Although, with Lion Server or a little trickery you could also probably set up an MBA as a NetBoot server and boot from it over WiFi.

As Stingray454 points out, you're just using the boot volume of the target machine, so you're only limited by the bandwidth of your connection to that. FireWire works fairly predictably; however, initial benchmarks of TB were not really any better than FW. Apple released an EFI update to address this, but I haven't seen any benchmarks with the patch installed, and haven't had the opportunity to test it myself.

One thing I kinda glossed over is that Mac OS X stores hardware-specific cache files to speed up boot times and facilitate restoring network connections. If you boot from a Mac in Target Disk Mode with any regularity, you'd also want to set up a script that allows you to swap between two or more sets of hardware cache files, to save yourself from having to rebuild them every time you boot from a different machine.
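Purely as a sketch of what I have in mind, something like the following would do it. The cache paths and stash location here are assumptions for illustration; you'd want to verify which files your OS release actually caches before relying on anything like this.

#!/usr/bin/env python
# Rough sketch only: stash and restore hardware-specific cache files so you
# don't have to rebuild them each time you boot the same volume on different
# hardware. CACHE_PATHS and STASH_ROOT are illustrative assumptions.
import os
import shutil
import sys

CACHE_PATHS = [
    # Example candidates -- verify against your own OS version:
    "/System/Library/Caches/com.apple.kext.caches/Startup/kernelcache",
    "/Library/Preferences/SystemConfiguration/NetworkInterfaces.plist",
]
STASH_ROOT = "/Users/Shared/hwcaches"   # hypothetical stash location

def stash(machine):
    """Copy the current cache files into a per-machine directory."""
    dest = os.path.join(STASH_ROOT, machine)
    if not os.path.isdir(dest):
        os.makedirs(dest)
    for path in CACHE_PATHS:
        if os.path.exists(path):
            shutil.copy2(path, os.path.join(dest, os.path.basename(path)))

def restore(machine):
    """Copy a previously stashed set back into place (run as root)."""
    src = os.path.join(STASH_ROOT, machine)
    for path in CACHE_PATHS:
        saved = os.path.join(src, os.path.basename(path))
        if os.path.exists(saved):
            shutil.copy2(saved, path)

if __name__ == "__main__":
    # Usage: swap_caches.py stash imac   /   swap_caches.py restore mba
    action, machine = sys.argv[1], sys.argv[2]
    stash(machine) if action == "stash" else restore(machine)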
 
I like the idea that I can add a cheap Nvidia card to my iMac purely for PhysX and CUDA under Boot Camp. Happy with the 6770 for the games I play, but I miss PhysX in Batman: Arkham City.
 
One thing I kinda glossed over is that Mac OS X stores hardware-specific cache files to speed up boot times and facilitate restoring network connections. If you boot from a Mac in Target Disk Mode with any regularity, you'd also want to set up a script that allows you to swap between two or more sets of hardware cache files, to save yourself from having to rebuild them every time you boot from a different machine.

Ah, you have a point there. Not that a longer boot time is really a big problem, but if you were able to automatically swap between the Air's and iMac's cache files that would be awesome (or even better, if Apple could consider this at boot time in a future OS update). Do you have any scripts / programs that solve this at the moment? If so, I'd be interested.

Your post may end up costing me a lot of money, since I'm now considering this to be an excellent solution, probably better than a separate GPU-in-a-box with TB display. :)
 
Thunderbolt = 20Gbit/s
Am I mistaken in my understanding that current TB chip implementations max out at 10+10 Gb/s (DP + data), so that you can't use it as 15+5 or 5+15?

Why else would all the ads talk about 10 Gb/s?

Maybe the specs allow either DP or data to use the whole 20 Gb/s, the same way the specs allow FW to be 3.2 Gb/s?

Btw, since TB is two-way, wouldn't it be possible to route an eGPU's signal back to the internal screen?
 
Am I mistaken in my understanding that current TB chip implementations max out at 10+10 Gb/s (DP + data), so that you can't use it as 15+5 or 5+15?

Why else would all the ads talk about 10 Gb/s?

Btw, since TB is two-way, wouldn't it be possible to route an eGPU's signal back to the internal screen?

TB doesn't handle the full length of the DisplayPort chain. For example:


[ TB controller ] <------> [ DP switch ] <------ [ internal GPU ]
                                |
                                +------> [ internal screen DP port ]

versus

[ TB cntlr ] <--(channel 1)-- [ internal GPU ] --(channel 2)--> [ internal screen DP port ]


The GPU is not a DP switch. The iMac can play at being a monitor because they have added a switch to the chain that can redirect signals in some cases. In contrast, an MBP doesn't have one, so its TB output is the second monitor output (or a mirror).
 
Am I mistaken in my understanding that current TB chip implementations max out at 10+10 Gb/s (DP + data), so that you can't use it as 15+5 or 5+15?

Why else would all the ads talk about 10 Gb/s?

Maybe the specs allow either DP or data to use the whole 20 Gb/s, the same way the specs allow FW to be 3.2 Gb/s?

After some more digging, I might be wrong. I remember Anand saying something like what I wrote earlier, but my memory might not be serving me right.

The Pegasus benchmarks suggest that TB is 10 Gb/s + 10 Gb/s, as six SSDs in RAID 0 are only achieving 1 GB/s. Those SSDs are good for 500+ MB/s as standalone drives, so we should see well over 1 GB/s if TB were good for more.

It looks like TB is really 10Gb/s for PCIe and 10Gb/s for DP, no matter what you're using. My bad.
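Spelling out the arithmetic behind that conclusion:

# Six ~500 MB/s SSDs in RAID 0 vs. a single 10 Gb/s channel
ssd_count = 6
per_ssd = 500                                # MB/s each drive manages on its own
array_potential = ssd_count * per_ssd        # 3000 MB/s if nothing else limited it
channel_limit = 10e9 / 8 / 1e6               # 1250 MB/s for a 10 Gb/s channel

print(array_potential)   # 3000 MB/s
print(channel_limit)     # 1250 MB/s
# The observed ~1000 MB/s hugs the single-channel ceiling rather than the
# array's potential, which is what points to a 10 Gb/s PCIe limit.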
 
It looks like TB is really 10Gb/s for PCIe and 10Gb/s for DP, no matter what you're using. My bad.
Well, Intel hasn't made understanding TB very easy, maybe because they are trying to make money out of it. I guess that to get more first-hand knowledge about TB, you'll have to sign an NDA.

There might be hope that Intel gets its act together and the next gen of TB chips can allocate bandwidth more freely, and maybe will even have the ability to extract 2 DP signals out of it. Then you could really daisy-chain 2 monitors with no need to have an arbitrary TB box between them.
 
Thunderbolt could be the reason why Apple hasn't updated the Mac Pro in a year and a half. Soon there won't be any need for a big all-in-one tower. For ultra-high-performance, you'll be able to build a cluster of Mac minis, with an external GPU and storage. All connected with Thunderbolt.

Just a crazy theory, but it could happen. Especially after optical Thunderbolt is available.

Intel hasn't had chips worth the Pro upgrade until just recently. I would expect a new Pro soon.
 
The Pegasus benchmarks suggest that TB is 10 Gb/s + 10 Gb/s, as six SSDs in RAID 0 are only achieving 1 GB/s. Those SSDs are good for 500+ MB/s as standalone drives, so we should see well over 1 GB/s if TB were good for more.

A single Pegasus chassis would have to be limited to 10 Gbps - since TB offers a PCIe 1.0 x4 connection.

(Note: In theory a T-Bolt disk array could have two independent PCIe x4 busses, and two separate PCIe SATA controllers, and have 20 Gbps throughput. It might have to occupy two device positions on the daisy chain, however.)

You'd need to have two Pegasus chassis loaded with SSDs to properly test the theory - if the two chassis get a total of around 1 GBps, then it would appear that T-Bolt reserves one channel for DisplayPort and the other for PCIe.

If two chassis get around 2 GBps (or significantly more than 1 GBps), then clearly T-Bolt is running PCIe on both channels.
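In rough numbers, here's what the two outcomes of that experiment would look like (using ~1 GBps as the single-chassis ceiling seen so far):

# Expected totals for two SSD-loaded Pegasus chassis under the two hypotheses
single_chassis_gb_s = 1.0   # GBps, roughly where one chassis tops out today

# If one T-Bolt channel is reserved for DisplayPort and one for PCIe:
reserved_total = single_chassis_gb_s            # both chassis split ~1 GBps

# If PCIe traffic can run on both channels:
shared_total = 2 * single_chassis_gb_s          # ~2 GBps across the pair

print(reserved_total, shared_total)             # 1.0 GBps vs. 2.0 GBps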
 
Every Thunderbolt port provides 2 channels of 10Gbps, bi-directional goodness. Each direction in each channel can be used for PCIe and/or DP packets at the same time. Thunderbolt does not care what the underlying protocol of the packets is.

According to Intel, Thunderbolt uses "a highly efficient, low-overhead packet format with flexible QoS support that allows multiplexing of bursty PCI Express transactions with isochronous DisplayPort communication on the same link."

And here's an Intel slide from IDF which clearly reiterates this:

[Image: Intel IDF slide on Thunderbolt channels]

I drew a block diagram to try to illustrate what I believe to be the underlying architecture of a Thunderbolt controller. This is purely based on the evidence at hand, such as Intel's technology brief which states:

A Thunderbolt controller contains:
• A high-performance, cross-bar Thunderbolt protocol switch
• One or more Thunderbolt ports
• One or more DisplayPort protocol adapter ports
• A PCI Express switch with one or more PCI Express protocol adapter ports

[Image: Thunderbolt controller block diagram]

Each of the protocol adapters is only capable of a single channel connection to the Thunderbolt protocol switch, and so each is therefore limited to 10Gbps.

This 10Gbps does not include any 8b/10b encoding overhead, as is alluded to by Intel's statement that, "The physical layer has been designed to introduce very minimal overhead and provides full 10Gbps of usable bandwidth to the upper layers." This is borne out in test results and makes sense, because fully supporting a DP 1.1a main link with 8b/10b would require at least 10.8Gbps.

With the bare minimum of PCIe overhead on a Sandy Bridge system, 10Gbps would work out to a theoretical maximum of 1028 MB/s of payload throughput. AnandTech managed to squeeze out 1002.7 MB/s in their first solo test of the Pegasus R6. Further testing with the Pegasus R6 and Apple Thunderbolt Display together showed the PCIe throughput of the Pegasus drop by almost exactly as much as the ATD was using at any given point in time. As far as I know, regardless of the number or type of devices connected, nobody has topped Anand's 1002.7 MB/s PCIe over Thunderbolt throughput record. Until Intel releases Thunderbolt host controllers with more than one PCIe protocol adapter, I'm not sure that anybody will.
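For anyone wondering where a figure like 1028 MB/s comes from, here's roughly how the math shakes out. The 128-byte maximum payload is typical of Sandy Bridge; the per-packet overhead figure is my own assumption, so treat the result as a ballpark rather than gospel:

# Approximate PCIe payload throughput over one 10 Gb/s Thunderbolt channel.
# Assumptions: 128-byte max payload per TLP and ~28 bytes of framing, sequence
# number, header and LCRC per packet; DLLP/flow-control traffic ignored.
raw_mb_s = 10e9 / 8 / 1e6          # 1250 MB/s, no 8b/10b to strip off
payload, overhead = 128, 28
efficiency = payload / float(payload + overhead)

print(raw_mb_s * efficiency)       # ~1026 MB/s -- the same ballpark as 1028 MB/s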

That being said, each of the three DisplayPort protocol adapters is theoretically capable of pushing 8.641Gbps in a single direction at any given point in time. Combined with the 10Gbps, full-duplex, PCIe protocol adapter, a Light Ridge TB controller is therefore capable of moving 10Gbps PCIe in / 10Gbps PCIe out + 8.641Gbps DP in / 17.282Gbps DP out. A Thunderbolt cable can carry 20Gbps in each direction at the same time with no regard to underlying protocol, split into two 10Gbps channels. Sort of like having 2 uplinks between a pair of Ethernet switches. Further proof of the lack of a 10Gbps PCIe + 10Gbps DP split lies in the fact that it is possible to daisy chain two ATD's, which requires sending 11.6Gbps of DisplayPort data down a Thunderbolt cable in a single direction, in addition to any potential PCIe traffic.
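For reference, the DisplayPort numbers work out as follows (DP 1.1a main link: 4 lanes at 2.7 Gb/s with 8b/10b encoding; the per-display figure is just half of the 11.6 Gb/s two-ATD number above):

# DisplayPort 1.1a main link: 4 lanes x 2.7 Gb/s, 8b/10b encoded
dp_adapter = 4 * 2.7 * 0.8          # ~8.64 Gb/s usable per DP protocol adapter

# One Apple Thunderbolt Display stream (half of the 11.6 Gb/s two-display figure)
atd = 11.6 / 2                      # 5.8 Gb/s

print(dp_adapter)                   # ~8.64 Gb/s -- matches the 8.641 Gb/s above
print(2 * atd)                      # 11.6 Gb/s for two chained ATDs, which is more
                                    # than a single 10 Gb/s channel could carry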
 
Well, Intel hasn't made understanding TB very easy, maybe because they are trying to make money out of it. I guess that to get more first-hand knowledge about TB, you'll have to sign an NDA.

There might be hope that Intel gets its act together and the next gen of TB chips can allocate bandwidth more freely, and maybe will even have the ability to extract 2 DP signals out of it. Then you could really daisy-chain 2 monitors with no need to have an arbitrary TB box between them.

[Image: Thunderbolt benchmark chart]


That doesn't make any sense, IMO. The display needs 6.75 Gb/s of bandwidth, so there is 13.25 Gb/s of bandwidth left if the channel can be mixed. 900 MB/s is 7.2 Gb/s, Gigabit Ethernet is 1 Gb/s, FW800 is 0.8 Gb/s, USB 2.0 is 0.48 Gb/s. That doesn't even max out the 10 Gb/s, yet the Pegasus is bottlenecked. The camera takes some bandwidth, but the bottlenecking happens without the camera as well.

Then again, two TB monitors can be daisy-chained so the protocol must allow DP to be carried in both channels simultaneously. Maybe PCIe can as well, or it's just DP. Intel definitely didn't make this too simple.
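Laying those numbers side by side:

# Adding up the figures quoted above (all in Gb/s)
display = 6.75     # Thunderbolt Display stream
pegasus = 7.2      # ~900 MB/s of disk reads
gige    = 1.0
fw800   = 0.8
usb2    = 0.48

non_display = pegasus + gige + fw800 + usb2
print(non_display)              # 9.48 Gb/s -- under one 10 Gb/s channel
print(non_display + display)    # 16.23 Gb/s -- well under the cable's 20 Gb/s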
 
You can do that currently, using OS X Server.

My post was made with regard to another post about connecting multiple Mac mini-like devices as a computing cluster via TB. I've never heard or read anything that suggests something like this is possible with TB. Could you elaborate or provide a link please?
 
Well, Intel hasn't made understanding TB very easy, maybe because they are trying to make money out of it. I guess that to get more first-hand knowledge about TB, you'll have to sign an NDA.

There might be hope that Intel gets its act together and the next gen of TB chips can allocate bandwidth more freely, and maybe will even have the ability to extract 2 DP signals out of it. Then you could really daisy-chain 2 monitors with no need to have an arbitrary TB box between them.

No, they sure haven't made it easy.

I don't think TB's current limitations have so much to do with allocation of bandwidth as they do with the number of available protocol adapters. Just as I believe the 10Gbps PCIe bandwidth cap has to do with the single available PCIe to Thunderbolt protocol adapter, the inability for a Thunderbolt device to drive two displays simultaneously has to do with the single Thunderbolt to DisplayPort Sink protocol adapter. This can certainly be remedied in future designs, but would call for a far more complex cross-bar switch, and Light Ridge is already using a 4x4 as far as I can tell.

A single Pegasus chassis would have to be limited to 10 Gbps - since TB offers a PCIe 1.0 x4 connection.

(Note: In theory a T-Bolt disk array could have two independent PCIe x4 busses, and two separate PCIe SATA controllers, and have 20 Gbps throughput. It might have to occupy two device positions on the daisy chain, however.)

You'd need to have two Pegasus chassis loaded with SSDs to properly test the theory - if the two chassis get a total of around 1 GBps, then it would appear that T-Bolt reserves one channel for DisplayPort and the other for PCIe.

If two chassis get around 2 GBps (or significantly more than 1 GBps), then clearly T-Bolt is running PCIe on both channels.

Thunderbolt controllers have connections for PCIe 2.0 x4. If they were only PCIe 1.0, throughput would be limited to 1000 MB/s after 8b/10b is removed but before any PCIe packet overhead is accounted for. Anandtech demonstrated throughput peaking above 1000 MB/s and consistently above any realistic level of efficiency if there was only a PCIe 1.0 x4 connection.

Although there are connections to the PCB from a Thunderbolt controller for PCIe 2.0 x4, there is only 1 single-channel (10Gbps) Thunderbolt to PCIe protocol adapter on-die. This limits total PCIe throughput to 1250 MB/s before packet overhead, which falls right in line with the real-world numbers we're observing. This is true of all TB host controllers, including the ones in all of the TB Macs shipped thus far, so the daisy chain or device configuration is irrelevant.
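The two ceilings mentioned there, for reference:

# PCIe 1.0 x4 ceiling vs. the single 10 Gb/s Thunderbolt-to-PCIe adapter
pcie1_x4_mb_s = 4 * 2.5e9 * 0.8 / 8 / 1e6   # 8b/10b removed -> 1000 MB/s
tb_adapter_mb_s = 10e9 / 8 / 1e6            # no 8b/10b -> 1250 MB/s

print(pcie1_x4_mb_s)    # 1000 MB/s: impossible to exceed if the link were PCIe 1.0 x4
print(tb_adapter_mb_s)  # 1250 MB/s: the cap set by the single protocol adapter,
                        # before PCIe packet overhead eats into it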

[Image: Thunderbolt benchmark chart]

That doesn't make any sense, IMO. The display needs 6.75 Gb/s of bandwidth, so there is 13.25 Gb/s of bandwidth left if the channel can be mixed. 900 MB/s is 7.2 Gb/s, Gigabit Ethernet is 1 Gb/s, FW800 is 0.8 Gb/s, USB 2.0 is 0.48 Gb/s. That doesn't even max out the 10 Gb/s, yet the Pegasus is bottlenecked. The camera takes some bandwidth, but the bottlenecking happens without the camera as well.

Then again, two TB monitors can be daisy-chained so the protocol must allow DP to be carried in both channels simultaneously. Maybe PCIe can as well, or it's just DP. Intel definitely didn't make this too simple.

A single TB cable can carry 20Gbps in each direction. I calculate the ATD's DP needs (2560x1440, 24bpp, 60Hz, CVT-R) to be 5.8Gbps or 725MB/s. That's all headed outbound on the cable. Anand's tests with the Pegasus and ATD together were sequential read tests from the Pegasus array so the data is almost exclusively coming inbound on the cable. The performance of the Pegasus starts out at 909.4MB/s and only drops off 8.9MB/s when connected to the ATD. This difference is most likely due to the small amount of traffic generated by the PCIe devices in the ATD, or by general switching performance with multiple devices in the chain. If we accept that we only have 10Gbps to play with, we're looking at around 72% PCIe throughput efficiency, which is just about on the money.

The third test adds audio (outbound PCIe traffic) and Gigabit Ethernet (inbound PCIe traffic). The audio is irrelevant, because even 2.1 channels of uncompressed, 24-bit, 192kHz, PCM audio would only amount to 1.728MB/s of traffic, and despite the fact that it's isochronous, it's all going in the opposite direction as the data coming from the Pegasus. The GigE traffic is being read from a file server to a local disk, and looking at the difference between the Pegasus throughput in the second and third tests, I'd say it's coming in at about 45.7MB/s—which would be very typical for reading from a network file server over Gigabit Ethernet.

The final test adds a FW800 to USB 2.0 transfer and FaceTime HD. The FaceTime HD camera is just a USB 2.0 device itself, so really we're limited here to the amount of data that the EHCI in the ATD can push at any given point in time. Just under 40MB/s is pretty typical, and lo and behold, the Pegasus performance has been reduced by another 37.5 MB/s. If Anand had set up a streaming read from the FW800 drive to a local disk, instead of bottlenecking it by copying data to a USB 2.0 drive, I'd expect the Pegasus to drop by closer to 78MB/s.
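And for anyone who wants to check my math, here it is in one place (241.5 MHz being the usual CVT-R pixel clock for 2560x1440 at 60Hz):

# ATD DisplayPort stream: 2560x1440, 24bpp, 60 Hz with CVT-R blanking
pixel_clock = 241.5e6               # Hz, the usual CVT-R timing for this mode
stream_gb_s = pixel_clock * 24 / 1e9
print(stream_gb_s)                  # ~5.80 Gb/s
print(stream_gb_s * 1e9 / 8 / 1e6)  # ~725 MB/s

# Pegasus throughput across Anand's tests (MB/s)
solo = 909.4
with_atd = solo - 8.9               # 900.5 MB/s with the display in the chain
print(with_atd / 1250.0)            # ~0.72 -> roughly 72% of a 10 Gb/s channel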
 
My post was made with regard to another post about connecting multiple Mac mini-like devices as a computing cluster via TB. I've never heard or read anything that suggests something like this is possible with TB. Could you elaborate or provide a link please?

Google Mac Mini Thunderbolt cluster...

You'll find some music studios doing it and a MacFormat article from a few months back...
 
Ah, you have a point there. Not that a longer boot time is really a big problem, but if you were able to automatically swap between the Air's and iMac's cache files that would be awesome (or even better, if Apple could consider this at boot time in a future OS update). Do you have any scripts / programs that solve this at the moment? If so, I'd be interested.

Your post may end up costing me a lot of money, since I'm now considering this to be an excellent solution, probably better than a separate GPU-in-a-box with TB display. :)

I usually boot from another Mac in Target Disk Mode only for testing purposes, to see if a problem is reproducible or is specific to a certain hardware setup. In which case I usually clear the hardware cache files from single user mode first.

I started to think about how to script this, but then I considered the bigger picture. If you have an MBA and an iMac in the same room along with a decent WiFi router, it would be trivial to set up file sharing between the two and you could achieve transfer rates of 116.8 MB/s* (at least according to Anandtech's tests). So then I reckoned that the sole benefit of doing things the other way is that you would only have to configure and maintain one system, and would still be able to use all of your applications on both machines. That's where my solution runs into a little problem.

Due to the differences in hardware, most activation based applications are going to force you to reactivate every time you switch machines. Even if this doesn't run afoul of your license agreements, it'd still be super annoying.

A really well designed, sub $500, external Thunderbolt GPU solution would still be king.

* In case anyone actually reads this, I realize that I made a MB vs Mb error here. This should obviously read 116.8 Mbps or 14.6 MB/s.
 
As for TB switches, they're quite feasible, although I still think the device limit would stand. Also, from a technological standpoint, for an 8-port TB switch you're looking at the equivalent of a 16-port 10GbE switch, which, if you care to price one out, is very expensive.

Single chip PCIe switches are common stuff, and are priced in the $30 to $90 range (or less in volume). Many multi-function PCIe cards include on-card PCIe switches.


The analogy with 10 GbE fails, for example because a typical 16 port 10 GbE switch has a cross-section bandwidth of 320 Gbps. The T-Bolt switch would only need 20 Gbps (or so).

I'm also ignoring what one would do about the DisplayPort signals - perhaps drop them, or route them to one output only. I'm still wondering how soon Intel will drop DisplayPort from T-Bolt, it seems like such a mistake for T-Bolt to include it.

Anyway, even a 4 port T-Bolt switch would carry a surprising price tag - it would need the PCIe switch chip and 5 T-Bolt controllers on-board. But nowhere close to a wire-speed 16 port 10 GbE switch.

And as far as the number of devices, I don't think that we know the actual situation. If the device limit is determined by daisy-chain latencies, then a T-Bolt switch-based topology could have more than 6(7) devices.
 
Single chip PCIe switches are common stuff, and are priced in the $30 to $90 range (or less in volume). Many multi-function PCIe cards include on-card PCIe switches.

Quite true, and a Thunderbolt switch should be pretty similar to a PCIe switch, but it would be a pretty beefy one, switching the equivalent of 5 lanes of uplink to 30 of downlink. I think the Thunderbolt PHY is probably considerably more expensive to implement than a PCIe card-edge connection though.

The analogy with 10 GbE fails, for example because a typical 16 port 10 GbE switch has a cross-section bandwidth of 320 Gbps. The T-Bolt switch would only need 20 Gbps (or so).

I'm also ignoring what one would do about the DisplayPort signals - perhaps drop them, or route them to one output only. I'm still wondering how soon Intel will drop DisplayPort from T-Bolt, it seems like such a mistake for T-Bolt to include it.

Anyway, even a 4 port T-Bolt switch would carry a surprising price tag - it would need the PCIe switch chip and 5 T-Bolt controllers on-board. But nowhere close to a wire-speed 16 port 10 GbE switch.

I'm not sure how you reckon that a Thunderbolt switch with n ports would need any less switching capacity than an n*2 port 10GbE switch. 20Gbps full-duplex is precisely 2*10Gbps full-duplex.

A Thunderbolt switch would ignore DisplayPort signals, because it would never receive any. It would only transact in Thunderbolt packets, and wouldn't give a hoot as to their underlying protocol, be they PCIe or DisplayPort.

And for the reason I just mentioned, it would also not require a PCIe switch, and only a single, albeit unconventional, Thunderbolt controller. All a fully functional n-port TB switch would need is a controller chip with a 2 x (2*n) Thunderbolt cross-bar switch, a TB HCI with 2 Thunderbolt to DisplayPort Sink protocol adapters, a 2-port DisplayPort PHY, and an n-port Thunderbolt PHY. One chip should get the job done, save for the DisplayPort functionality which would most likely require more depending on how fancy you want to get.

And as far as the number of devices, I don't think that we know the actual situation. If the device limit is determined by daisy-chain latencies, then a T-Bolt switch-based topology could have more than 6(7) devices.

Apple is quite clear that even on the 27-inch iMac, regardless of whether you use one or both of the Thunderbolt ports, the device limit is still 6, so I'm not sure changing the topology makes any difference. As I looked at the Intel slide I posted earlier, I noticed that they qualified their previous statements by saying that you could have 6 Thunderbolt + 1 DisplayPort devices in a chain, which falls more in line with Apple's "up to six Thunderbolt peripherals." Then again, I really haven't heard anyone complain about not being able to connect all their TB gear at the same time...
 
And here's an Intel slide from IDF which clearly reiterates this:
[Image: Intel IDF slide on Thunderbolt channels]
"Each direction in each channel can be data and / or display."
Meaning one pipe in one direction can mix data & DP?
I calculate the ATD's DP needs (2560x1440, 24bpp, 60Hz, CVT-R) to be 5.8Gbps or 725MB/s. That's all headed outbound on the cable.
2 ATDs chained make 11.6 Gb/s, which won't fit in one direction of one channel.
If TB couldn't mix DP & data in one path, both channels would be used for DP and then you couldn't send data to a DAS.
Since you obviously can, DP & data are mixed together in one path, meaning the TB controller can allocate more than 10 Gb/s of DP or data in one cable?
So the hard limit is 20 Gb/s, if there's enough bandwidth in PCIe?

Btw, the slide says we will get optical cables this year...
 
Quite true, and a Thunderbolt switch should be pretty similar to a PCIe switch, but it would be a pretty beefy one, switching the equivalent of 5 lanes of uplink to 30 of downlink. I think the Thunderbolt PHY is probably considerably more expensive to implement than a PCIe card-edge connection though.

Wouldn't a T-Bolt switch be simply a PCIe switch (4 lanes PCIe in to n*4 lanes PCIe out), but it would have a T-bolt controller on the input (T-Bolt to PCIe) and a T-Bolt controller on each output (PCIe to T-Bolt)?

You wouldn't need a T-Bolt switch, simply use a PCIe switch.


I'm not sure how you reckon that a Thunderbolt switch with n ports would need any less switching capacity than an n*2 port 10GbE switch. 20Gbps full-duplex is precisely 2*10Gbps full-duplex.

It's quite simple - the input to the T-Bolt switch is 10 Gbps full duplex. It doesn't need to have more capacity than that.


A Thunderbolt switch would ignore DisplayPort signals, because it would never receive any. It would only transact in Thunderbolt packets, and wouldn't give a hoot as to their underlying protocol, be they PCIe or DisplayPort.

I'd prefer the "route DP to one output", so that the T-Bolt switch could be connected directly to the computer. Otherwise, you'd have to put the switch after the monitor.


And for the reason I just mentioned, it would also not require a PCIe switch, and only a single, albeit unconventional, Thunderbolt controller. All a fully functional n-port TB switch would need is a controller chip with a 2 x (2*n) Thunderbolt cross-bar switch, a TB HCI with 2 Thunderbolt to DisplayPort Sink protocol adapters, a 2-port DisplayPort PHY, and an n-port Thunderbolt PHY. One chip should get the job done, save for the DisplayPort functionality which would most likely require more depending on how fancy you want to get.

I was trying to build it from off-the-shelf components. And since the input is limited to 10 Gbps, your fancy special silicon would perform the same as mine.

And why on earth would you think that you'd need a cross-bar switch? Does T-Bolt support peer-to-peer PCIe transfers, or is it all master-slave (CPU-device)? (While peer-to-peer is part of the PCIe standard, it is seldom used.)

If it's master-slave, the 10 Gbps limit is fine.


Apple is quite clear that even on the 27-inch iMac, regardless of whether you use one or both of the Thunderbolt ports, the device limit is still 6, so I'm not sure changing the topology makes any difference.

You may be right, or you may be wrong. Neither of us can say which.

Again, as I said, if the 6 device limit comes from the additive latencies of daisy-chaining, then a tree-based switch topology could easily have a different limit. I can't be right or wrong - because I'm presenting a possibility, not arguing pro or con. My position is valid whether a switch-based topology supports 6 devices or 42 devices.

But Apple is probably right, in the sense that it's likely that no current Apple hardware will support T-Bolt switches - and 6 is the hard limit.

----------

Btw, the slide says we will get optical cables this year...

But they'll be exactly equivalent to the copper cables in all characteristics except length and latency.
 
"Each direction in each channel can be data and / or display."
Meaning one pipe in one direction can mix data & DP?

Exactly. The packets, whether they were originally DisplayPort or PCIe, are all encapsulated in Thunderbolt packets by the protocol adapters before they are transported. The Thunderbolt switch, PHY, and cable are just focused on getting all of the packets to the correct addresses in a timely fashion.

2 ATDs chained make 11.6 Gb/s, which won't fit in one direction of one channel.
If TB couldn't mix DP & data in one path, both channels would be used for DP and then you couldn't send data to a DAS.
Since you obviously can, DP & data are mixed together in one path, meaning the TB controller can allocate more than 10 Gb/s of DP or data in one cable?
So the hard limit is 20 Gb/s, if there's enough bandwidth in PCIe?

As far as I can tell (and who really knows who isn't under NDA), the hard limits are a factor of what protocol adapters a Thunderbolt controller contains. It looks like Light Ridge has 2 DP 1.1a to Thunderbolt Source adapters, 1 DP 1.1a to Thunderbolt Sink adapter, and 1 bi-directional PCIe 2.0 to Thunderbolt adapter. This would allow it to move 10Gbps PCIe + 17.282Gbps DP in the outbound direction, and 10Gbps PCIe + 8.641Gbps DP in the inbound direction.

Of course 10 + 17.282 > 20, so unless you have a 27-inch iMac or some other device with 2 Thunderbolt ports, you're limited to the 20Gbps that the cable can carry. The only real-world scenario I can come up with that would bump into the 20Gbps ceiling is to daisy chain two Apple Thunderbolt Displays and a Pegasus R6 full of SF-2281 based SSDs, and then perform a long sequential write using highly compressible data. This would result in 11.6Gbps of DisplayPort packets and 10Gbps of PCIe packets all heading in the same direction simultaneously and overwhelm the cable's total single-direction bandwidth of 20Gbps. This is an absolute corner case, and yet the performance impact would still be fairly minor.
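Putting numbers on that corner case:

# The one realistic way to exceed 20 Gb/s in a single direction on one cable
two_atds = 11.6    # Gb/s of DisplayPort packets, all heading outbound
pcie_writes = 10   # Gb/s of highly compressible sequential writes to the R6

print(two_atds + pcie_writes)   # 21.6 Gb/s, which tops the cable's 20 Gb/s
                                # per-direction limit by a modest 8%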

Btw, the slide says we will get optical cables this year...

I said it was an Intel slide; I never said anything about it not being a pack of lies. ;)

Wouldn't a T-Bolt switch be simply a PCIe switch (4 lanes PCIe in to n*4 lanes PCIe out), but it would have a T-bolt controller on the input (T-Bolt to PCIe) and a T-Bolt controller on each output (PCIe to T-Bolt)?

You wouldn't need a T-Bolt switch, simply use a PCIe switch.

Why would you convert back and forth unnecessarily? To go back to my 10GbE example, an Ethernet network just moves frames between addresses, it doesn't care about the structure of the higher layers. An Ethernet switch is equally happy forwarding TCP/IP, UDP, AppleTalk, or any number of other types of packets. Thunderbolt does the same thing, delivers packets to addresses, and doesn't care about whether they are PCIe or DisplayPort until they get to their destination.

It's quite simple - the input to the T-Bolt switch is 10 Gbps full duplex. It doesn't need to have more capacity than that.

Each port is 2x10Gbps, full-duplex. The beauty of protocols like FireWire and Thunderbolt is that each node can communicate directly with another. If you want to clone the data on one Thunderbolt drive to another, you can do so with little to no involvement of the host CPU, PCH, or system memory. After you initiate the copy, PCIe packets just flow from one drive to the other. This is not a host arbitrated bus like USB.

I'd prefer the "route DP to one output", so that the T-Bolt switch could be connected directly to the computer. Otherwise, you'd have to put the switch after the monitor.

Native DisplayPort displays can only be the last device in a chain, because they terminate the chain by default. With a Thunderbolt switch, it wouldn't really matter where you connected them, since they wouldn't be blocking the only way of extending the chain.

I was trying to build it from off-the-shelf components. And since the input is limited to 10 Gbps, your fancy special silicon would perform the same as mine.

And why on earth would you think that you'd need a cross-bar switch? Does T-Bolt support peer-to-peer PCIe transfers, or is it all master-slave (CPU-device)? (While peer-to-peer is part of the PCIe standard, it is seldom used.)

If it's master-slave, the 10 Gbps limit is fine.

I see where you're coming from with the off-the-shelf thing. I think we'd end up with a very expensive and not hugely functional switch using what's available (or not-so-available) right now for controllers.

A single Thunderbolt channel is 10Gbps, but every port is dual-channel, and hence 20Gbps. Just because current controllers are limited to 10Gbps of PCIe I/O, doesn't mean that the architecture can be reduced to 10Gbps total bandwidth. The cross-bar switch in a Light Ridge chip looks to be capable of switching no less than 8 10Gbps, full-duplex channels.

As I mentioned above, Thunderbolt is in theory peer-to-peer, both for PCIe and DisplayPort packets.

You may be right, or you may be wrong. Neither of us can say which.

It often turns out that I'm wrong about things I've said on this forum, but I still enjoy the mental exercise that comes along with rampant speculation.
 
Thunderbolt could be the reason why Apple hasn't updated the Mac Pro in a year and a half. Soon there won't be any need for a big all-in-one tower. For ultra-high-performance, you'll be able to build a cluster of Mac minis, with an external GPU and storage. All connected with Thunderbolt.

Just a crazy theory, but it could happen. Especially after optical Thunderbolt is available.

There will always be a need for what seems like a crazy amount of processing power. Although the hope is that it will evolve into a prosumer/SOHO cluster that, with the help of Thunderbolt, we can attach our personal and portable machines to.
 