With the additional 8x 2.0 lanes from the chipset mentioned in the post Nugget points to, they might not be bothering to use a PCIe switch to get more bandwidth by converting 3.0 lanes into a larger number of 2.0 lanes. Even without this they'd be able to route 4x 2.0 lanes to each Thunderbolt controller.
If any such switch exists.
In practice the "additional" 8 aren't all available.
Wi-Fi/Bluetooth x1 (discrete controller; not in chipset)
USB 3.0 x1 (discrete controller; not in chipset)
Ethernet x1 (from chipset)
Ethernet x1 (with discrete controller)
[NOTE: even if both Ethernet ports collapse onto 1 lane, an audio chip is possibly consuming another if chipset audio isn't used.]
That probably leaves only 4 from the chipset. So:
x4 TB controller 0
x4 TB controller 1
x4 TB controller 2
x16 GPU
x16 GPU
That is 44, and it still doesn't cover the PCI-e SSD (another x4).
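To make the tally above concrete, here is a quick sketch of the lane budget. The x4/x16 figures are the ones listed in this post; the 40-lane CPU budget is my assumption (the usual figure for a single-socket Xeon E5), and the dictionary names are just labels for illustration.

```python
# Lane demand as listed above (TB controllers and GPUs).
lane_demand = {
    "TB controller 0": 4,
    "TB controller 1": 4,
    "TB controller 2": 4,
    "GPU 0": 16,
    "GPU 1": 16,
}
SSD_LANES = 4             # the PCI-e SSD, another x4
CPU_LANES_AVAILABLE = 40  # ASSUMPTION: PCIe 3.0 lanes from a single Xeon E5

subtotal = sum(lane_demand.values())      # TB controllers + GPUs
total = subtotal + SSD_LANES              # add the SSD
shortfall = total - CPU_LANES_AVAILABLE   # positive means oversubscribed

print(f"TB + GPU lanes: {subtotal}")
print(f"With SSD:       {total}")
print(f"Short by:       {shortfall}")
```

If the 40-lane assumption holds, the design is short by two x4 links, which is why something has to share.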
The new Mac Pro is oversubscribed on PCI-e lanes. (So is the 2009-2012 one: its two x4 slots share bandwidth.) With the GPUs clocked down so low, they may be borrowing x4 from one of those x16s. That way only 4 of that GPU's 16 lanes are shared (1/4) and the other 12 are not, versus 100% overlap if a TB controller and the SSD had to share the same x4.
Longer term, Apple needs a variant chipset from Intel that can trade in SATA lanes (Apple is using zero of them) for PCI-e lanes (and blow past the upper bound of x8).
Either that or specialized TB controllers that can 'step down' PCIe v3 traffic into v2. The SSD should eventually go x2 v3 with less drama.
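A rough sketch of why x2 v3 is about as good as x4 v2 for the SSD. The transfer rates and line encodings are the standard PCIe figures (5 GT/s with 8b/10b for v2, 8 GT/s with 128b/130b for v3); the function name is made up for illustration, and packet/protocol overhead is ignored, so real throughput will be somewhat lower.

```python
def lane_mb_per_s(gt_per_s, payload_bits, total_bits):
    """Raw per-lane, per-direction bandwidth in MB/s after line encoding."""
    bits_per_s = gt_per_s * 1e9 * payload_bits / total_bits
    return bits_per_s / 8 / 1e6

pcie2_lane = lane_mb_per_s(5.0, 8, 10)      # PCIe 2.0: 8b/10b encoding
pcie3_lane = lane_mb_per_s(8.0, 128, 130)   # PCIe 3.0: 128b/130b encoding

x4_v2 = 4 * pcie2_lane   # what a TB controller or the SSD gets today
x2_v3 = 2 * pcie3_lane   # the same link at half the lane count on v3

print(f"x4 v2: {x4_v2:.0f} MB/s")
print(f"x2 v3: {x2_v3:.0f} MB/s")
```

The two come out within a couple of percent of each other, so going to x2 v3 would free two lanes per device at essentially no bandwidth cost.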
This would mean you'd want to put any devices that plausibly needed more than 1000 MB/s on separate buses from each other,
Only if they concurrently need more than.... Different storage groupings used at different times will get along OK.
as you'd have a limit of 2000 MB/s per bus rather than per port... but in practice that wouldn't be much of a real-world bottleneck.
There always was a difference between the Thunderbolt data bandwidth (from port to port) and the Thunderbolt controller to host bandwidth.
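A small sketch of the "concurrent" point, using the 2000 MB/s per-bus figure quoted above. The device numbers are hypothetical, picked only to show that two fast devices on one bus are a problem only when both are busy at the same time.

```python
BUS_LIMIT_MB_S = 2000  # per-bus limit quoted above (controller to host)

def fits_on_bus(concurrent_demands_mb_s, limit=BUS_LIMIT_MB_S):
    """True if the devices active at the same moment fit under the bus limit."""
    return sum(concurrent_demands_mb_s) <= limit

# HYPOTHETICAL devices sharing one Thunderbolt bus
array_a = 1200  # MB/s
array_b = 1100  # MB/s

both_at_once = fits_on_bus([array_a, array_b])  # both busy simultaneously
one_at_a_time = fits_on_bus([array_a]) and fits_on_bus([array_b])

print("both at once fits:", both_at_once)
print("one at a time fits:", one_at_a_time)
```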