MP 7,1 PCIe Bifurcation Support

Slash-2CPU

macrumors 6502
Original poster
Dec 14, 2016
300
156
New Orleans, USA
I wonder if the MP 7,1 will support bifurcation on the x16 slots? That would allow using the cheaper M.2 adapters that don't have those hot PCIe switches. Anyone have insight?
 
Reactions: LightBulbFun

bsbeamer

macrumors 68020
Sep 19, 2012
2,402
1,116
https://www.tomshardware.com/news/intel-xeon-cascade-lake-w-3000-series-specs,39278.html

"The Xeon W 3000-series, also known as Cascade Lake W (CSL-W), continues to cater to the enterprise and workstation markets. In comparison to Skylake W, the new Cascade Lake W chips bring very significant core upgrades. The processors will find their home inside Intel LGA 3647 motherboards with the corresponding C621 chipset."

I believe Skylake W fully supports bifurcation through all corresponding boards/chipsets, so I would assume Cascade Lake W will as well.

Only official Intel reference I can find right now:
https://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/xeon-scalable-platform-brief.pdf

15 Up to 2.4x claim based on TLS Web Proxy using NGINX®: Intel® Xeon® E5-2658 v4, DDR4-2133, Intel® PCH C612, Intel® 895XCC based QuickAssist Accelerator Adapter PCIe Gen3 x8 links, OpenSSL-Async (0.4.9-009) + NGINX-1.6.2 (0.1.0-008), QAT1.6.L.2.6.0-60. Cores, IO, packet buffer memory, and processing cores are on a single socket. 6 cores used on one Socket 12Cores are used, Crypto algorithm: AES-128-CBC-HMAC-SHA1 vs. Intel Xeon 6152 2.10 GHz, DDR4-2400 3x Intel® Corporation Ethernet Controller X710 (4 x10 Gbe ports per card), 1x Intel® Corporation Ethernet Controller X710 (2 x10 Gbe ports per card), PCIe x16 to 2 x8 PCIe bifurcation plugin card, Lewisburg-L B1 QuickAssist Accelerator with PCIe Gen3 x24 links, Intel®OpenSSL-1.0.1u + NGINX-1.9.6, Intel® QAT1.7.Upstream.L.1.0.0-15. Cores, IO, packet buffer memory, and processing cores are on a single socket. 6 cores used on one Socket, 20Core are used. Crypto algorithm:AES-128-CBC-HMAC-SHA1.

I'm more curious about Apple's logic board in MP7,1 at the moment.
 

Woof Woof

macrumors member
Sep 15, 2004
80
12
There was mention elsewhere that the old slot utility from the 2008 days is present in Catalina as version 2.0.

Maybe someone can rip it apart and see if there is artwork or text that suggests bifurcation is a feature.
 

deconstruct60

macrumors 604
Mar 10, 2009
7,979
1,172
I wonder if the MP 7,1 will support bifurcation on the x16 slots? That would allow using the cheaper M.2 adapters that don't have those hot PCIe switches. Anyone have insight?
I would expect not. First, because there are already x8 slots there. The upcoming Mac Pro basically overprovisions the CPU PCI-e lanes out of the box.

The CPU has four x16 headers. The next Mac Pro has the following (from the image on the Mac Pro Overview page that labels the slots; the physical widths are written next to the slot number, and I've put my guess at the electrical width in parens):


slot 1 : x16 (x16) [ slot '1a' MPX connector x8 + other stuff ]
slot 2 : x8  (x8)
slot 3 : x16 (x16) [ slot '3a' MPX connector x8 + other stuff ]
slot 4 : x16 (x8)
slot 5 : x16 (x16)
slot 6 : x8  (x8)
slot 7 : x8  (x8)
slot 8 : x4  (x4)

[ Not marked in that picture, but Slots 1 and 2 are in MPX Bay '1' and Slots 3 and 4 are in MPX Bay '2'. ]


If you just add up all the physical widths there (and ignore the MPX connector slots), that is 92 lanes. There are only 64 lanes off the CPU in total. So something related to 'bifurcation' has to be present already. At least one of those x16 has been "chopped up" just to get to that many slots.
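A quick sketch of that arithmetic in Python, using the slot widths listed above. The electrical widths are only the guesses in parentheses, not confirmed figures, and the variable names are made up for illustration.

```python
# Physical slot widths as labelled in the Overview-page image.
PHYSICAL_WIDTHS = {1: 16, 2: 8, 3: 16, 4: 16, 5: 16, 6: 8, 7: 8, 8: 4}

# Guessed electrical widths (the numbers in parentheses above; an assumption).
ELECTRICAL_WIDTHS = {1: 16, 2: 8, 3: 16, 4: 8, 5: 16, 6: 8, 7: 8, 8: 4}

# Four x16 groups off the CPU.
CPU_LANES = 4 * 16


def total_lanes(widths):
    """Sum the lane counts over all slots."""
    return sum(widths.values())


physical_total = total_lanes(PHYSICAL_WIDTHS)
electrical_total = total_lanes(ELECTRICAL_WIDTHS)

print(f"physical:   {physical_total} lanes, {physical_total - CPU_LANES} over the CPU's {CPU_LANES}")
print(f"electrical: {electrical_total} lanes, {electrical_total - CPU_LANES} over the CPU's {CPU_LANES}")
```

Even with the more conservative electrical guesses, the slots ask for more lanes than the CPU has, and that is before counting the MPX connectors.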

I suspect it has been provisioned something like the following

x16 --> slot 1
x16 --> slot 3
x16 --- PCI-e switch (***)
    |-- x8 slot 1a (**)
    |-- x8 slot 2
    |-- x8 slot 3a (**)
    |-- x8 slot 4 ( maybe a x16 electrical )
x16 --- PCI-e switch
    |-- x16 slot 5
    |-- x8 slot 6
    |-- x8 slot 7
    |-- x4 slot 8


(**) These may also be chopped up into two x4's and delivered to the MPX connector in a 'bifurcated' setup, so there is no need for a switch on the MPX module.

(***) This could be a fixed bifurcation without a 3rd party switch if it is an even x8 each for just 1a and 3a. If they want the slots linked at x16 electrical even though there isn't x16 of bandwidth behind them, then a switch would be needed, with the MPX connector as an alternative to the PCI-e slot.


So at least a couple of the slots marked 'x16' are already bandwidth-diluted out of the box. Something chopped up into two "neat" x4's probably isn't there, because those slots are already behind a switch anyway. [ I'd be surprised if Apple wanted to get into the boot-time configuring of switches attached to the motherboard and certifying specialized bifurcation cards. ]
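To put a number on "bandwidth diluted", here is a rough Python sketch of the oversubscription each switch would carry under the layout guessed above. The topology is my speculation, not anything Apple has published, and the names are illustrative only.

```python
from fractions import Fraction

# Hypothetical topology from the guess above: each switch hangs off a x16 uplink.
SWITCHES = {
    "switch_1": {"uplink": 16, "downstream": {"1a": 8, "2": 8, "3a": 8, "4": 8}},
    "switch_2": {"uplink": 16, "downstream": {"5": 16, "6": 8, "7": 8, "8": 4}},
}


def oversubscription(switch):
    """Ratio of total downstream lanes to uplink lanes for one switch."""
    return Fraction(sum(switch["downstream"].values()), switch["uplink"])


ratios = {name: oversubscription(sw) for name, sw in SWITCHES.items()}

for name, ratio in ratios.items():
    print(f"{name}: {float(ratio):.2f}:1 downstream-to-uplink")
```

A ratio above 1:1 only hurts when the downstream cards are all busy at the same time, which is why this kind of sharing is tolerable for bursty or low-bandwidth cards.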

For this market (>$6K) there are probably a very high number of folks who will be running 2 GPUs. (Maybe not both MPX modules, but many running at least one.) So the two x16 links above are likely provisioned for x16 cards present in the system. Or one GPU and one Afterburner in Slot 3. It would just be the corner case of one GPU where we are not dealing with a direct connection to the CPU.

Afterburner in slot 5 is actually a compromise if you make substantive use of the Thunderbolt connectors in slot 8 to move data.

The PCH is even more clogged up: x4 T2, x2-4 10GbE, x2-4 10GbE, x4 Thunderbolt (top), x1 Wi-Fi, x1 Bluetooth. [ So that's probably why slot 8, 1a/3a, or 6-7 are not allocated off the PCH. ]



So it seems more likely that Apple is just going to depend upon M.2 add-in cards getting bigger, or simply using more dedicated/custom switches for SSDs, to get to a lower price point, rather than adding complexity to the Mac Pro and its certification stack just to chase a niche of "cheaper cards". If "cheaper" was a main driving point for the Mac Pro, the entry price wouldn't be $6K in the first place.
 

shokunin

macrumors regular
Jun 7, 2005
204
27
The PCH is even more clogged up: x4 T2, x2-4 10GbE, x2-4 10GbE, x4 Thunderbolt (top), x1 Wi-Fi, x1 Bluetooth. [ So that's probably why slot 8, 1a/3a, or 6-7 are not allocated off the PCH. ]
+1 on this. A quick search on Intel's ARK didn't bring up the chipset DMI lane support on 2nd Gen Xeon-W, but hopefully it's x16 lanes rather than the usual x4 PCIe 3.0 lanes to the chipset. If it's only x4 from CPU to PCH then that may explain the slower T2 SSD speeds from sharing (switching) lanes between 10GbE, T2, TB3, internal SATA ports, and all the other items you listed.

For maximum bandwidth it'd be far cheaper to load m.2 NVMe's on a x16 riser card (Highpoint 7101/7102, Sonnet 4x4, or the like) than opt for larger CTO flash storage. If bifurcation is supported then hopefully cheaper riser cards without the PLX will work.
 

deconstruct60

macrumors 604
Mar 10, 2009
7,979
1,172
So you don’t think any slots are using lanes from the chipset/PCH?
No previous Mac Pro has (including the Mac Pro 2013). Why would Apple start now, with even more slots to timeshare bandwidth for and an even faster default boot SSD coupled to the PCH? The 2009-2012 model did use a PCI-e switch to share x4 of bandwidth between the two x4 slots. It is far more likely that Apple will do the same thing here, at least for the standard PCI-e slots.

Furthermore, I can count to eight, which apparently some other folks are willfully ignoring. That isn't "speculation"; it is simply arithmetic. Go look at boards implemented by other folks and see how many of them have an x8 electrical (not x8 physical) slot hooked up to the PCH. You'll find it is a dismally small number. The PCH itself is pragmatically provisioned with only x4 PCI-e v3 lanes. So just how productive would it be to hook a > x4 slot to a source that isn't > x4? At the same PCI-e version number, not very productive at all. You can dilute bandwidth where M >= N: an xM source is switched out to bundles that are each at most N = M wide. But once N > M, where are all those N lanes' worth of bits going? Effectively it is akin to conservation of mass. You can't push 10 gallons of water per minute through a 5 gallons per minute pipe.

You may see folks hooking up 2-5 x4 connections to the PCH, but you are likely not going to find any example of an x8. (You may also find someone who has stepped down x4 PCI-e v3 into an x8 PCI-e v2 to fit some corner-case cards, but that is likely not hanging off the PCH.)

So how many slots does the new Mac Pro have that are of the x4 chunk size the PCH can max out on? One. Is that one a good candidate to toss onto the PCH? Not really....

As I also pointed out, the PCH is a bit maxed out itself. The SSD portion of the T2 can just about saturate the DMI link. The two 10GbE streams can, all by themselves, ramp to approximately 63% of the DMI link. There is also a possible chance that the "top" TBv3 controller takes 75+% of the DMI link. There isn't copious spare bandwidth on the PCH (unless you are not using substantive stuff that you have paid for).
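The 63% figure can be roughly reproduced in Python. This is a back-of-envelope sketch that assumes the DMI link is equivalent to x4 PCI-e v3 (8 GT/s per lane with 128b/130b encoding) and treats each 10GbE port as a flat 10 Gb/s, ignoring protocol overhead on both sides. The function and variable names are made up for illustration.

```python
def link_gbps(lanes, gt_per_s, payload_bits, encoded_bits):
    """Usable line rate in Gb/s after encoding overhead."""
    return lanes * gt_per_s * payload_bits / encoded_bits


# Assumption: DMI behaves like x4 PCI-e v3 (8 GT/s per lane, 128b/130b encoding).
DMI_GBPS = link_gbps(lanes=4, gt_per_s=8.0, payload_bits=128, encoded_bits=130)

# Two 10GbE ports at full line rate (assumed 10 Gb/s each, no overhead).
TWO_10GBE_GBPS = 2 * 10.0

two_10gbe_share = TWO_10GBE_GBPS / DMI_GBPS

print(f"DMI link:        {DMI_GBPS:.1f} Gb/s")
print(f"two 10GbE ports: {two_10gbe_share:.0%} of the DMI link")
```

The other consumers (T2 SSD, Thunderbolt) are left out here because their sustained rates aren't pinned down in this thread.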

Some screen shots of the PCI config kit seem to back that up. The fact that the x4 slot can be allocated to Pool A or Pool B highly suggests that all the 'A or B' slots are hanging off of a switch.





https://9to5mac.com/2019/07/01/expansion-slot-utility-mac-pro/

Otherwise, how do you push a radio button and re-allocate them to different 'pools'?

The high slot count on this Mac Pro is probably aimed at exactly what Apple already demonstrated: 5-6 DAW cards that each present a bandwidth load of less than x4 PCI-e v3. Those will work even when a x16 is chopped up into four x4 chunks as outlined above. Slots 6-7 are probably also somewhat targeted at high-sunk-cost x8 PCI-e v2 cards that can be upshifted into just x4 of v3 bandwidth (somewhat like the TBv2 controllers on the MP 2013). The older legacy DAW cards are even easier, as they effectively have only x1 worth of bandwidth coming out of them. 5-6 of those won't even take up a single x8 bundle of bandwidth. Stuffing a Mac Pro full of those looks like beating the drum slot-count-wise, but bandwidth-wise it isn't hard.
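The "x8 v2 fits into x4 of v3" point is easy to check with the standard per-lane rates (5 GT/s with 8b/10b encoding for v2, 8 GT/s with 128b/130b for v3). A small Python sketch, with names chosen for illustration and packet-level overhead ignored:

```python
# Per-lane signalling rate and encoding for each PCI-e generation:
# (GT/s, payload bits, encoded bits)
GENERATIONS = {
    2: (5.0, 8, 10),      # 8b/10b encoding
    3: (8.0, 128, 130),   # 128b/130b encoding
}


def link_gbps(gen, lanes):
    """Usable line rate in Gb/s for a link of the given generation and width."""
    gt_per_s, payload_bits, encoded_bits = GENERATIONS[gen]
    return lanes * gt_per_s * payload_bits / encoded_bits


x8_v2 = link_gbps(gen=2, lanes=8)
x4_v3 = link_gbps(gen=3, lanes=4)

print(f"x8 PCI-e v2: {x8_v2:.1f} Gb/s")
print(f"x4 PCI-e v3: {x4_v3:.1f} Gb/s")
print(f"x4 v3 covers {x4_v3 / x8_v2:.1%} of a fully loaded x8 v2 card")
```

So a switch that upshifts an older x8 v2 card onto x4 of v3 gives up very little, even if the card runs flat out.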

I don't think the video case of "all slots" fully maxed out electrically is what they are aiming at (and hooking to the PCH isn't going to "buy" much, as it is also likely loaded down).

The two "slots" which may match up well with being tossed onto the PCH are the MPX connector slots, if Apple is trying to save some bucks and is willing to dilute the bandwidth even more. The two Thunderbolt controllers on the full MPX modules have a 'natural' split into two x4. Those provisioned off the PCH would choke the DMI link even more, but if folks are only using them mainly as display links it would pragmatically work out 'OK' for a substantive number of folks (e.g. the HDX monitors are basically zero on PCI-e bandwidth; likewise if the video out is in DP legacy pass-through mode). No x8 -> two x4 switch on the MPX module would make them cheaper to make. Using the PCH allows them to leverage the 'free' (already paid for) switch there. Is that a better balanced system? No. Is the Scrooge McDuck money pit deeper? Probably yes.

If Apple has a huge focus on these Afterburner cards in slot 4 or 5, they might throw the TB bandwidth under the bus on the MPX cards (and top TB slots).

In the configuration screen, slot 7 wasn't in either Pool A or Pool B. If so, toss slot 7 onto the first switch with 4 and 2, and then also hang 1a and 3a off the PCH.
 
Reactions: JedNZ

deconstruct60

macrumors 604
Mar 10, 2009
7,979
1,172
+1 on this. A quick search on Intel's ARK didn't bring up the chipset DMI lane support on 2nd Gen Xeon-W, but hopefully it's x16 lanes rather than the usual x4 PCIe 3.0 lanes to the chipset.
Highly likely not. Lanes reserved for hooking up the C600 series PCH were already in the x48 the first generation had. They got the "extra" x16 to expose as general purpose lanes for the new W 3000 series 'bigger' socket products.

"... Both the Skylake and Cascade Lake dies had the x16 lane root complex available on-die. Until now Intel reserved those lanes for the on-package Omni-Path Host Fabric Interface (HFI) integration. ..."
https://fuse.wikichip.org/news/2400/intel-rolls-out-cascade-lake-xeon-w-processors/

So if they pushed the x16 back into the DMI workload, the lanes available would go back to what they were, and it would really only "buy" something when using Intel 10GbE solutions. (The "extra" PCI-e lanes that the PCH takes feed into enabling Intel stuff, not general purpose lanes.) So Apple is probably not using the 10GbE that Intel built into the C600 chipset, and is instead using the same chips it already has drivers for from the iMac Pro and Mac Mini. (And I suspect it is using the Mini-like config of hanging them off the PCH.)


If it's only x4 from CPU to PCH then that may explain the slower T2 SSD speeds from sharing (switching) lanes between 10Gbe, T2, TB3, internal SATA ports, and all the other items you listed.
Slower SSD speed could also be down to better wear management, but yes, also in the context of being a better sharing playmate with the others hanging off the fixed PCH DMI bandwidth.


For maximum bandwidth it'd be far cheaper to load m.2 NVMe's on a x16 riser card (Highpoint 7101/7102, Sonnet 4x4, or the like) than opt for larger CTO flash storage. If bifurcation is supported then hopefully cheaper riser cards without the PLX will work.
Apple is more likely to shoot for cheaper costs for Apple than cheaper costs for 3rd parties. Even against x8 M.2 cards with a switch, Apple SSD prices are still higher once you get to the upper half of the capacity range. It isn't like they would be trying to keep the cheap M.2 cards out so they could underprice the PCI-e switch M.2 cards. Apple's SSDs are priced over both anyway. The notion of "saving max money" with the bifurcation card just flies in the face of the base context of the "minimum" price of the Mac Pro being $6K. If saving money was a top 3 objective, that wouldn't be the entry price.