Trying to build a cMP (5,1) to reach 14,000 MB/s - need info on architecture constraints?

zedex

macrumors regular
Original poster
Oct 21, 2018
Perth, WA
Can someone please direct me to the cMP (5,1) I/O architecture schematic I've seen somewhere on this site? It would help me calculate the maximum I/O throughput ceilings defined by the architecture itself, rather than by the visible I/O interfaces (FW ports, SATA interfaces and PCIe slots). In the early days of my cMP ownership I wrongly assumed those interfaces were a true reflection of the headroom available for concurrent data transfers.

To explain: until I first saw this diagram, I counted PCIe x16 (x2) = 32 lanes and PCIe x4 (x2) = 8 lanes, and so believed I had 40 PCIe lanes in total. The architecture diagram is how I discovered that slots 3 and 4 share their lanes, so the effective ceiling for PCIe saturation is 36 lanes.

(Testing Theoretical Max 1) 36 PCIe 2.0 lanes should max out at 13,500-14,000 MB/s.
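As a rough check on that range, here is a minimal Python sketch of the lane arithmetic. It assumes standard PCIe 2.0 signalling (5 GT/s per lane, 8b/10b encoding) and ignores packet overhead; the efficiency it prints is only what the 13,500-14,000 MB/s estimate implies, not a measurement.

```python
# PCIe 2.0: 5 GT/s per lane, 8b/10b line encoding (8 payload bits per 10 sent).
GT_PER_S = 5.0e9
ENCODING = 8 / 10


def pcie2_lane_mb_s() -> float:
    """Raw one-direction payload bandwidth of one PCIe 2.0 lane, MB/s (10^6 bytes)."""
    return GT_PER_S * ENCODING / 8 / 1e6


lanes = 36
raw_total = lanes * pcie2_lane_mb_s()  # before protocol (TLP/DLLP) overhead

# Efficiency implied by the 13,500-14,000 MB/s estimate.
low, high = 13_500 / raw_total, 14_000 / raw_total

print(f"per lane: {pcie2_lane_mb_s():.0f} MB/s")
print(f"{lanes} lanes raw: {raw_total:.0f} MB/s")
print(f"implied efficiency: {low:.0%} to {high:.0%}")
```

In other words, the estimate amounts to assuming roughly 375-390 MB/s of usable bandwidth per lane.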

I also read a thread a while ago saying that if I plug 6x Samsung EVO SATA SSDs into the six direct-connect SATA II ports, I may not be able to achieve the 1500 MB/s transfer speed I'm expecting* because of architecture constraints.

*(Testing Theoretical Max 2) Based on a basic RAID 0 setup across 6 SSDs, all comfortably reading or writing in parallel at 250 MB/s each, for a total of 1500 MB/s.
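A quick Python sketch of the per-port side of that expectation, assuming the SATA II line rate of 3 Gbit/s with 8b/10b encoding. It deliberately ignores anything shared upstream of the SATA ports, so it is a best case, not a prediction.

```python
# SATA II: 3.0 Gbit/s line rate with 8b/10b encoding.
SATA2_LINE_RATE_BPS = 3.0e9
ENCODING = 8 / 10

port_ceiling = SATA2_LINE_RATE_BPS * ENCODING / 8 / 1e6  # MB/s per port

drives = 6
per_drive = 250  # MB/s, the assumed sustained rate per SSD from the post

# Each drive is capped by its own port; RAID 0 then simply sums the drives.
per_drive_effective = min(per_drive, port_ceiling)
raid0_naive = drives * per_drive_effective

print(f"SATA II port ceiling: {port_ceiling:.0f} MB/s")
print(f"naive RAID 0 total: {raid0_naive:.0f} MB/s")
```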

Lastly, through some of my own testing, I discovered that 4 external OWC FireWire HDDs, one connected to each of the 4 FW ports on my cMP, failed to reach anywhere near the 400 MB/s throughput I expected from a RAID 0 setup. I think the reason is that the 4 discrete FW ports share channels at the architecture level, which is why the average transfer speed is only 180 MB/s.
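The 400 MB/s expectation comes from FireWire 800's nominal rate. A small Python sketch of that arithmetic against the measured average (nominal figures only; real FW800 drives also lose some throughput to protocol overhead):

```python
# FireWire 800: nominally 800 Mbit/s per port.
FW800_MBIT = 800
ports = 4

per_port_mb_s = FW800_MBIT / 8          # 100 MB/s nominal per port
expected_total = ports * per_port_mb_s  # the 400 MB/s RAID 0 expectation
observed_total = 180                    # MB/s, average measured in the post

print(f"nominal per port: {per_port_mb_s:.0f} MB/s")
print(f"nominal total: {expected_total:.0f} MB/s")
print(f"observed / nominal: {observed_total / expected_total:.0%}")
```

If the four ports really do sit behind one shared upstream link, the aggregate would be min(4 × per-port rate, uplink capacity) no matter how many drives are striped; the capacity of that link isn't given in this thread.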
 

handheldgames

macrumors 68000
Apr 4, 2009
Pacific NW, USA
The PCIe 2.0 bus in the cMP is your limiting factor.

Using a HighPoint SSD7101-A equipped with two 970 Pro 1TB SSDs,
one PCIe 2.0 x16 slot can deliver ~6000 MB/s in RAID 0.

Using two HighPoint SSD7101-As, each equipped with two 970 Pro 1TB SSDs,
two PCIe 2.0 x16 slots can deliver ~12,000 MB/s in RAID 0.
In this configuration, you'll need to use an x4 slot for video.

In a PCIe 3.0 connection, each SSD7101-A can deliver 12,000 MB/s with four 1TB 970 Pros. Two fully populated SSD7101-As would deliver 24,000 MB/s.
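Most of the gen 2 vs gen 3 difference comes from the signalling rate and line encoding. A small Python sketch, assuming the standard PCIe figures (5 GT/s with 8b/10b for 2.0, 8 GT/s with 128b/130b for 3.0) and ignoring packet overhead:

```python
def pcie_mb_s(gt_per_s: float, payload_bits: int, total_bits: int, lanes: int) -> float:
    """Per-direction payload bandwidth before packet overhead, MB/s (10^6 bytes)."""
    return gt_per_s * 1e9 * payload_bits / total_bits / 8 / 1e6 * lanes


gen2_x16 = pcie_mb_s(5.0, 8, 10, 16)     # 8b/10b encoding
gen3_x16 = pcie_mb_s(8.0, 128, 130, 16)  # 128b/130b encoding

print(f"PCIe 2.0 x16: {gen2_x16:.0f} MB/s")
print(f"PCIe 3.0 x16: {gen3_x16:.0f} MB/s")
print(f"ratio: {gen3_x16 / gen2_x16:.2f}x")

# Figures quoted in the post, as a fraction of each link's raw ceiling.
print(f"6000 MB/s on gen2 x16: {6000 / gen2_x16:.0%}")
print(f"12000 MB/s on gen3 x16: {12000 / gen3_x16:.0%}")
```

Both quoted results land at roughly three quarters of the raw link rate, which is plausible once packet overhead is counted.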
 

Slash-2CPU

macrumors 6502
Dec 14, 2016
You’ll hit the limit of the CPU QPI.

6.4GT/s QPI is theoretically capable of 25.6GB/s. That is combined send and receive at the same time. So theoretical limit is 12.8GB/s one-way. Realistically, it’ll max out around 9.8-11.6GB/s.
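The same arithmetic as a Python sketch, assuming QPI's 16 data bits (2 bytes) per transfer in each direction:

```python
# QPI moves 2 bytes of data per transfer in each direction.
GT_PER_S = 6.4e9
BYTES_PER_TRANSFER = 2

one_way = GT_PER_S * BYTES_PER_TRANSFER / 1e9  # GB/s per direction
both_ways = 2 * one_way                        # the "25.6 GB/s" headline figure

# The realistic range from the post, as a fraction of the one-way ceiling.
realistic = (9.8, 11.6)

print(f"one-way: {one_way:.1f} GB/s, bidirectional: {both_ways:.1f} GB/s")
print(f"realistic range: {realistic[0] / one_way:.0%} to {realistic[1] / one_way:.0%} of one-way")
```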

I have not seen anyone hit this limit.

I also haven’t seen anyone actually test the performance of two SSD7101As.

Dual SSD7101As are the only setup I can imagine that could hit the QPI limit.

Other potential but less likely limit is CPU’s ability to keep up with software RAID at that rate.
 

zedex

Thanks to both of the above experienced contributors for helping me get to the bottom of this.

Slash-2CPU's excellent explanation of the QPI threshold essentially makes my 14,000 MB/s goal moot: I'm mainly concerned with maximising concurrent data transfer capability, and those concurrent transfers all have to cross the QPI link.

I'm still interested to know whether 6x SSDs (RAID 0) in the SATA II ports would allow typical transfer speeds of 1500 MB/s. Someone else on these forums has indicated an internal transfer limit of 600-700 MB/s for the cMP SATA ports, but apart from the system architecture, I don't know why this would be the case.
 

Slash-2CPU

The DMI link between the X58 (or 5520) IOH and the ICH (SATA controller) is 1 GB/s theoretical. 680 MB/s is typical. 750 MB/s is the extreme upper limit of what you could maybe achieve.
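A simplified Python model of why extra SATA SSDs stop helping for sequential work: the array can go no faster than the DMI link all six ports share. It assumes first-generation DMI (4 lanes at 2.5 GT/s, 8b/10b), 250 MB/s per SSD as in the original question, and the 680/750 MB/s figures above; controller and RAID overhead are ignored.

```python
# First-generation DMI: 4 lanes at 2.5 GT/s with 8b/10b encoding, per direction.
dmi_raw = 4 * 2.5e9 * (8 / 10) / 8 / 1e6  # MB/s
dmi_typical = 680    # MB/s, "typical" figure from the post
dmi_best_case = 750  # MB/s, "extreme upper limit" from the post


def raid0_ceiling(drives: int, per_drive: float, upstream: float) -> float:
    """Sequential RAID 0 throughput can't exceed the link all drives share."""
    return min(drives * per_drive, upstream)


print(f"DMI raw: {dmi_raw:.0f} MB/s")
for n in range(1, 7):
    print(n, "SSDs:", raid0_ceiling(n, 250, dmi_typical), "MB/s")
print("6 SSDs, best case:", raid0_ceiling(6, 250, dmi_best_case), "MB/s")
```

With 250 MB/s drives the model flattens at the third SSD, well short of the 1500 MB/s naive total.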

You will see performance gains in RAID 0 SSD arrays with 3-4 SSDs. Sequential transfers will hit the DMI limit quickly. More typical random transfers will still see a gain in IOPS. Any SATA SSD will max out under 100 MB/s on 4KB random read/write at low queue depths. For any task except sequential transfers, SATA II and a controller limit of 600-700 MB/s won’t matter.
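To put the 4KB random figures in context, a Python sketch converting between IOPS and MB/s, assuming 4096-byte I/Os and MB = 10^6 bytes:

```python
BLOCK = 4096  # bytes per 4KB random I/O


def mb_s_from_iops(iops: float, block: int = BLOCK) -> float:
    return iops * block / 1e6


def iops_for_mb_s(mb_s: float, block: int = BLOCK) -> float:
    return mb_s * 1e6 / block


print(f"100 MB/s at 4KB = {iops_for_mb_s(100):,.0f} IOPS")
print(f"680 MB/s at 4KB = {iops_for_mb_s(680):,.0f} IOPS")
```

At 4KB, saturating a ~680 MB/s DMI link would take roughly 166,000 IOPS, far beyond what the sub-100 MB/s random figure implies.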
 

joevt

macrumors 6502a
Jun 21, 2012
The diagram shows two of those QPI links, one per processor to the northbridge. I wonder if some fancy coding could allow getting more than 11 GB/s? Compare the results of two PCIe 3.0 x16 M.2 x4 cards using SoftRAID and Disk Utility RAID. Are there other software RAID solutions? Maybe the single QPI link between the CPUs will be the limit.

The theoretical limit (without overhead consideration) of a PCIe 2.0 x16 slot is 8 GB/s. People have achieved 6 GB/s from a single slot. With two x16 slots you might reach the single QPI limit. You would need to get more than 12.8 GB/s to prove that you were surpassing the single QPI limit. You can add up to 2 GB/s (≈1500 MB/s typical) with one of the x4 slots. The south bridge DMI can only add up to 1 GB/s (≈700 MB/s typical).
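Adding up the per-path figures from this post in Python shows how close the typical numbers come to the original 14,000 MB/s target. It assumes the second x16 slot achieves the same 6 GB/s as the single-slot result, which nobody in the thread has measured yet.

```python
# Per-path figures quoted in the post, MB/s: (theoretical, typical/achieved).
paths = {
    "x16 slot 1": (8000, 6000),
    "x16 slot 2": (8000, 6000),  # assumed to match the single-slot result
    "x4 slot": (2000, 1500),
    "southbridge DMI": (1000, 700),
}
QPI_ONE_WAY = 12_800  # MB/s, single QPI link, one direction

theoretical = sum(t for t, _ in paths.values())
typical = sum(a for _, a in paths.values())

print(f"sum of theoretical path limits: {theoretical} MB/s")
print(f"sum of typical path throughput: {typical} MB/s")
print(f"exceeds one QPI link? {typical > QPI_ONE_WAY}")
```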

My pcitree.sh script at #344 shows the PCIe link speed and width of each device, which should match the diagram. It is missing PCI, DMI, and QPI speeds. Each device is limited by its parent. Follow the path from the device to the root.
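The "follow the path to the root" rule can be sketched in Python. This is not pcitree.sh and doesn't read real hardware; the topology, node names and the two SSD speeds are hypothetical, with link figures taken from this thread. It models one device at a time; devices that share a parent also split that parent's capacity, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Link:
    name: str
    mb_s: float                      # throughput this node can pass, MB/s
    parent: Optional["Link"] = None  # next hop towards the CPU (None = root)


def effective_ceiling(dev: Link) -> tuple[float, str]:
    """Follow the path from a device to the root; the slowest hop is the cap."""
    best, where = dev.mb_s, dev.name
    node = dev.parent
    while node is not None:
        if node.mb_s < best:
            best, where = node.mb_s, node.name
        node = node.parent
    return best, where


# Hypothetical, simplified topology using per-direction figures from this thread.
qpi = Link("QPI link", 12_800)
dmi = Link("DMI", 1_000, parent=qpi)
sata_port = Link("SATA II port", 300, parent=dmi)
sata_ssd = Link("SATA SSD (hypothetical 550 MB/s)", 550, parent=sata_port)
x4_slot = Link("PCIe 2.0 x4 slot", 2_000, parent=qpi)
nvme = Link("NVMe SSD (hypothetical 3500 MB/s)", 3_500, parent=x4_slot)

print(effective_ceiling(sata_ssd))  # capped by the SATA II port
print(effective_ceiling(nvme))      # capped by the x4 slot
```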
 

Slash-2CPU

Even if you could write code or force affinity between two benchmarks to use both QPI links in a mostly-balanced scenario, I’d be very surprised if the 5520 northbridge is capable of it in practice. My bet is that it’s not two real 12.8GB/s links and you’d probably see a decrease in throughput due to contention or buffer shortages. It’s going to be more like 12.8GB/s bursts to either CPU socket, with 10-13GB/s sustained total.

I doubt Intel even had hardware capable of throughput like we have with Amfeltec or SSD7101A-1 cards to test this during development.

Still would like to see someone try.
 

joevt

Right, I would also like to see someone try. Yes, there could be internal limitations. For example, a Thunderbolt 3 controller with two Thunderbolt 3 ports, each of which can do at least 2500 MB/s (up to 2750 MB/s), should be capable of at least 3000 MB/s in total (up to around 3500 MB/s), but something limits the total to around 2800 MB/s.
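The Thunderbolt 3 example as plain arithmetic in Python, using only the figures in this post:

```python
ports = 2
per_port = (2500, 2750)        # MB/s each port can do
expected_total = (3000, 3500)  # MB/s the controller should manage
observed_total = 2800          # MB/s combined, as observed

demand = tuple(ports * p for p in per_port)
print(f"two ports could demand {demand[0]}-{demand[1]} MB/s")
print(f"observed {observed_total} MB/s is {observed_total / expected_total[0]:.0%} of the low expectation")
```

The observed total sits below both the combined port demand and the controller's expected capacity, which is what points to an internal limit somewhere in between.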
 

handheldgames

I have two HighPoint 7101-A controllers; I'm just short of the 2-3 SSDs per card required for the test. If a manufacturer, vendor or review site out there would like to lend some out against a credit card, I'd be happy to conduct the tests and return the SSDs.
 

Slash-2CPU

Got Amazon Prime? Ha. ;)