That interposer isn't magic. At the die level, AMD has been doing this since Zen. The really amazing part is that you can now do it with two separately manufactured dies at apparently very low latency. But it's no coincidence that both are manufactured at TSMC. Also: the whole NUMA problem isn't as pronounced with GPUs, since GPU workloads hardly ever create thread crosstalk, nor do their threads live long enough to be physically reassigned to another compute unit.

They're doing some weird interposer magic with the M1 Ultra, at least for the GPU, to make it addressable as a whole. It would be interesting to see some specific 3D benchmarks that push the working set slightly over the 32 or 64 GB threshold (per Max die) to see the performance hit of going across the 'UltraFusion' interconnect, which I'm sure is small but probably measurable.
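Something along these lines would do it. This is just a minimal CPU-side sketch of the methodology (a real test would push GPU working sets via Metal, and the sizes and hop count here are placeholders I made up), but since the memory is unified, a latency step at the per-die boundary should in principle be visible from a plain pointer chase too:

```swift
import Foundation

// Pointer-chase latency probe: walk one long random cycle through an array so
// every load depends on the previous one and the prefetcher can't hide memory
// latency. Comparing working sets below vs. above the per-die capacity should
// expose any extra cost of the cross-die hop, if there is one.
func nsPerHop(workingSetBytes: Int, hops: Int = 50_000_000) -> Double {
    let count = workingSetBytes / MemoryLayout<Int>.stride
    var next = Array(0..<count)

    // Sattolo's shuffle: turns the identity mapping into a single random cycle.
    for i in stride(from: count - 1, to: 0, by: -1) {
        next.swapAt(i, Int.random(in: 0..<i))
    }

    var index = 0
    let start = DispatchTime.now()
    for _ in 0..<hops { index = next[index] }
    let elapsed = DispatchTime.now().uptimeNanoseconds - start.uptimeNanoseconds

    precondition(index >= 0)              // keep the walk from being optimized away
    return Double(elapsed) / Double(hops) // average ns per dependent load
}

// Hypothetical sizes; pick values that straddle whatever per-die threshold you care about.
for gb in [16, 48, 80] {
    print("\(gb) GB: \(nsPerHop(workingSetBytes: gb << 30)) ns per hop")
}
```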
I haven't seen this yet, but hopefully AnandTech will do a test like this, because it would give us a good idea of what to expect from an M1 Mac Pro (if it exists). If there's a decent amount of latency introduced, it might be worth waiting for the M2 version when they refine things a bit. This interconnect technology is from 2017 or so, and while it's cutting edge in the sense that they're the first to ship it, it isn't the limit of what TSMC is capable of for future products.
The die stacking and vertical/'3D' interconnects will be very interesting and could reduce latency tremendously simply because of the very short trace lengths - see HBM2/HBM3.
Also, needing to address the Neural Engines discretely in the Ultra for ML makes it seem a bit unfinished/like a compromise product. But the achievement with the GPU really can't be overstated; it is huge, especially for a first-gen architecture.
I really want an Ice Lake (or better) Mac Pro to hold me over for 3-5 years while they sort all this out and get things 100% compatible, but that seems unlikely at this point, since the last Intel Mac came out in August 2020.
"Within" the M1 Ultra I am confident no meaningful latency exists between the two chips. But that would radically change if another package on another "socket" would be introduced, wired over PCB, probably going through some SerDes implemetation - assuming M1 Ultra even HAS such an interface (of which we have no evidence as of yet).
I really cannot imagine them putting the M1 Ultra into the Mac Pro and calling it a day, claiming it's better on the CPU side than the 28-core Xeon. Which it .... technically.... is (but also isn't). They - technically - could, and - technically - wire up PCIe cards via internal Thunderbolt, so you would - technically - have expandability, and - technically - there is no hard barrier to running the AMD GPUs with the M1 .... but those devices would be severely bandwidth starved, since one Thunderbolt port delivers roughly the bandwidth of PCIe 3.0 x4. That's the actual problem: unless the M1 Ultra has some interconnect we don't know about, it simply does not have sufficient IO. Not to wire up external PCIe devices, and not to wire up multiple packages.
Maybe they can actually repurpose the SerDes hardware of the Thunderbolt controllers to work in conjunction somehow. And maybe those can be driven much higher than "just" 20 GT/s per lane. Let's just assume they are actually PCIe 5 equivalent and can drive 32 GT/s. That would mean an M1 Ultra could - technically - have 32 GT/s per lane × 2 lanes per port × 6 ports = 384 GT/s of aggregate bandwidth. For comparison: the Xeon W in the current Mac Pro has 48 usable PCIe 3.0 lanes at 8 GT/s each - the same 384 GT/s.
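Back-of-the-envelope, under exactly those assumptions (PCIe-5-class SerDes, 2 lanes per port, 6 ports - my numbers, nothing Apple has confirmed):

```swift
// Aggregate PCIe payload bandwidth comparison. With 128b/130b encoding,
// each GT/s per lane carries about 0.985 Gb/s of payload.
let payloadPerGT = 128.0 / 130.0              // Gb/s of payload per GT/s

// Hypothetical M1 Ultra: 6 Thunderbolt ports, 2 lanes each, PCIe-5-class 32 GT/s.
let ultraGTs = 6.0 * 2.0 * 32.0               // 384 GT/s aggregate
let ultraGBs = ultraGTs * payloadPerGT / 8.0  // ≈ 47 GB/s

// 2019 Mac Pro: 48 usable PCIe 3.0 lanes at 8 GT/s each.
let xeonGTs = 48.0 * 8.0                      // 384 GT/s aggregate
let xeonGBs = xeonGTs * payloadPerGT / 8.0    // ≈ 47 GB/s

print("Hypothetical M1 Ultra: \(ultraGTs) GT/s ≈ \(ultraGBs) GB/s")
print("2019 Mac Pro Xeon W:   \(xeonGTs) GT/s ≈ \(xeonGBs) GB/s")
```

So even in the optimistic case it lands at roughly the same ~47 GB/s of aggregate PCIe payload bandwidth as the 2019 machine.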
But even IF that were actually an accurate depiction of how Apple would integrate the M1 Ultra into the Mac Pro, it would still be quite underwhelming to offer a successor 3 years later that has no more aggregate IO bandwidth than the machine it replaces.
All in all: I just don't see it. I just don't. I don't see Johny Srouji pulling an A16-based, N4-fabbed, MCM-style M2 Ultra Extreme Gigazord rabbit out of his pants at WWDC. And that's kinda what they would need to actually make people upgrade to an Apple Silicon Mac Pro.
Note: Edited since the Xeons in the Mac Pro actually have 48 useable lanes, not 64.