Yeah, that, and the “we’re done except for Mac Pro,” were two very uncharacteristic revelations today. Until now, nobody knew for sure that a Mac Pro with M1 was actually coming, and nobody knew for sure that there wasn’t going to be an Ultra Duo.
Given these two data points, it seems to me we’re most likely not going to see an M1 Mac Pro; instead it will be an M2 variant (so they can offer a 40+ core configuration). An M1 Ultra x2 isn’t possible unless you either (1) build some sort of smart interposer, which is a lot of work with little payoff for Apple, or (2) use dual sockets. They made a big deal today about the programming model and why two sockets is bad, so I don’t think they’ll do that.
I suspect the M2 Max will have a fancier fusion bus that lets each die talk to 2 neighbors instead of 1, and may support tiling up to 8 M2 Max dies.
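To see why 2 neighbors per die might be enough to scale to 8 dies, here’s a quick count. It assumes the dies would be wired as a ring (my assumption, nothing Apple has said) and compares that with wiring every die directly to every other die.

```python
# Hypothetical topology comparison: a ring (each die links to 2 neighbors)
# versus a full mesh (every die links directly to every other die).

def ring_links(n: int) -> int:
    """Die-to-die links in a ring of n dies (needs n >= 3)."""
    if n < 3:
        raise ValueError("a ring needs at least 3 dies")
    return n

def ring_max_hops(n: int) -> int:
    """Worst-case hop count between two dies in a bidirectional ring."""
    return n // 2

def mesh_links(n: int) -> int:
    """Links needed to connect n dies fully: n choose 2."""
    return n * (n - 1) // 2

for n in (4, 8):
    print(f"{n} dies: ring = {ring_links(n)} links, max {ring_max_hops(n)} hops; "
          f"full mesh = {mesh_links(n)} links ({n - 1} per die)")
```

With 8 dies the ring needs only 2 ports per die and 8 links in total, at the cost of up to 4 hops; a full mesh would need 7 ports per die and 28 links.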
Yeah, I didn't miss the Mac Pro bit. Very interesting.
The other thing that grabbed my attention was the 2.5 TB/s (UltraFusion!) interconnect bandwidth (at 27:55). Johny then said that it’s "more than 4 times the bandwidth of the leading multi-chip interconnect," which I’d guess refers to AMD's Infinity Fabric 3.0 at 400 GB/s bidirectional. Oddly, though, by that figure he could just as well have said "more than 6 times the bandwidth," so I'm not sure what to make of that.
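The arithmetic behind the "could have said 6 times" point, assuming the 400 GB/s Infinity Fabric 3.0 figure above is the right comparison:

```python
# Napkin check of the keynote's "more than 4 times" claim.
ULTRAFUSION_GBPS = 2500      # 2.5 TB/s, as quoted in the keynote
INFINITY_FABRIC_GBPS = 400   # assumed comparison point: AMD Infinity Fabric 3.0, bidirectional

ratio = ULTRAFUSION_GBPS / INFINITY_FABRIC_GBPS
print(f"UltraFusion / Infinity Fabric = {ratio:.2f}x")
```

That prints 6.25x, so "more than 4 times" is true but undersells it by a wide margin.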
Anyway, the M1 Ultra has 800 GB/s of memory bandwidth, 400 GB/s per die. So, 2.5 TB/s die-to-die seems... excessive. I guess there's some kind of cache unification that requires enormous throughput. Or perhaps the interconnect is designed to handle more dies in a different configuration.
Edit: After ruminating a bit, I think that Johny’s “more than 4 times the bandwidth” statement may be a hint at the next-gen interconnect/interposer.
Grabbing a napkin… Since each die can do 2.5 TB/s, and four dies would require six direct interconnects to be fully connected (3 on each die), each interconnect would handle 2.5/3 TB/s ≈ 833 GB/s. Infinity Fabric can do 400 GB/s switched, so 800 GB/s total in a quad setup, and thus an average of 800/4 GB/s = 200 GB/s overall. 4 * 200 < 833 < 5 * 200, QED… Yeah, that’s as far as I could massage the numbers.
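The same napkin math in code, using the post's own assumptions: a fully connected quad of hypothetical UltraFusion dies, and an Infinity Fabric figure treated as switched (which the next edit questions).

```python
from math import comb

DIE_BANDWIDTH_GBPS = 2500  # UltraFusion bandwidth per die, from the keynote
DIES = 4                   # hypothetical quad-die configuration
IF_LINK_GBPS = 400         # assumed Infinity Fabric figure, treated as switched

links_total = comb(DIES, 2)                     # 6 links for a full mesh of 4 dies
links_per_die = DIES - 1                        # 3 links terminate on each die
per_link = DIE_BANDWIDTH_GBPS / links_per_die   # ~833 GB/s per link

if_total = 2 * IF_LINK_GBPS   # 800 GB/s total in a quad setup (the post's assumption)
if_average = if_total / DIES  # 200 GB/s averaged over the 4 dies

multiple = per_link / if_average
print(f"{links_total} links, {per_link:.0f} GB/s each, {multiple:.2f}x the IF average")
```

Under those assumptions each link carries about 4.17 times the Infinity Fabric average, which lands in the "more than 4 times" range.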
Edit 2: I looked into Infinity Fabric some more, and while it’s not switched, it’s unclear to me what the actual bandwidth would be in different configurations, so my numbers above might be correct… ish. There seem to be a lot of rules for different configurations.
Anyway, I saw a Twitter post on March 8 by the Japanese reverse-engineering firm TechanaLye saying that they’d analyzed the UltraFusion region of the M1 Max. Nice die shots, but, sadly, the details would be in one of their paid reports.