It certainly sounds like an interesting idea to fabricate two M2 Max dies as one monolithic M2 Ultra die. The UltraFusion interconnect could be embedded inside this monolithic die and perhaps further optimised to take advantage of it being a single die.
You don't really need UltraFusion at all. The Max expands the GPU complexes of the Pro design without "UltraFusion". It's largely the same issue. The die may have grown bulky enough that its internal inter-component bus needs to be segmented.
Either with ring bus bridges or with some mesh stitching (both linked to www.anandtech.com).
However, the overall size vs. yield and the reusability of the M2 Max die (think of it as a basic building block, like AMD's CCD approach) may still favour gluing two M2 Max dies together through external UltraFusion to form an M2 Ultra. IDK. I guess economics will decide. It's less about performance per watt, as I would think both approaches will be similar in that respect.
AMD's CCD + I/O die approach uses more power than Apple's single/multi-die approach, so the Perf/Watt isn't similar.
A single-die Ultra would be around the size of a 3090 die today, in the 600 mm² range. It's past "medium" size, but it also isn't reticle-busting; it sits in the middle of the "large" range. Nvidia manages to get dies that size packaged without the sky falling. And Apple is charging more.
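To put rough numbers on the size vs. yield trade-off, here is a sketch using the classic gross-dies-per-wafer estimate and a simple Poisson defect model. The defect density `D0` is a hypothetical placeholder, not a TSMC figure, and the model ignores packaging yield, the area spent on the interconnect, and scribe lines, so treat it as an illustration of the shape of the argument only.

```python
import math

WAFER_DIAMETER_MM = 300.0  # standard 300 mm wafer

def dies_per_wafer(die_area_mm2: float, wafer_d_mm: float = WAFER_DIAMETER_MM) -> int:
    """Classic gross-die estimate: wafer area / die area minus an edge-loss term."""
    gross = (math.pi * wafer_d_mm ** 2) / (4 * die_area_mm2)
    edge_loss = (math.pi * wafer_d_mm) / math.sqrt(2 * die_area_mm2)
    return math.floor(gross - edge_loss)

def poisson_yield(die_area_mm2: float, d0_per_cm2: float) -> float:
    """Fraction of defect-free dies under a simple Poisson defect model."""
    return math.exp(-d0_per_cm2 * die_area_mm2 / 100.0)  # mm^2 -> cm^2

D0 = 0.1  # HYPOTHETICAL defect density in defects/cm^2

# One ~600 mm^2 monolithic die vs. two ~300 mm^2 halves glued together.
mono_good = dies_per_wafer(600) * poisson_yield(600, D0)
half_good = dies_per_wafer(300) * poisson_yield(300, D0)
print(f"monolithic 600 mm^2: {mono_good:.1f} good dies per wafer")
print(f"2 x 300 mm^2:        {half_good / 2:.1f} good pairs per wafer")
```

With these made-up inputs the split comes out ahead (roughly 49 good monolithic dies vs. 73 good pairs per wafer). Whether that survives real packaging costs and the Perf/Watt penalty is exactly what the economics argument is about.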
So I would go with the M2 Max die as the basic building block for Mac Pro SoCs. Remember that this die can be reused and glued together to form an M2 Ultra, just as with the M1 Ultra. You made a reasonable guess that the base-model Mac Pro will start with at least an M2 Ultra (if not an M2 Max, in my opinion). So how do you solve the additional I/O requirements of the Mac Pro?
I think you run into problems because the layout of the Max die gets folded into the Ultra configuration. That works "OK" for two dies. For four, however, there are memory bus layout problems. The Max is heavily skewed toward getting the memory controllers to feed the GPU cores, so the RAM packages are densely packed along both "long" sides. That is fine because it leaves the remaining two "short" sides for external I/O (Thunderbolt, x4 PCIe v4, internal DisplayPort for the laptop panel) and UltraFusion.
For four tiles in a densely packed configuration, you will need three sides just (or primarily) for UltraFusion. So the layout is different, and the laptop baseline design isn't the best starting point.
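The edge-budget argument can be made concrete with a toy model. It assumes, as a deliberate simplification, that each die edge hosts one kind of fan-out (memory, external I/O, or one die-to-die link); the function names and topologies are hypothetical and say nothing about Apple's actual floorplan.

```python
DIE_EDGES = 4  # a rectangular die has four edges to fan signals out from

def ultrafusion_edges_needed(n_dies: int, topology: str) -> int:
    """Edges the busiest die must give to die-to-die links (one link per edge)."""
    if n_dies <= 1:
        return 0
    if topology == "full_mesh":          # every die links directly to every other die
        return n_dies - 1
    if topology in ("ring", "line"):     # interior dies link to two neighbours
        return 2 if n_dies > 2 else 1
    raise ValueError(f"unknown topology: {topology!r}")

def fits(n_dies: int, topology: str, memory_edges: int = 2, io_edges: int = 1) -> bool:
    """Defaults mimic the Max layout: LPDDR5 on both long edges, one short edge for I/O."""
    free = DIE_EDGES - memory_edges - io_edges
    return ultrafusion_edges_needed(n_dies, topology) <= free

for n, topo in [(2, "full_mesh"), (4, "ring"), (4, "full_mesh")]:
    print(n, topo, "fits" if fits(n, topo) else "does not fit")

# A hypothetical desktop-first layout: memory on one edge, I/O moved elsewhere.
print("desktop layout, 4 dies, full mesh:", fits(4, "full_mesh", memory_edges=1, io_edges=0))
```

Two Max-style dies fit (one free edge, one link needed); four densely packed dies need three link edges, which only a different layout can provide.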
In my imagination, the additional I/O requirements are DDR5 memory controllers, connectivity for an Apple dGPU, and a PCIe 5.0 root complex for the PCIe slots. In my speculation, these would be housed in a "powerful I/O die".
And you hook that up to which side of the die? The Max die doesn't need DDR5, dGPU support, or a large PCIe v5 root complex. So if you are trying to maximally reuse the die as a building block, there is no good reason to put those on a laptop die that isn't going to use them.
You can wave your hands and say Apple decoupled the memory/I/O controllers from the laptop Max die too. But why? You are going to pay a Perf/Watt cost to do that. Why would Apple take that penalty on a laptop? Because the Mac Pro is "special"? Probably not.
AMD hasn't taken that penalty for its mobile-targeted APUs. There are good reasons not to do that.
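A back-of-the-envelope way to see the Perf/Watt cost: the power spent just moving data across an interface is bandwidth × 8 × energy per bit. The 400 GB/s figure is Apple's quoted memory bandwidth for the M1 Max and M2 Max; the extra pJ/bit values for routing memory traffic through a separate die are hypothetical placeholders, and the sketch assumes sustained full bandwidth, which real workloads rarely hit.

```python
def interface_power_watts(bandwidth_bytes_per_s: float, energy_pj_per_bit: float) -> float:
    """Power spent only on moving bits across an interface: P = bandwidth * 8 * E_bit."""
    return bandwidth_bytes_per_s * 8 * energy_pj_per_bit * 1e-12  # pJ -> J

MAX_CLASS_BW = 400e9  # 400 GB/s, Apple's quoted memory bandwidth for M1 Max / M2 Max

# HYPOTHETICAL extra energy per bit for an additional die crossing on the memory path
for extra_pj in (0.5, 1.0, 2.0):
    watts = interface_power_watts(MAX_CLASS_BW, extra_pj)
    print(f"+{extra_pj} pJ/bit at full bandwidth -> +{watts:.1f} W")
```

At full Max-class bandwidth, every extra pJ/bit costs about 3.2 W, which is a noticeable slice of a laptop's power budget.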
This I/O die can be glued to an M2 Max die to form the entry-level Mac Pro SoC.
Except that really isn't what the TSMC packaging Apple is using does. The LSI (local silicon interconnect) die that gets glued on largely just provides low-power-overhead paths from one die to another (smaller pads to the LSI than to the "outside world", and a path, perhaps with a re-driver, to the "other" side). It is really not meant to carry high external output power, which would turn the LSI into a "buried heat source" problem. It is largely a passive interconnect.
You can put in a non-buried I/O die and connect everything with something like UltraFusion LSI interposers. But again, if you need multiple UltraFusion connectors, which side do they hook up to when significant edge space is already provisioned for the LPDDR5 memory controller fan-out?
Two M2 Max dies plus this I/O die would form an M2 Ultra Mac Pro SoC. Four M2 Max dies plus this I/O die would form an M2 Extreme Mac Pro SoC.
If Apple builds something that scales to four densely packed dies, then it would be straightforward to dial that back to just two dies (some interconnect connections could be left empty). But starting with a baseline design for just one die and then going to four likely leads either to bandwidth choke points or to costs ballooning in the single die (or in the four-die configuration).
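One way to see the "bandwidth choke point" risk is to count how many die-to-die links cross the narrowest even split of four dies (the bisection). This is a generic graph calculation, not anything Apple has disclosed; the die numbering and the three topologies are hypothetical, and it assumes every link has the same bandwidth.

```python
from itertools import combinations

def min_bisection_links(n: int, links: set[tuple[int, int]]) -> int:
    """Fewest links crossing any split of n dies into two equal halves."""
    best = None
    for half in combinations(range(n), n // 2):
        side = set(half)
        crossing = sum(1 for a, b in links if (a in side) != (b in side))
        best = crossing if best is None else min(best, crossing)
    return best

line = {(0, 1), (1, 2), (2, 3)}                # daisy-chained dies, one link between neighbours
ring = line | {(3, 0)}                         # close the chain into a ring
full_mesh = set(combinations(range(4), 2))     # every die linked to every other die

for name, topo in [("line", line), ("ring", ring), ("full mesh", full_mesh)]:
    print(name, min_bisection_links(4, topo))
```

A daisy-chained line of four has a bisection of one link, a ring two, and a full mesh four, so reusing a die that was only ever meant to pair up tends to end in the first case.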
Apple should have a desktop-only die. It could borrow heavily from the "Max class" laptop die, but the layout should be different and some functionality should be split out. It doesn't make sense to have four Secure Enclaves when only one is going to be used. Sixteen Thunderbolt ports is ridiculous bloat. Six is dubious. More than six is looney tunes.
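The bloat comes from the fact that tiling one die n times also tiles every fixed-function block on it. A tiny sketch, assuming one Secure Enclave per die (as the "4 Secure Enclaves" comment implies) and four Thunderbolt controllers per die (inferred from "16 ports" across four dies); both counts and the `NEEDED` targets are illustrative assumptions, not Apple specifications.

```python
# HYPOTHETICAL per-die block counts inferred from the post, not Apple specs.
PER_DIE = {"secure_enclave": 1, "thunderbolt": 4}
# What a desktop plausibly needs, per the post: one enclave, at most six TB ports.
NEEDED = {"secure_enclave": 1, "thunderbolt": 6}

def surplus_blocks(n_dies: int) -> dict[str, int]:
    """Blocks that come along (unused) when the same die is tiled n times."""
    return {name: max(0, count * n_dies - NEEDED[name]) for name, count in PER_DIE.items()}

for n in (1, 2, 4):
    print(n, surplus_blocks(n))
```

Under these assumptions, a four-die package carries three idle Secure Enclaves and ten surplus Thunderbolt controllers, all of which cost die area and power.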
Long ago there were rumors of Jade, JadeChop, Jade2, and Jade4. What we got was a "Max class" die that Apple photoshopped into a "Jade". The Max was later revealed to be the component for the "Jade2" SoC (Ultra), and we got no Jade4. (In retrospect that isn't particularly surprising after the Jade2/Jade design objectives were merged.)
It would make more sense to have a grouping more like:
[ Mx, Mx Max (no dual), Mx Pro (chopped Max) ] -- a mix of laptops and the lower half of the desktop lineup.
[ Mx Ultra, Mx Extreme ] -- just above midrange desktops.
The first group is all monolithic. Maximizes Perf/Watt. Limited I/O.
The second group always uses multi-die packaging (all have at least one UltraFusion-like connector). These are at least medium-sized dies, so the connector/memory controller layout can shift toward scaling better rather than minimizing the MBP 14" logic board footprint.
I'd be very excited if this turns out to be even remotely the case. Nevertheless, I'll most likely spend my money on an M2 Pro Mac mini. 🤣
The Mini (especially if it uses the current chassis) has limited I/O, so decoupling I/O from the die isn't going to "buy" much. It is also squeezed for thermal headroom, so dropping Perf/Watt doesn't help much either.