That's an interesting idea, and makes quite a bit of sense, given that the M3 Pro became its own die, rather than a Max missing some GPU cores and memory controllers. Apple seems to be making enough Apple Silicon that it is worth having more distinct dies.
The M3 Pro is going to show up in at least three products. Even two products would consume substantially more SoC volume than one would. Minimally it will be the MBP 14", the MBP 16", and a Mini Pro. Even if you want to handwave the MBP 14"/16" into 'one' product, it is still more than one Mac product.
If Apple reused the old iMac chassis they could easily come up with a 4th. Ditto if they did a Microsoft Surface Pro 9-like 'clone'. The core issue is that there are multiple places an Mx Pro can go in the Mac lineup. For whatever deep, mysterious reason, Apple never did the Mini Pro with the M1 generation. Probably because Apple underestimated how well the Mini would do once unshackled from the impoverished Intel iGPUs and thermal limits.
If Apple 'forks' the desktop Max+Ultra placement off from the MBP 14"/16", then that shrinks the placements of the Max. It is even worse if they keep the far more affordable Studio variant with the Max coupled to the MBP 14"/16" Max subset and fork the Ultra/Extreme off by themselves in the $6K+ range of limited volume. Apple has shown no moves toward derivative silicon that goes into FEWER products than the previous generation. Apple's whole grand strategy with Apple-designed SoCs has for YEARS been to stuff the SoCs into multiple products in a hand-me-down fashion. iPads got iPhone SoCs. AppleTV got an iPad Pro SoC for a while. The HomePod Mini is getting Apple Watch SoCs. etc. etc.
What Apple is missing is a real chiplet that would allow them to span "Max-like", "Ultra-like", and "more than Ultra-like" using the same base compute die and probably some decoupled I/O die(s) (practically nobody needs 12-16 Thunderbolt ports, more than one secure element, or four Apple SSD controllers). Apple would need something that covered both the Mac Studio and Mac Pro products to even remotely get to volume coverage with just two products.
The catch-22 is whether the MBP 14"/16" Max variants, decoupled from the Mac Studio, are really still 'two products' worth of volume. If not, we will get the bundling of the Studio + Mac Pro to cover 'two viable products' worth of volume.
There are a couple of design wins with that... One is that unnecessary pieces can be eliminated. The Ultra doesn't need 8 e-cores, and the Extreme certainly doesn't need 16 of them - anything that wants a ton of cores almost certainly also wants P-cores instead of e-cores. It doesn't make sense to eliminate the e-cores entirely - they're very useful for writing e-mail or the like while waiting for a bigger job to complete, but the Ultra can go down to 4, leaving the Extreme with 8 (which is too many, but better than 16).
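For anyone following the numbers, the e-core counts above simply fall out of replicating one Max-class die. A minimal sketch of that arithmetic, assuming 4 e-cores per die (implied by the "8 for Ultra, 16 for Extreme" figures) and treating the "Extreme" as a hypothetical four-die package; the function and variable names are made up for illustration:

```python
# Assumption: 4 e-cores per Max-class die, inferred from "Ultra = 8, Extreme = 16".
E_CORES_PER_MAX_DIE = 4


def tiled_e_cores(dies: int, e_cores_per_die: int = E_CORES_PER_MAX_DIE) -> int:
    """Total e-cores when a package is built by replicating one die `dies` times."""
    return dies * e_cores_per_die


# Straight replication of the Max die (the "Extreme" is hypothetical).
for name, dies in {"Max": 1, "Ultra": 2, "Extreme": 4}.items():
    print(f"{name}: {tiled_e_cores(dies)} e-cores")

# The proposal above: a custom Ultra die trimmed to 4 e-cores,
# with two such dies forming the Extreme.
custom_ultra_e_cores = 4
custom_extreme_e_cores = 2 * custom_ultra_e_cores
print(f"Custom Ultra: {custom_ultra_e_cores}, custom Extreme: {custom_extreme_e_cores}")
```

This only counts cores; it says nothing about how much die area the trimmed cores would actually free up.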
That isn't how Apple uses e-cores. Any foreground GUI task is going to soak up a P-core (there is a strong 'push' to make anything interactive feel 'snappy'). It is more so the non-interactive stuff that gets shuttled to the background (and the e-cores). Apple really doesn't need a 4x ("Extreme"). A 3x would/should work at least as well as the increment from Mx Pro to Mx Max has worked.
The major 'problem' with the M1/M2 Max die is far more that the I/O is optimized for the two sides of a thin MBP chassis ('who needs more than 4 ports....'). It doesn't scale well in utility at all. Folks may sneer at e-cores, but they still scale in work done. It may not be hype-worthy, bragging-rights scaling, but they do get more work done than not having them at all. [ It is pure fantasy that dropping e-cores is going to buy more P-cores. The die space trade-off isn't even close. ]
Another example is that the Ultra doesn't need a duplicated Neural Engine, and the Extreme doesn't need four of them. Very little makes use of the Neural Engine, and the few things that do (largely Apple's own apps) are written to run on exactly one of them - it is consistent all the way from the iPhone to the Max, and the number of Ultras out there with a second one isn't large enough to interest developers.
Apple's AI/ML libraries are not limited to Apple apps in any significant way. Lots of apps use them, and more apps are adopting more AI/ML features (which is the 'feature trend' for at least the next couple of years). Apple's top-end compute is WAY behind what top-end AI/ML add-in cards can add elsewhere. Tossing AI/ML performance out the window is only going to lead to fewer Macs sold.
If Apple removed the duplicate Neural Engine and some of the e-cores from a custom-die Ultra, they could either make the die smaller than two Maxes or throw in extra units that do scale - either extra P-cores, extra GPU cores or a mixture of the two.
That likely isn't going to work if there is even a remote coupling of Apple's AI/ML upscaling to graphics. If you have more GPU cores to render higher resolutions but the upscaler isn't also scaling, then you have a balance problem.
The same goes for any coupling between media en/decoding and the AI/ML cores.
The second win is that there is no need to design in an additional interposer placement. If the current interposer is on the North-South axis of the chip, a four-way Extreme would also require one on the East-West axis.
If you just stop at 3, you can keep the same "north/south" reusable pattern. Prune off the I/O at the 'top/north' and you can attach either the I/O chiplet or another 'compute' chiplet. Decoupling the compute cores doesn't really buy a whole lot. And you can make the die substantively smaller if you move off the stuff that is trying to talk to the outside/off-die world. That is the relatively oversized stuff (cache is next after that). The various compute core logic is shrinking with N3B/N3E. The I/O and cache, relatively, are not (with N3E ... not at all; full regression back to the N5 sizes!). You are going through lots of gyrations to kick off the stuff that isn't getting smaller. Kick the relatively chunky stuff off and you will more likely get a smaller die.
The UltraFusion is still off-die communication, but it is not long-distance off-die. The interface is incrementally smaller when it only has to go very micro distances than when it has to go inches/centimeters (or more) away.
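To put a rough shape on the shrink argument above, here is a toy area model. Every number in it is a made-up placeholder, not a real die measurement or a real N3B/N3E scaling factor; the only thing taken from the discussion is the structure (logic shrinks, cache and I/O mostly do not). The function name and parameters are hypothetical.

```python
def area_after_shrink(logic_mm2: float, cache_mm2: float, io_mm2: float,
                      logic_scale: float, cache_scale: float = 1.0,
                      io_scale: float = 1.0) -> float:
    """Die area after a node move, scaling each block type by its own factor.

    A scale of 1.0 means that block does not shrink at all.
    """
    return logic_mm2 * logic_scale + cache_mm2 * cache_scale + io_mm2 * io_scale


# Placeholder inputs (NOT real figures): 50 mm^2 logic, 25 mm^2 cache, 25 mm^2 I/O,
# with logic assumed to halve while cache and I/O stay the same size.
monolithic = area_after_shrink(50, 25, 25, logic_scale=0.5)

# Same die with the off-die I/O moved to a separate chiplet (io_mm2 = 0).
compute_only = area_after_shrink(50, 25, 0, logic_scale=0.5)

print(f"monolithic after shrink: {monolithic} mm^2")
print(f"compute-only after shrink: {compute_only} mm^2")
```

With these placeholder inputs the blocks that do not shrink make up two thirds of the shrunk monolithic die, which is the intuition behind moving the I/O off rather than trimming cores.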