I have a new theory, based on M3 rumours. The M3 Ultra is the "Extreme" chip we were hearing about and hoping for in the ASi MP they released at WWDC.
The 40 cores, the 384 GB of RAM, the 144-core GPU... all sound like the 3nm die shrink, where they can physically fit the equivalent of 4x M2 Max chips into the physical space of 2 (the Ultra). Also, 5nm -> 3nm is not half the area, which is why it's 40 cores, not 48.
The physical space of 2 probably isn't happening.
They would have to linearly increase the number of memory packages on the SoC package to get the equivalent of 4x the M2 Max's bandwidth. TSMC N3 isn't going to do diddly squat to shrink those memory packages. Nor will it shrink the on-die PHYs that drive that increased package count. (N3 could add some fancier logic to the memory controllers, like ECC or compression, but the I/O side of the memory controllers, which talks to 'distant' packages elsewhere, isn't going to shrink much at all with the same baseline approach: sharing the same fab process as the compute logic, with memory on affordable plain 2D interposers as a "poor man's" HBM.)
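A rough sketch of the linear scaling point, assuming peak bandwidth is simply bus width times transfer rate and that each memory package adds a fixed slice of bus width. The 128 bits per package is a hypothetical figure for illustration, not an Apple-confirmed number, and `peak_bandwidth_gbs` / `packages_needed` are made-up helper names. With LPDDR5-6400 on a 512-bit bus, the formula lands at about 409.6 GB/s, in line with the roughly 400 GB/s Apple quotes for the M2 Max.

```python
import math


def peak_bandwidth_gbs(bus_width_bits: int, mega_transfers_per_s: int) -> float:
    """Peak theoretical bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * mega_transfers_per_s / 1000


def packages_needed(target_gbs: float, bits_per_package: int, mega_transfers_per_s: int) -> int:
    """Memory packages needed to reach target_gbs when each package adds a fixed bus width."""
    per_package = peak_bandwidth_gbs(bits_per_package, mega_transfers_per_s)
    return math.ceil(target_gbs / per_package)


M2_MAX_BUS_BITS = 512   # M2 Max memory bus width
LPDDR5_MTS = 6400       # LPDDR5-6400
BITS_PER_PACKAGE = 128  # hypothetical per-package width, for illustration only

m2_max_bw = peak_bandwidth_gbs(M2_MAX_BUS_BITS, LPDDR5_MTS)
for multiple in (1, 2, 4):
    target = multiple * m2_max_bw
    n = packages_needed(target, BITS_PER_PACKAGE, LPDDR5_MTS)
    print(f"{multiple}x M2 Max: {target:.1f} GB/s -> {n} packages")
```

At a fixed per-package width, the package count (and the PHY edge on the die that feeds it) scales one-for-one with the bandwidth target; nothing in that arithmetic gets smaller with a node shrink.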
N3 isn't going to double the cache capacity in the same area either. (N3B doesn't shrink SRAM/cache much at all, and N3E has no SRAM shrink at all.) Apple Silicon is relatively cache heavy, which means it won't see the same aggregate shrink that TSMC's nominal test chip sees. And dramatic core count increases without cache to go with them won't really do much for performance on general workloads. (They could 'goose' benchmarks that largely fit into cache, but probably won't scale well with normal mixed, concurrent app workloads.)
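To make the "cache heavy means less aggregate shrink" point concrete, here is a minimal blended-area sketch. The area split and the 0.6 logic scale factor are made-up illustrative numbers, not TSMC or Apple figures. SRAM is held at 1.0 to match the N3E "no shrink" point above, and I/O PHYs at 1.0 to match the memory-controller argument before it.

```python
def blended_area_scale(fractions: dict[str, float], scales: dict[str, float]) -> float:
    """Relative die area after a node shrink, given each block type's share of the
    original die area and that block type's own area scale factor."""
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("area fractions must sum to 1")
    return sum(share * scales[block] for block, share in fractions.items())


# Hypothetical area split of a cache-heavy die (illustrative, not measured).
fractions = {"logic": 0.5, "sram": 0.3, "io_phy": 0.2}
# Hypothetical logic shrink; SRAM and I/O assumed not to shrink at all.
scales = {"logic": 0.6, "sram": 1.0, "io_phy": 1.0}

whole_die = blended_area_scale(fractions, scales)
logic_only = scales["logic"]
print(f"whole die shrinks to {whole_die:.2f}x, vs {logic_only:.2f}x for pure logic")
print(f"same area fits {1 / whole_die:.2f}x the blocks, vs {1 / logic_only:.2f}x")
```

The more of the die that is cache and PHY, the closer the blended factor gets to 1.0, no matter how good the headline logic shrink is.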
Apple could keep two dies joined on a single edge if it grew the building block much bigger. That won't let you keep the same footprint, though. The more affordable InFO-LSI packaging process is likely out the window regardless of how you 'slice' the problem. So an 800 mm^2 block isn't necessarily a roadblock for CoWoS-LSI; it just costs substantially more.
The "Max", as composed to fit into laptops, really isn't a good chiplet design if you want to scale past 2. If Apple is going to "40 cores", there is a pretty good chance the Extreme will be using chiplets with different shape(s) than the Max. So the laptop Max die, as a unit of space measure, probably wouldn't be a good fit.
If Apple goes to chiplets and the Extreme's CPU core count slides 'backward', then the Ultra's probably would too. Economically, they would probably need to use the same chiplets for it as well.
This would possibly mean that the M3 Max is gonna be equivalent to the current M2 Ultra, possibly even a touch faster.
There's a pretty good chance that the laptop Max is decoupled from the desktop SoCs. Again, probably not a good baseline measurement metric, as they will be different dies.
That is finally gonna be a 3D-capable ASi system.
The current systems can't do 3D now? "Capable" isn't the same as "improved performance".
This may also explain the lack of 3rd-party GPU support (I'm making excuses for Apple here), but it might be that a really 3D-capable iGPU is about 15 months away.
Apple got a general 20% uplift from M1 Ultra to M2 Ultra with the exact same memory subsystem, using a roughly three-year-old fab process for M2. So if they go to an incrementally faster LPDDR5 memory subsystem and N3... yeah, it is a pretty safe bet that they will get another substantial uplift. Even if it were just a shrink of what they have now (with the same core count), it would likely go significantly faster.
And even if they keep the core count about the same, they can make cores with a "bigger transistor budget", so GPU cores could get some limited hardware ray-tracing add-ons... yeah, that would be faster in that niche too. (That niche doesn't cover all of '3D'.)
But Apple has already tossed dGPUs from the rest of the lineup, from the laptops up through the old 27" iMac zone. It isn't primarily about the Mac Pro. If the M3 generation brings another 15-20% uplift to GPUs, then dGPUs are even more dead there. M4 can probably squeeze out slightly bigger dies and get some straightforward GPU gains. M5 probably gets a fab process improvement to leverage. Rinse and repeat on M6. The progressively more "dead" dGPUs in the rest of the lineup make 3rd-party GPUs more and more problematic for the Mac Pro. It's likely not just what Apple can already see in the test labs with M3 that is at issue here; it is a decently long, achievable path into the future.
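The reason a multi-generation path matters is that modest per-generation gains compound. A minimal sketch, assuming a hypothetical flat 15% GPU uplift for each of M3 through M6 (only the M3 figure of 15-20% comes from the post; the later ones are placeholders):

```python
def compound_uplift(per_generation: list[float]) -> float:
    """Cumulative performance multiplier from a sequence of fractional per-generation gains."""
    total = 1.0
    for gain in per_generation:
        total *= 1.0 + gain
    return total


# Hypothetical: 15% per generation; real gains per generation are unknown.
gens = ["M3", "M4", "M5", "M6"]
gains = [0.15] * len(gens)
for i, name in enumerate(gens, start=1):
    print(f"{name}: {compound_uplift(gains[:i]):.2f}x the M2-generation iGPU")
```

Four steps of 15% compound to roughly 1.75x, which is the kind of gap that keeps widening between the integrated GPUs and any dGPU that has to justify its own slot, power, and driver work.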
Making applications that run optimally on those products is what pays the "Mac ecosystem" bills. That is the 'equation' that Apple is balancing.
The other equation that Apple is balancing is power. Three 8-pin AUX power sockets disappeared. A large majority of that power budget is likely being reserved for a much bigger SoC package that soaks up substantially more power than the M2 Ultra does. The Mac Pro runs at standard (USA) household wall socket power levels. If the power consumption of the "CPU" thermal zone goes up significantly, then the other zones are going to have to give some up. A fixed cap out of the wall makes it a zero-sum game.
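A toy version of the zero-sum budget, as a sketch. The 150 W figure is the standard PCIe 8-pin auxiliary connector rating, so three sockets account for 450 W. The 1200 W wall cap and the per-zone wattages are hypothetical placeholders, not Mac Pro specifications.

```python
PCIE_8PIN_WATTS = 150  # PCIe 8-pin auxiliary power connector rating


def remaining_budget(cap_watts: int, allocations: dict[str, int]) -> int:
    """Watts left over after each zone takes its share of a fixed wall-power cap."""
    return cap_watts - sum(allocations.values())


WALL_CAP_W = 1200                    # hypothetical fixed cap, not a real PSU rating
aux_sockets_w = 3 * PCIE_8PIN_WATTS  # power formerly routed through three AUX sockets

# Hypothetical zone budgets before and after the AUX sockets are removed.
with_aux = {"soc": 300, "aux_8pin": aux_sockets_w, "other": 200}
without_aux = {"soc": 300 + aux_sockets_w, "other": 200}

print(remaining_budget(WALL_CAP_W, with_aux))     # 250
print(remaining_budget(WALL_CAP_W, without_aux))  # 250
```

Headroom stays the same because the SoC zone simply absorbed the AUX allocation; any further growth in the SoC zone has to come straight out of "other".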