Another explanation is Apple’s Frankenstein assembly of CPU chips just doesn’t scale beyond two. Routing thousands of high-bandwidth interconnects is beyond the current state of magic.
UltraFusion has 10,000 connections. The trade-off is that it takes up a whole edge of the Max die. If Apple needed another one of those, it starts to become a zero-sum game with the RAM controllers (which largely 'eat up' the longer die edges with their longer-distance, high-bandwidth connection demands). At some point you run out of die edge space. You can't have too many edge-space consumers.
It isn't just a "CPU" chip. The GPU die space of the Max die greatly dominates the CPU space.
www.anandtech.com
[ NOTE: Apple photoshopped out the UltraFusion connector on the initial M1 Max die photos. It runs along just about the entire bottom edge of this picture.
The four Thunderbolt (TB) controllers sit about where the "M1 Max" label is. (The other PCI-e and display output blocks run along the top edge.) ]
The Max is pretty close to being a set of CPU cores wrapped around what would normally be called a GPU die (GPU cores, video encode/decode multimedia accelerators, and memory controllers to keep the GPU cores fed). The CPU cores are being fed a limited slice of what the GPU needs.
If the UltraFusion and LPDDR components are going to compete more for edge space, then Apple will need to lower the bandwidth of one of those two, or kick some of the secondary stuff (TB, PCI-e, SSD, etc.) off the die. For example, a smaller UltraFusion-like connector (more pads/lanes in a smaller space) could hand over the 'saved' edge space to something else (e.g., an LPDDR5 controller).
If you aggregate too broad a set of external I/O needs onto a single die, you probably end up in conflict. Apple throws the entire 'kitchen sink' onto one monolithic die to maximize performance-per-watt savings. For laptops that can be a good trade-off. For reasonably sized desktops it gets more dubious, especially if you want to greatly crank up the GPU core count, which means cranking up the edge space consumed by the LPDDR5 memory controllers (massively parallel data consumers have to be kept fed with massively parallel memory paths). Doubling up on "Poor man's HBM" + UltraFusion at the same time means letting go of absolute max perf/watt, at least a bit (some other I/O has to be pushed off the die).
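To make the zero-sum point concrete, here is a rough back-of-the-envelope sketch of the die-edge budget. Every number in it is made up for illustration (arbitrary "edge units", not measurements of any real die), and the names `EdgeConsumer`, `remaining_edge`, and `fits` are just hypothetical helpers. It only shows the shape of the argument: a second UltraFusion-style link blows the budget unless something else leaves the die edge.

```python
from dataclasses import dataclass


@dataclass
class EdgeConsumer:
    name: str
    edge_units: int  # hypothetical share of die perimeter, arbitrary units


def remaining_edge(perimeter_units, consumers):
    """Edge space left after every consumer has taken its share."""
    return perimeter_units - sum(c.edge_units for c in consumers)


def fits(perimeter_units, consumers):
    """True if all consumers fit along the die perimeter."""
    return remaining_edge(perimeter_units, consumers) >= 0


PERIMETER = 100  # assumed total usable die edge, arbitrary units

ultrafusion = EdgeConsumer("UltraFusion link", 25)        # assumed: one whole edge
lpddr5 = EdgeConsumer("LPDDR5 controllers", 40)           # assumed: most of the long edges
other_io = EdgeConsumer("TB / PCI-e / SSD / display", 25)  # assumed

# Today's layout: one die-to-die link, everything fits with a little slack.
base = [ultrafusion, lpddr5, other_io]

# A second die-to-die link competes for the same perimeter.
two_links = base + [EdgeConsumer("second UltraFusion link", 25)]

# One way out: push the secondary I/O off the die.
offloaded = [c for c in two_links if c is not other_io]

for label, layout in [("base", base), ("two links", two_links), ("I/O moved off", offloaded)]:
    print(f"{label}: remaining = {remaining_edge(PERIMETER, layout)}, fits = {fits(PERIMETER, layout)}")
```

With these assumed numbers the base layout has 10 units to spare, the two-link layout is 15 units over, and moving the secondary I/O off the die gets it back under budget. Swap in different guesses and the exact numbers change, but the trade-off stays the same.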