432mm² to 475mm² is not chunky.
Broadcom has combined chips that are much larger, using TSMC's CoWoS interposer for a 1700mm² monstrosity at twice the reticle limit of current equipment. Perhaps Apple was trying to combine bigger chips for the Mac Pro?
Apple didn't use CoWoS on the Ultra. InFO-LSI is/was at the reticle limit. So having two 400+ mm² dies is relatively chunky unless they switch packaging tech. Yes, Apple would likely need something much better for the "4 Max" setup (if it even worked, given the awkward memory layout), but the package roll-out is different.
500-600mm² is pretty much going to limit the scaling to just two dies even with 1700mm² of space. If you can't scale, then the chiplets are chunky.
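A quick back-of-envelope sketch of that scaling limit, using the die sizes tossed around in this thread (the ~850mm² reticle figure and the budget split are my assumptions, not measured values):

```python
# How many Max-class dies fit in a ~1700 mm^2 interposer budget?
# Sizes are the rough figures from this thread, not measured values.
RETICLE_LIMIT_MM2 = 850                       # assumed reticle limit; 2x = 1700 mm^2
INTERPOSER_BUDGET_MM2 = 2 * RETICLE_LIMIT_MM2

for die_mm2 in (432, 475, 500, 600):
    max_dies = INTERPOSER_BUDGET_MM2 // die_mm2
    print(f"{die_mm2} mm^2 die -> at most {max_dies} dies in {INTERPOSER_BUDGET_MM2} mm^2")

# At 500-600 mm^2 you get two, maybe three dies on paper, and that is
# before any RAM stacks or interconnect routing eat into the budget.
```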
What disaggregation is Broadcom replicating on that package? Or is it just pulling the HBM RAM onto the package (RAM stacks are not 'chiplets')? Two SoCs running in fault-tolerant parallel mode on one package, each with its own RAM store, isn't particularly 'chiplets' either.
There was an Intel mega package that put two server dies in one package.
We have details on the Intel Xeon Platinum 9200 series: up to 56 cores and 12-channel DDR4-2933 per socket, or 224 cores per U of rack space.
www.servethehome.com
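For the curious, the 224-cores-per-U figure falls straight out of the chassis math (the node arrangement here is my assumption, based on Intel's half-width S9200WK reference design):

```python
# Density math behind the 224-cores-per-U claim, assuming two half-width,
# two-socket nodes side by side per U of rack space.
cores_per_socket = 56
sockets_per_node = 2
nodes_per_u = 2

print(cores_per_socket * sockets_per_node * nodes_per_u)  # 224
```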
The Xeon SP gen 4 (Sapphire Rapids) uses largish, chunky dies:
"... With a multi-die design, you can ultimately end up with more silicon than a monolithic design can provide – the reticle (manufacturing) limit for a single silicon die is ~700-800 mm2, and with a multi-die processor several smaller silicon dies can be put together, easily pushing over 1000mm2. Intel has stated that each of its silicon tiles are ~400 mm2, creating a total around ~1600mm2. But the major downside to multi-die designs is connectivity and power. ...
"
www.anandtech.com
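Applying the quoted multi-die arithmetic to both examples in this thread (the tile sizes are the figures quoted above; the "4 Max" case is hypothetical and purely illustrative):

```python
# Total silicon across all tiles in a multi-die package, per the quoted
# AnandTech arithmetic. Tile sizes are the figures from this thread.
def total_silicon(tile_mm2: int, tiles: int) -> int:
    """Total silicon area across all tiles in a multi-die package."""
    return tile_mm2 * tiles

print(total_silicon(400, 4))  # Sapphire Rapids: ~1600 mm^2
print(total_silicon(475, 4))  # hypothetical "4 Max": ~1900 mm^2, far past
                              # the ~700-800 mm^2 monolithic reticle cap
```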
There are accelerators on each tile/chiplet. If you are using the accelerators all the time, that's fine. If you're not using them at all, then each tile brings 'dead weight' die area. That's not a particularly good chiplet design from that perspective. Part of 'chunky' is how much dead weight you are bringing along.
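To put a rough number on that dead weight (the 15% accelerator share below is a made-up placeholder for illustration, not a measured Sapphire Rapids figure):

```python
# Illustrative dead-weight estimate: fixed-function accelerator area that
# sits idle on every tile if the workload never touches it.
def dead_weight_mm2(tile_mm2: float, accel_fraction: float, tiles: int) -> float:
    """Total die area spent on unused accelerators across all tiles."""
    return tile_mm2 * accel_fraction * tiles

wasted = dead_weight_mm2(tile_mm2=400, accel_fraction=0.15, tiles=4)
print(f"{wasted:.0f} mm^2 of idle silicon")  # 240 mm^2 -- over half a tile
```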
Eight Thunderbolt controllers is too many in the two-Max-die context (two secure elements and redundant SSD controllers are likely too many also). If you try to scale that up to four dies, it is a substantive waste of space.
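The problem is that per-die fixed blocks multiply linearly with die count. A sketch (four Thunderbolt controllers per Max die matches the "8 in a two-die package" point above; the rest is illustrative):

```python
# Per-die I/O blocks scale with die count whether you need them or not.
TB_CONTROLLERS_PER_DIE = 4

for dies in (1, 2, 4):
    print(f"{dies} die(s): {TB_CONTROLLERS_PER_DIE * dies} Thunderbolt controllers")

# 16 controllers on a four-die package is far more than any Mac would wire
# up to physical ports, i.e. dead area that grows with every die added.
```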
Multi-chip packaging existed before chiplets; it is not exactly the same thing. Tagging everything that uses 3D (or 2.5D LSI) packaging as 'chiplets' is a bit of an overreach.
Regarding memory bandwidth, I don't think the M2 Max SoC is going to be limited by it anyway, so there's no reason to change it when they have the supply chain locked down for the current LPDDR5-6400.
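For reference, the M2 Max pairs a 512-bit memory bus with LPDDR5-6400, which is where Apple's advertised ~400 GB/s comes from:

```python
# Peak bandwidth = (bus width in bytes) x (transfer rate).
bus_width_bits = 512
transfers_per_sec = 6400e6          # LPDDR5-6400: 6400 MT/s

bandwidth_gbs = bus_width_bits / 8 * transfers_per_sec / 1e9
print(f"{bandwidth_gbs:.1f} GB/s")  # 409.6 GB/s
```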
It is limited somewhat, because Apple boosted the size of the system cache (mentioned in the overview). That is contributing to the die size increase now at TSMC N5, and it will keep doing so if they stick with the system cache on the main computational die: that cache area is not going to shrink any time soon, even on next-gen fab processes.
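A toy model of why the cache keeps hurting: logic shrinks well on new nodes, SRAM barely does. The logic/SRAM split and the scaling factors below are assumed round numbers for illustration, not TSMC data:

```python
# Estimated die area after a node shrink, with logic and SRAM scaling
# separately. Factors are assumed round numbers, not foundry figures.
def next_node_area(logic_mm2: float, sram_mm2: float,
                   logic_scale: float = 0.6, sram_scale: float = 0.95) -> float:
    """Die area after a shrink where SRAM scales far worse than logic."""
    return logic_mm2 * logic_scale + sram_mm2 * sram_scale

# A hypothetical 475 mm^2 die that is one-quarter system cache:
print(f"{next_node_area(logic_mm2=356, sram_mm2=119):.0f} mm^2")
# ~327 mm^2 -- and the cache's share of the die actually grows post-shrink.
```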
But yes, die component costs play a role in the design choices Apple is making with the M-series SoCs. The latest and greatest won't always make the cut.