> It's not that it's naive so much as that the devil is in the details. For example, it's easy to say that the interconnect joins the chip networks, but *how*, physically, does it do that? (It's not like it's an Ethernet where you can just throw a switch in between the two... though that is one conceptual approach you can take.)
It's true that a NoC (network-on-chip) is quite different from Ethernet. However, general networking concepts still apply. You can just throw in something broadly similar to an Ethernet switch. Even a single die couldn't function at all without switches.
For M1 Ultra, Apple's die-to-die interconnect solution appears to be simple brute force. They claim that ~10,000 interconnect signals provide 2.5 TB/s of bandwidth: 2.5e12 bytes/s * 8 bits/byte = 20e12 bits/s, and 20e12 b/s / 10e3 wires = 2e9 bits/s per wire. That ~2 GHz per-wire rate, plus the extremely short reach of the interconnect, suggests that Apple didn't bother with SERDES. Instead, they're taking advantage of having so many signals by just running their on-die NoC protocol through the wires connecting the two die, probably at the standard on-die NoC frequency.
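A quick sanity check of that arithmetic (the ~10,000 signal count and the 2.5 TB/s figure are Apple's claims; treating each signal as a single-data-rate wire is my assumption):

```python
# Back-of-the-envelope check of the per-wire rate implied by
# Apple's claims (~10,000 signals, 2.5 TB/s aggregate bandwidth).
bandwidth_bytes_per_s = 2.5e12      # Apple's claimed 2.5 TB/s
signals = 10_000                    # Apple's claimed ~10K wires

bandwidth_bits_per_s = bandwidth_bytes_per_s * 8   # 20 Tb/s
per_wire_bps = bandwidth_bits_per_s / signals      # 2 Gb/s per wire

print(f"{per_wire_bps / 1e9:.1f} Gb/s per wire")   # -> 2.0
# At single data rate, 2 Gb/s per wire is roughly a 2 GHz clock --
# plausible as an on-die NoC frequency, no SERDES required.
```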
If there's really no frequency change, it may be as simple as a flop just before each off-die transmitter pin and a flop just after each off-die receiver pin, adding two cycles of latency when crossing die. Even if it's slightly more complex than that, this is how they were able to claim that software can treat M1 Ultra as if it's just one giant SoC. Technically it's a NUMA system, but the non-uniformity is so mild that software can safely ignore it.
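To make the two-cycle claim concrete, here's a toy cycle model of that launch-flop/capture-flop crossing. This is purely my illustration of the idea, not Apple's design; it assumes both die share one NoC clock:

```python
# Toy model of a synchronous die crossing: one launch flop at the
# TX pin, one capture flop at the RX pin, same clock on both die.
def cycles_to_cross(payload, max_cycles=5):
    tx_flop = None   # flop just before the off-die transmitter pin
    rx_flop = None   # flop just after the off-die receiver pin
    for cycle in range(max_cycles):
        received = rx_flop   # what the far die's NoC sees this cycle
        # Both flops update together on the clock edge; the tuple
        # assignment models that simultaneous update.
        rx_flop, tx_flop = tx_flop, (payload if cycle == 0 else None)
        if received is not None:
            return cycle
    return None

print(cycles_to_cross("flit"))  # -> 2: two extra cycles to cross die
```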
> And once you've solved that problem for an Ultra, two back-to-back Max chips, does that solution scale at all? It does not, not without at least some additional work, because you can't lay out more than two chips that all have direct links of that scale to each other. Thus is born the "I/O die"... maybe.
Right, if you need longer reach (more physical distance between die) and/or more die-to-die links, you'll eventually be forced to go with SERDES to get more bandwidth out of each off-chip wire. SERDES always adds latency. There are well-known ways of mitigating this; for an example, look up what Intel has disclosed about its CSI/QPI/UPI interconnect for multi-socket systems. Intel has to use SERDES there because its links are off-package, and potentially even off-board, not just off-die: there are a few orders of magnitude fewer signals than 10K, several orders of magnitude more distance to cover, and connectors in the path. They also have to support all kinds of other machinery needed to scale up to hundreds or thousands of nodes.
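The wire-count math shows why SERDES wins once the reach grows. The 2.5 TB/s target is carried over from above; the 16 and 32 Gb/s lane rates are my illustrative assumptions, roughly the class of rates used by QPI/UPI- and PCIe-era links:

```python
# Wires needed to hit 2.5 TB/s at different per-wire signaling rates.
# 2 Gb/s matches the no-SERDES estimate above; 16 and 32 Gb/s are
# assumed SERDES lane rates, not figures from Apple or Intel.
target_bps = 2.5e12 * 8                       # 20 Tb/s aggregate

for rate_gbps in (2, 16, 32):
    wires = target_bps / (rate_gbps * 1e9)
    print(f"{rate_gbps:>2} Gb/s per wire -> {wires:6,.0f} wires")
# 2 Gb/s  -> 10,000 wires: only practical die-to-die on a package.
# 32 Gb/s ->    625 lanes: practical off-package, but serialization,
# encoding, and clock recovery all add latency on every hop.
```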
What interests me is how big a system-on-package Apple can build with the extremely short reach edge-to-edge links they've demonstrated with M1 Ultra. You can't build an arbitrarily big system this way; it can't possibly scale out to huge systems the way Intel's interconnect does, but Apple doesn't need that. They only need enough scaling to cover the existing Mac workstation market.
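Here's a rough way to see the ceiling, assuming you keep every die directly linked to every other (to preserve the mild-NUMA property) and budget at most one ultra-wide link per die edge. Both assumptions are mine, not anything Apple has disclosed:

```python
# A fully connected package of n die needs n-1 ultra-wide links per
# die, but a die has only 4 edges, and some edge "beachfront" must
# still go to DRAM and other I/O. Assumed budget: one link per edge.
EDGES_PER_DIE = 4

for n in range(2, 7):
    links_per_die = n - 1
    total_links = n * (n - 1) // 2
    fits = "fits" if links_per_die < EDGES_PER_DIE else "out of edges"
    print(f"{n} die: {links_per_die} links/die, "
          f"{total_links} total -- {fits}")
# 2 die (M1 Ultra) uses one edge per die; by 5 die, every edge is a
# die-to-die link and there's no beachfront left for memory.
```

Past that point you either accept multi-hop NUMA, or you add a switch die in the middle, which is exactly the "I/O die" idea from the question above.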