1. Physics. With an SoC like the M series, the RAM sits on the same package as the CPU and GPU, so signals travel a much shorter distance, which significantly reduces latency and signal degradation.
2. Wider memory buses for higher bandwidth, versus the narrower interfaces in socketed/upgradable setups, which are limited by pin counts and board traces.
3. Optimized interfaces tuned to one specific memory configuration, without socket overhead like added inductance and without having to support the range of standardized modules that constrains speeds in upgradeable RAM setups.
It's like a normal engine that is designed for public use and therefore has to support a wide variety of fuels, versus a highly tuned engine optimized for one specific fuel (race gas, E85, etc.). The tuned engine can squeeze out way more power, but then you are stuck with whatever it is tuned for.
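To put rough numbers on the bus-width point: peak memory bandwidth is approximately bus width times transfer rate. Here is a minimal Python sketch of that arithmetic. The two configurations are illustrative assumptions, not figures from the points above: a 128-bit dual-channel DDR5-5600 socketed setup and a 1024-bit LPDDR5-6400 on-package setup. Real-world throughput will be lower than these theoretical peaks.

```python
def peak_bandwidth_gbs(bus_width_bits: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    # bytes per transfer * millions of transfers per second = MB/s; /1000 -> GB/s
    return bytes_per_transfer * mega_transfers_per_s / 1000


# Assumed example configurations (hypothetical, for illustration only)
socketed = peak_bandwidth_gbs(bus_width_bits=128, mega_transfers_per_s=5600)
on_package = peak_bandwidth_gbs(bus_width_bits=1024, mega_transfers_per_s=6400)

print(f"socketed:   {socketed:.1f} GB/s")    # 89.6 GB/s
print(f"on-package: {on_package:.1f} GB/s")  # 819.2 GB/s
print(f"ratio:      {on_package / socketed:.1f}x")
```

Under these assumptions the per-pin speed is nearly the same; almost all of the gap comes from the bus being eight times wider, which is hard to route through a socket.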
The speed differences are why GPUs have never been able to use socketed RAM: it's just way too slow for GPU purposes. With an SoC like the M series, though, the GPU can use the same RAM as the CPU. The Mac Studio, for example, can give the GPU access to up to 512 GB of memory as video RAM (minus some RAM for running the system, etc.). That lets it run huge AI models that would otherwise require multiple extremely expensive, power-hungry GPUs running together to even compete (tens of thousands of dollars' worth of GPUs).
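As a rough back-of-the-envelope check on the "huge AI models" point, here is a minimal Python sketch that estimates how much memory a model's weights need and how many discrete GPUs it would take to hold them. Everything except the 512 GB figure is an assumption for illustration: the 400-billion-parameter model, the 32 GB reserved for the system, and the 80 GB of VRAM per GPU are all hypothetical. It counts weights only and ignores activations and caches, so real requirements are higher.

```python
import math


def weights_gb(params_billions: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in GB (billions of params * bytes per param)."""
    return params_billions * bits_per_param / 8


def fits(params_billions: float, bits_per_param: int,
         total_gb: float, reserved_gb: float) -> bool:
    """True if the weights fit in unified memory after reserving some for the system."""
    return weights_gb(params_billions, bits_per_param) <= total_gb - reserved_gb


def gpus_needed(params_billions: float, bits_per_param: int,
                vram_per_gpu_gb: float) -> int:
    """Minimum number of discrete GPUs needed just to hold the weights."""
    return math.ceil(weights_gb(params_billions, bits_per_param) / vram_per_gpu_gb)


# Hypothetical 400B-parameter model stored at 8 bits per parameter
print(weights_gb(400, 8))                               # 400.0
print(fits(400, 8, total_gb=512, reserved_gb=32))       # True
print(gpus_needed(400, 8, vram_per_gpu_gb=80))          # 5
```

Under these assumptions a single 512 GB machine holds the weights, while the discrete-GPU route needs at least five 80 GB cards, which is where the cost and power gap comes from.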