> Somehow I was under the impression that modern operating system kernels presented multiple CPUs as unified resources to their software. Am I thinking of something unique to BeOS?

Well, the problem is that they DO present them as unified resources. But in a NUMA configuration they are not.
If CPU0 has 8 memory channels, and CPU1 has 8 memory channels, and CPU0 and CPU1 are connected via some form of interconnect, then the memory of CPU0 is "farther away" from CPU1 than its own memory. If the OS is not aware of that, it will shuffle threads away from the memory their data is located in.
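One way an application (or a NUMA-aware runtime) can keep a thread next to its data is to pin it to a fixed set of CPUs so the scheduler can't migrate it to the other socket. A minimal sketch, assuming a Linux system (the affinity API isn't available on macOS or Windows, so the code guards for that):

```python
import os

def pin_to_cpus(cpus):
    """Pin the calling process to the given CPU set so the scheduler
    cannot shuffle it away from the NUMA node holding its data.
    Returns the effective mask, or None on platforms without the API."""
    if not hasattr(os, "sched_setaffinity"):
        return None  # non-Linux: no affinity syscall exposed
    os.sched_setaffinity(0, cpus)   # 0 = the calling process
    return os.sched_getaffinity(0)  # report what actually took effect

if __name__ == "__main__":
    # Pin to CPU 0 only; a real NUMA-aware app would pick all CPUs
    # of the socket whose memory holds the thread's working set.
    print(pin_to_cpus({0}))
```

Real workloads would pair this with a matching memory policy (e.g. via libnuma or `numactl`), since pinning the thread only helps if the allocations stay local too.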
For CPUs with multiple cores this isn't a problem, since they all connect to the same memory subsystem and hence have uniform memory latency. You can even get away with putting two CPUs/SoCs on one package if you have a high-bandwidth, low-latency interconnect (as the M1 Ultra does). You can also alleviate this by going the EPYC route and putting the memory controller into a separate die that all CPU dies connect to, again getting uniform memory latency.
But as soon as you have multiple sockets, and hence a physical interconnect several centimeters long, the link between sockets introduces so much latency that a thread running on one CPU with its data sitting in memory belonging to the other CPU will perform significantly worse.
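A back-of-envelope model makes the penalty concrete. The latency numbers below are illustrative assumptions, not measurements; real values depend on the CPU, interconnect, and memory configuration:

```python
# Assumed numbers for a two-socket system (illustrative only):
LOCAL_NS = 90   # latency to a socket's own DRAM
HOP_NS = 60     # extra cost of one cross-socket interconnect hop

def avg_latency_ns(remote_fraction):
    """Average memory latency when `remote_fraction` of a thread's
    accesses land on the other socket's memory."""
    return LOCAL_NS + remote_fraction * HOP_NS

print(avg_latency_ns(0.0))  # all data local  -> 90.0 ns
print(avg_latency_ns(1.0))  # all data remote -> 150.0 ns, ~67% worse
```

Even with these rough numbers, a thread whose working set ended up entirely on the wrong socket pays well over half again the latency on every cache miss, which is exactly the mismanagement a NUMA-aware scheduler tries to avoid.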
This is the reason why an OS has to be NUMA (Non-Uniform Memory Access) aware to properly manage thread placement and memory allocation and reduce this kind of mismanagement. Linux, as the OS predominantly used in servers, which are virtually the only machines with multiple sockets, is very good at managing this. Windows, for example, is not. And macOS... who knows.