Well, one can say that the M1/M2 Pro & Max are already "micro"-NUMAish, seeing that they have 2/4 memory controllers independently controlling 2/4 different banks of LPDDR memory providing data to the various IP cores.
I am 95% sure that's wrong. AFAIK, there is no meaningful difference in latency to any of the memory controllers from any of the cores, so it is in no way NUMA. Now, on the other hand, as I pointed out, the situation's almost certainly different for the Ultra. I believe there is a latency penalty for accessing RAM across the "ultrafusion" link. (If anyone knows for sure one way or the other please say so, with a cite!)
Splitting the GPU cores from the rest of the IP cores into different dies does not make it any different from putting them in the same die, other than accounting for more signal propagation delays between the dies.
Indeed it does make it different! That propagation delay (which isn't just a propagation delay, as more logic is involved, not just wires) makes half the system RAM have a latency greater than the other half, for any given core. That is the definition of NUMA.
What is important to Apple, IMHO, is that macOS does not need a NUMA overhaul.
That's debatable. Or at least, it needs more data. But there's no question that making it NUMA-aware would wrest more performance out of the system, and Apple is relentless in their pursuit of that kind of efficiency. Of course, the OS crew will know this is coming (if it is) and have some time to work on it before it ships.
When you think about it, 3nm M3-series chips will likely have a bunch more cores, given the extra die space available within the same power constraints, as everything will be smaller. So it’s fair to say that an M3 Max could very well end up having the core count of an M1 Ultra, all while using the same power as an M2 Max.
No, that can't happen. I agree that there will likely be more cores, but N3B (or any N3) is not going to buy you double the cores. It's not that much smaller. And in particular, N3B saves almost no space over N5x for static RAM, which is a major part of each core cluster (as well as the SLC).
Now, what kind of cores will they add? That's a very interesting question! CPU cores are quite cache-hungry - each cluster wants a big L2$ - whereas GPU cores are not. (Though more of them might add pressure to expand the SLC, which would eat a lot of area.) So while I'd really love to see another cluster or two of CPUs, my guess is that we will likely get one, and possibly zero, more CPU clusters (so, 4 or 0 more cores), and they will max out the GPU. And probably bump up the NPU as well. Plus random other stuff - more AMX, AV1 support in the encoding/decoding block, etc.
Also remember that while area shrinks a lot on N3B, power use goes down a lot less. It will be interesting to see what they decide to do - use the area for something less intensive? More app-specific accelerators? Or just make a smaller chip?
This does bring up one very interesting possibility. Intel has shown that it can pack a ton of E cores into even laptop chips. There's no reason Apple couldn't, if they wanted to. But the question is, how useful are E cores en masse, outside of a couple of very specific benchmarks? Apple is already facing scaling issues - but they are hopefully well on their way to dealing with that in the M3. So one possibility would be to add another cluster, or even two, of E cores. That would be great for anyone with massively parallel code... and relatively useless for anyone else. But if they have the area to spare and they can't pack it with high-energy-use cores, they might just do that. I could imagine a 12P+12E chip quite easily.
Which node could Apple use?
"Shrinking finally costs more, Moore’s Law is now dead in economic terms" (SemiAnalysis, www.semianalysis.com): "A couple of weeks ago, we were able to attend IEDM, where TSMC presented many details about their N3B and N3E, 3nm class…"
It's basically an open secret that Apple is the only big customer (maybe the only customer, period) for TSMC's N3B, which is what's in production *right now today*. N3B is not suitable for phones, it seems, so the phones (the Pro line, anyway - the one getting the A17) will go on N3E. That leaves what exactly for all those N3B chips? Most likely the new M3 family. My guess is that we will see M3x Mac Pros by WWDC, probably sooner, and probably new iMac Pros or iMacs as well. Possibly, but less likely, new lower-end machines as well, though that would be tough in terms of marketing. Also, perhaps too costly - I would more expect Apple to do basic M3 chips on N3E too.
It's possible, however, that Apple will sit on the N3B chips (first, probably big M3x) for longer.
I think Apple will keep the same number of cores as the M2 series. They already bumped up the core count for Pro/Max, which will lead to a bump for the Ultra as well.
I could see Apple significantly increasing the performance of the cores in one generation, increasing core count in the next generation, and repeating. This would stagger the performance increases.
I could also see the base M3 getting 6/6 though I think it'll likely be something like 6/4.
If my expectation is correct, the first M3 will be a huge multichip package for the Pro. But eventually we'll see a regular M3, on N3E, and it will be interesting to see what that looks like - all the factors I mention above apply here as well. The M3 will likely have a significant IPC boost, and probably a modest clock boost as well. They could easily leave it as 4/4, but I think 6/4 or 8/4 is more likely (6/4 only if they change the clusters from 4-core to 3-core). But the same outside chance of lots of E cores is possible here too - say, 4/8 or even 4/12.