
scottrichardson

macrumors 6502a
Original poster
Jul 10, 2007
733
311
Ulladulla, NSW Australia
Yep, let's do this. Let's start the speculation of all things M3 generation.

Let's discuss what we believe the M3 generation will bring. From:

- process;
- core count;
- improvements to core architecture;
- improvements to GPU architecture;
- changes to RAM and subsystem;
- cache types/sizes;
- will it support hardware ray-tracing;
- changes to media encoders;
- clock speeds;
- TDP versus performance tradeoffs.

-------
My thoughts to get us started:

-------

If base M3 has the same core count as M2:

I have seen some rumours suggesting the M3 base chip will offer the same 8-core CPU / 10-core GPU layout. If that's the case, then I can assume a couple of things:

- higher clock speed by ~10-15%
- smaller die size, given the smaller manufacturing process
- slightly lower power consumption
- allows for slightly thinner/smaller devices

Note, if the GPU cores are to include hardware RT features, then those features may require additional die real estate, which might negate the die size shrinkage.

If Apple makes use of higher silicon density:
Apple might choose to keep the die a similar size, and use the extra density to add a little more to the core count. I imagine this would maintain symmetry, so perhaps:

M3: 6p + 4e CPU / 12 Core GPU / 3.85GHz nominal clock speed

One would assume then:

M3 Pro: 14 core (10p + 4e) CPU / 22 Core GPU / 3.85GHz nominal clock speed
M3 Max: 14 core (10p + 4e) CPU / 44 Core GPU / 3.95GHz high power clock speed
M3 Ultra: 28 core (20p + 8e) CPU / 88 Core GPU / 3.95GHz high power clock speed

Memory
All utilising LPDDR5X for RAM, an upgrade over the LPDDR5 currently used. Unlikely to see LPDDR5T or LPDDR6. Still, LPDDR5X offers a big jump in peak transfer rate of about 33% (8533MT/s vs 6400MT/s), while reportedly utilising 20% less power. See: https://en.wikipedia.org/wiki/LPDDR
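
For anyone who wants to check the memory arithmetic, here is a minimal sketch. The two transfer rates are the ones quoted above. The 128-bit bus width is my assumption (it lines up with the roughly 100GB/s Apple quotes for the base M2), and `peak_bandwidth_gbs` is just an illustrative helper name, not anything official.

```python
def peak_bandwidth_gbs(transfer_rate_mts: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (decimal) for a given memory bus."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer / 1e9


LPDDR5_MTS = 6400      # rate quoted in the post
LPDDR5X_MTS = 8533     # rate quoted in the post
BUS_WIDTH_BITS = 128   # assumption: base-chip bus width

lpddr5 = peak_bandwidth_gbs(LPDDR5_MTS, BUS_WIDTH_BITS)
lpddr5x = peak_bandwidth_gbs(LPDDR5X_MTS, BUS_WIDTH_BITS)
uplift_pct = (LPDDR5X_MTS / LPDDR5_MTS - 1) * 100

print(f"LPDDR5:  {lpddr5:.1f} GB/s")
print(f"LPDDR5X: {lpddr5x:.1f} GB/s")
print(f"Uplift:  {uplift_pct:.1f}%")
```

These are theoretical peaks only. Sustained bandwidth depends on the memory controller and access pattern, so real-world gains could well be smaller.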

It seems that a process node shrink opens up a bunch of choices for the silicon designer. More cores, more complex cores, or a smaller die? Higher clock speed? Lower TDP? Focus on any one of those for a large improvement, or on two or more for smaller improvements.

Keen to hear your thoughts and have fun speculating! I'd love to hear from people with more knowledge in the subsystem and other chip design areas like cache etc.
 
M2: 6p + 4e CPU / 12 Core GPU / 3.85GHz nominal clock speed
You probably meant this to be M3?

Actually, I would think M3 would be 4P + 6E/8E, if they manage (which I think they will) to improve the E-cores' IPC. Base-level Mx SoCs need more power efficiency rather than performance. 4P for base Mx should be more than enough for most basic users' needs.

From various threads I've read, the N3 process is not friendly to SRAM/cache designs, as SRAM doesn't get the same space saving as logic. More cores means more cache is required to feed them, so I don't think M3 SoCs will grow much in CPU core count. GPU and NPU will probably grow more.
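
To put a rough shape on the SRAM point, here is a back-of-the-envelope sketch. Every number in it is a made-up placeholder (die area, SRAM share of the die, and both density gains), and `shrunk_area` is a hypothetical helper. It also ignores analog/IO blocks, which scale poorly too. The only point is that when part of the die barely shrinks, the whole die shrinks much less than the headline logic density suggests.

```python
def shrunk_area(area_mm2: float, sram_fraction: float,
                logic_density_gain: float, sram_density_gain: float) -> float:
    """Area of the same design on a denser node, scaling logic and SRAM separately."""
    logic_area = area_mm2 * (1 - sram_fraction) / logic_density_gain
    sram_area = area_mm2 * sram_fraction / sram_density_gain
    return logic_area + sram_area


# All values below are illustrative assumptions, not measured figures.
OLD_AREA_MM2 = 150.0
SRAM_FRACTION = 0.30   # assumed share of the die taken by SRAM/cache
LOGIC_GAIN = 1.6       # assumed logic density improvement
SRAM_GAIN = 1.05       # assumed SRAM density improvement

new_area = shrunk_area(OLD_AREA_MM2, SRAM_FRACTION, LOGIC_GAIN, SRAM_GAIN)
naive_area = OLD_AREA_MM2 / LOGIC_GAIN  # if everything scaled like logic

print(f"Naive shrink:   {naive_area:.1f} mm^2")
print(f"With SRAM drag: {new_area:.1f} mm^2")
```

With these placeholders, the SRAM share of the shrunk die goes up, which is why adding cores (and the cache to feed them) gets relatively more expensive.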
 
I doubt Apple will increase CPU cores. Instead, Apple could include more marketable IP like AV1 de/encoder and a ray tracing accelerator. It also makes more sense for Apple to include more GPU cores to address its weakest point (gaming, 3D and machine learning).
 
My guess:

- 3nm (unclear which iteration)
- new, wider CPU P-cores with 15-20% higher IPC + 10-15% higher clock for 25%+ total ST performance improvements
- 4+4 cores for the M3, 8+8 cores for M3 Pro/Max
- new, much faster neural engine with sparse data support
- significant GPU redesign (hardware raytracing, memory system redesign, much faster matrix multiplication)
- 3D chip packaging with separate logic (3nm) and memory (5nm) dies — but that might be just the Pro/Max variants
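
A quick sanity check on how the IPC and clock guesses in the list above combine. The gains multiply rather than add, and the ranges are simply the guessed ones from the list. `combined_uplift` is an illustrative name.

```python
def combined_uplift(ipc_gain: float, clock_gain: float) -> float:
    """Single-thread uplift when IPC and clock gains compound multiplicatively."""
    return (1 + ipc_gain) * (1 + clock_gain) - 1


low = combined_uplift(0.15, 0.10)    # low end of both guessed ranges
high = combined_uplift(0.20, 0.15)   # high end of both guessed ranges

print(f"Low end:  {low * 100:.1f}%")
print(f"High end: {high * 100:.1f}%")
```

This assumes the workload is neither memory-bound nor throttled, so it is an upper-ish bound for a given IPC/clock pair rather than a prediction.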
 

Yeah that looks great. I hadn't speculated much on IPC improvements as that's outside my knowledge and understanding. Be interesting to see if Anandtech do a nice deep dive into the M3 architecture if/when it comes out at the end of the year.

People will be a bit disappointed if ray-tracing hardware isn't included on the GPU with M3. It seems inevitable, but I won't hold out hope until it's real.

You could be right about the 8+8 design for the Pro/Max. The efficiency cores could become more efficient AND more powerful, which is a big win across the board for performance vs energy.
 
Be interesting to see if Anandtech do a nice deep dive into the M3 architecture if/when it comes out at the end of the year.

I doubt it, since Andrei (who was writing the deep dives) has left Anandtech to work for Qualcomm. Basically, everyone doing technical deep dives has left.

People will be a bit disappointed if ray-tracing hardware isn't included on the GPU with M3. It seems inevitable, but I won't hold out hope until it's real.

Given the amount of RT patents Apple has published in the past year I’d be shocked if we don’t see hardware RT. That’d mean that something went really really bad. Most of the stuff I wrote is based off the existing Apple patents btw.

You could be right about the 8+8 design for the Pro/Max. The efficiency cores could become more efficient AND more powerful, which is a big win across the board for performance vs energy.

That’s following Gurman’s leak, which makes sense to me.
 
Check the max RAM of the M2 Max vs M2 Ultra. They're the same.
No they’re not.
M2 Max maxes (heh heh) out at 96 GB, while M2 Ultra goes all the way up to 192 GB.
And if you try to put more than 96 GB with the Max, you get this warning:
“96GB available with M2 Max chip with 38‑core GPU. 128GB and 192GB available with M2 Ultra chip.”
 
Corrected, thanks

The max RAM is sadly short of the 2019 Mac Pro's 1.5TB. At the rate Apple's doing this it may take a decade or two to reach that amount. All for the sake of economies of scale.
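
How long that takes depends heavily on the growth you assume per generation. A minimal sketch: the 192GB and 1.5TB (1536GB) figures are from this thread, while the growth factors are assumptions (1.5x matches the step from the M1 Ultra's 128GB to the M2 Ultra's 192GB, and 1.25x is an arbitrary slower case). `generations_needed` is an illustrative helper.

```python
def generations_needed(current_gb: float, target_gb: float,
                       growth_per_gen: float) -> int:
    """Count chip generations until max RAM reaches the target at a fixed growth rate."""
    if growth_per_gen <= 1:
        raise ValueError("growth_per_gen must be greater than 1")
    generations = 0
    capacity = current_gb
    while capacity < target_gb:
        capacity *= growth_per_gen
        generations += 1
    return generations


CURRENT_MAX_GB = 192    # M2 Ultra maximum, from the thread
TARGET_GB = 1536        # 1.5TB, the 2019 Mac Pro maximum

for growth in (1.25, 1.5, 2.0):  # assumed growth factors per generation
    gens = generations_needed(CURRENT_MAX_GB, TARGET_GB, growth)
    print(f"{growth}x per generation: {gens} generations")
```

Multiply the generation count by whatever release cadence you expect to turn it into years.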
 
My wish is that the M3 is all about the GPU and we see a full new Apple9 feature set that finally plugs all the missing gaps between Metal and the cutting edge versions of PC APIs.

Full RT hardware with ray sorting, better atomics, additional memory models, device-wide barriers, frame generation hardware.

At this point I'm actually really happy with the CPU performance of M-chips, and I want all the focus to go into making the GPU better (but if they want to make the CPU even faster then I won't say no. :D)
 
With Apple's Game Porting Kit it is a given that there will be efficiency and raw performance improvements in all use cases that require better GPU Core performance.

The performance trajectory is there. It is just that a very vocal minority wanted it back in 2020.
 
M3 is gonna need to focus on CPU and RAM. The maximum RAM of the M2 Ultra is too low for professionals. I know, 192GB of RAM is overkill as it is, but when you consider that audio professionals go above 300GB, it's not enough. M3 Ultra needs to get to at minimum 256GB to convince audio pros to ditch their 2019 Mac Pros and go ARM.
 
Why would that be interesting? I must say, as a developer who is interested in high-performance computing, I am entirely ambivalent towards ARMv9.

It would be nice to get SVE2 and SVE streaming mode, but it doesn’t require ARMv9.
Yeah, SVE2 is the interesting part to me as well. I mean, if they use Armv9 they'll be "forced" to have it :)
 
It would be nice to get SVE2 and SVE streaming mode, but it doesn’t require ARMv9.
Aren't these features unique to Armv9? It seems like Apple would need to implement a couple of features that differentiate Armv8.5 and Armv9.0 to add SVE2. But, Apple would need to implement v9.0 and v9.1 and SME from v9.2 to add SVE streaming mode. Isn't that a lot of things at once?
 
I thought these were all optional features for ARMv8? Could be wrong though…

Interestingly enough, Apple AMX is very similar to SVE streaming mode. When you look at the reverse-engineered instructions, there is a substantial overlap. It would be great if Apple would stabilize these, either using SVE ISA or even rolling their own.
 
I am not expecting it for M3, but at some point, I believe, they will implement DRU – dynamic resource unicore – which will rely on a wagonwheel geometry of rename arrays interleaved with EU columns, surrounded by thread controllers. Dynamic allocation will allow the unicore to gate off unneeded parts of the wheel so that it could, in theory, run as though it were all E-cores, or activate more of the array to increase performance. One major advantage to DRU is that Apple will be able to make a single unicore for all of its SoCs and simply fuse off part of the wheel for the lower-tier models.
 
Wow that sounds super interesting. Is this something that’s been talked about in other tech papers/blogs anywhere?
 
The only thing I have to go on is IBM POWER10, which has cores that run 8 threads each (vaguely like the way x86 can run two threads on a P-core). I have not seen anything at all about what I described, but Apple is in their own world. From what I know about AS μarchitecture, it seems like it would be a practical approach, and the logic needed to keep all the code streams coherent is probably not a lot heavier than keeping one out-of-order stream coherent.

The big missing element would be the logic that assesses the load requirement and adjusts the capabilities of various threads, which would be a significant undertaking that would initially be under software management as the internal logic of it gets refined.

So, yeah, I am just spitballing.
 
I am hoping Apple makes the GPU the focus of improvement, with RT, tensor cores and more TFLOPS. Hope to see a 128 GB M3 Max.
 
Aren't these features unique to Armv9? It seems like Apple would need to implement a couple of features that differentiate Armv8.5 and Armv9.0 to add SVE2. But, Apple would need to implement v9.0 and v9.1 and SME from v9.2 to add SVE streaming mode. Isn't that a lot of things at once?

Arm's docs also suggest that it is part of v9.

"... This guide is a short introduction to version two of the Scalable Vector Extension (SVE2) for the Armv9-A architecture. In this guide, you can learn about the concept and main features of SVE2, the application domains of SVE2, and how SVE2 compares to SVE and to Neon. We also describe how to develop a program for an SVE2-enabled target. ..."

The confusion stems from the high overlap between 8.5-A and 9-A. That overlap continues for 8.6-A through 8.8-A (9.1-A through 9.3-A).

" ... In this section of the guide, we summarize the new features that were added in each of the Armv8.x-A and Armv9.x-A extensions. We do not provide a complete list, but we include the most important features. ..."

Not sure Apple has done the 8.3-A/8.4-A nested virtualization features, so getting to v9 could be a leap. (It could be Apple's hypervisor lagging behind.) [For example, Windows Subsystem for Linux and some of the enhanced security modes use layered virtualization. If Apple is going to pragmatically ban native booting of other operating systems, two layers of virtualization aren't going to be all that rare.]


"... Armv8.7-A and Armv9.2-A
  • Enhanced support for PCIe hot plug (AArch64)
..."

Might be helpful also.
 