Well, one can say that the M1/M2 Pro & Max are already "micro"-NUMAish, seeing that they have 2/4 memory controllers independently controlling 2/4 different banks of LPDDR memory providing data to the various IP cores.
I am 95% sure that's wrong. AFAIK, there is no meaningful difference in latency to any of the memory controllers from any of the cores, so it is in no way NUMA. Now, on the other hand, as I pointed out, the situation's almost certainly different for the Ultra. I believe there is a latency penalty for accessing RAM across the "ultrafusion" link. (If anyone knows for sure one way or the other please say so, with a cite!)
Splitting the GPU cores from the rest of the IP cores into different dies does not make it any different from putting them in the same die, other than accounting for more signal propagation delays between the dies.
Indeed it does make it different! That propagation delay (which isn't just a propagation delay, as more logic is involved, not just wires) makes half the system RAM have a latency greater than the other half, for any given core. That is the definition of NUMA.
What is important to Apple, IMHO, is that macOS does not need a NUMA overhaul.
That's debatable. Or at least, it needs more data. But there's no question that making it NUMA-aware would wrest more performance out of the system, and Apple is relentless in their pursuit of that kind of efficiency. Of course, the OS crew will know this is coming (if it is) and have some time to work on it before it ships.
When you think about it, 3nm M3-series chips will likely have a bunch more cores, given the extra die space available within the same power constraints, as everything will be smaller. So it’s fair to say that an M3 Max could very well end up having the core count of an M1 Ultra, all while using the same power as an M2 Max.
No, that can't happen. I agree that there will likely be more cores, but N3B (or any N3) is not going to buy you double the cores. It's not that much smaller. And in particular, N3B saves almost no space over N5x for static RAM, which is a major part of each core cluster (as well as the SLC).
Now, what kind of cores will they add? That's a very interesting question! CPU cores are quite cache-hungry - each cluster wants a big L2$ - whereas GPU cores are not. (Though more of them might add pressure to expand the SLC, which would eat a lot of area.) So while I'd really love to see another cluster or two of CPUs, my guess is that we will likely get one, and possibly zero, more CPU clusters (so, 4 or 0 more cores), and they will max out the GPU. And probably bump up the NPU as well. Plus random other stuff - more AMX, AV1 support in the encoding/decoding block, etc.
Also remember that while area shrinks a lot on N3B, power use goes down a lot less. It will be interesting to see what they decide to do - use the area for something less intensive? More app-specific accelerators? Or just make a smaller chip?
This does bring up one very interesting possibility. Intel has shown that it can pack a ton of E cores into even laptop chips. There's no reason Apple couldn't, if they wanted to. But the question is, how useful are E cores en masse, outside of a couple of very specific benchmarks? Apple is already facing scaling issues - but they are hopefully well on their way to dealing with that in the M3. So one possibility would be to add another cluster, or even two, of E cores. That would be great for anyone with massively parallel code... and relatively useless for anyone else. But if they have the area to spare and they can't pack it with high-energy-use cores, they might just do that. I could imagine a 12P+12E chip quite easily.
Which node could Apple use?
"Shrinking finally costs more, Moore’s Law is now dead in economic terms" (SemiAnalysis, www.semianalysis.com): "A couple of weeks ago, we were able to attend IEDM, where TSMC presented many details about their N3B and N3E, 3nm class…"
It's basically an open secret that Apple is the only big customer (maybe the only customer, period) for TSMC's N3B, which is what's in production *right now today*. N3B is not suitable for phones, it seems, so the phones (the Pro line, anyway - the one getting the A17) will go on N3E. That leaves what exactly for all those N3B chips? Most likely the new M3 family. My guess is that we will see M3x Mac Pros by WWDC, probably sooner, and probably new iMac Pros or iMacs as well. Possibly, but less likely, new lower-end machines as well, though that would be tough in terms of marketing. Also, perhaps too costly - I would more expect Apple to do basic M3 chips on N3E too.
It's possible, however, that Apple will sit on the N3B chips (first, probably big M3x) for longer.
I think Apple will keep the same number of cores as the M2 series. They already bumped up the core count for Pro/Max, which will lead to a bump for the Ultra as well.
I could see Apple significantly increasing the performance of the cores in one generation, increasing core count in the next generation, and repeating. This would stagger the performance increases.
I could also see the base M3 getting 6/6 though I think it'll likely be something like 6/4.
If my expectation is correct, the first M3 will be a huge multichip package for the Pro. But eventually we'll see a regular M3, on N3E, and it will be interesting to see what that looks like - all the factors I mention above apply here as well. The M3 will likely have a significant IPC boost, and probably a modest clock boost as well. They could easily leave it as 4/4, but I think 6/4 or 8/4 is more likely (6/4 only if they change the clusters from 4-core to 3-core). But the same outside chance of lots of E cores is possible here too - say, 4/8 or even 4/12.