Many people were finding it hard to believe that Apple would use a Tick Tock update cycle for their chips.
“Tick Tock” is not so much a strategy as an emergent result of the different cycles of chip design and node process shrink.

You get the most performance/efficiency improvement out of a process shrink, but the process-shrink cycle is long and not fully predictable. Within a given node, you can make design changes to improve efficiency or performance, but the effect is smaller. When you combine these two cycles, you get a series with a few large improvements and more frequent small improvements. That pattern has been called "tick tock".
 
Marketing budgets are getting slashed and there is weak PC demand. This is free marketing that generates hype; they know exactly what they are doing.
Yeah, all of this is intentional.

Mark Gurman is intentional too, by the way; it's Apple that feeds him the information.
 
If you choose NOT to run, e.g., Mathematica, or Blender, or Xcode, that's your choice...
Sure. There's a handful of applications that really need the power. For most users (and MOST of MacRumors, I gather), that power is overkill. But people will still buy everything from the M1 Pro Mac up to the Ultra and Extreme.
 
M1 single core: 1766
M1 Max single core: 1795 = +1.6%

M2 single core: 1890
M2 Max single core: 2027 = +7.2%

Doesn't sound very plausible to me. Or maybe a higher clock, or 3 nm?
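For what it's worth, those percentages are easy to reproduce from the raw scores; a quick sketch in Python, using only the numbers quoted above (the M2 Max figure is of course an unverified leak):

```python
# Single-core Geekbench 5 scores as quoted above (the M2 Max figure is a leak).
scores = {
    "M1": 1766, "M1 Max": 1795,
    "M2": 1890, "M2 Max (leak)": 2027,
}

def uplift(base, other):
    """Percent change going from `base` to `other`."""
    return (scores[other] / scores[base] - 1) * 100

print(f"M1 -> M1 Max: +{uplift('M1', 'M1 Max'):.1f}%")                # ~ +1.6%
print(f"M2 -> M2 Max (leak): +{uplift('M2', 'M2 Max (leak)'):.1f}%")  # ~ +7.2%
```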
 
You said in your previous post, "Like with the M1 vs M2 we should see a 35% faster GPU". If the memory bandwidth for the M2 Pro/Max is the same, we likely won't see 35% better performance. I didn't say you claimed it was just because of the two extra GPU cores; I said the majority of the performance increase was thanks to the higher memory bandwidth. So if the memory bandwidth remains the same, we'll see around 16% better GPU performance for the same number of cores.
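To put rough numbers on that: the M1/M2 GPU core counts are shipping specs, while the 19- and 38-core M2 Pro/Max configurations are from the rumor mill, so treat those as assumptions. A quick sketch of the core-count ratios involved:

```python
# Core-count ratios in this exchange. M1/M2 counts are shipping specs;
# the M2 Pro/Max counts are the rumored 19- and 38-core configurations.
ratios = {
    "M1 (8) -> M2 (10)": 10 / 8,
    "M1 Pro (16) -> M2 Pro (19, rumored)": 19 / 16,
    "M1 Max (32) -> M2 Max (38, rumored)": 38 / 32,
}
for label, r in ratios.items():
    print(f"{label}: {r:.2f}x cores ({(r - 1) * 100:+.0f}%)")

# With unchanged memory bandwidth, the extra cores (plus any clock bump) are the
# main levers left, which is why the expectation above is well short of +35%.
```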


Yes, if the M2 Pro/Max keeps the same bandwidth as the M1 Pro/Max then you are totally right, but some rumors say that both will have higher bandwidth:
"Apple's next-generation 14-inch and 16-inch MacBook Pro models with M2 Pro and M2 Max chips will be equipped with 'very high-bandwidth, high-speed RAM'."
 
I absolutely agree - in a laptop. There's a reason I own an M1 MBA, M1 Pro 14 and M1 Max 16, but no Windows laptops!

On the desktop, though, I have an i9 12900K / 4900 for a bit of work and gaming, and in normal desktop use it's not even all that power hungry, but it can really chew through the watts when you need them.

I would love to see Apple put out a chip that is similarly unconstrained, as I just don't think desktop power consumption is that big a deal.

The problem is that the Apple design is very much a brainiac design. It's very wide, and it achieves performance by optimizing the way it decodes, caches, reorders and schedules instructions. This type of complex design is often harder to scale to higher frequencies. You can't just raise the voltage and frequency and let loose.

All chips these days are brainiacs; the last Roadrunner (dumb but high frequency) was probably the Pentium 4. The thing is that Apple have pushed the brainiac design philosophy even further than Intel in the name of power efficiency, which is why they can run these desktop-performance-level chips in their laptops and still get amazing battery life. Upping the clock a lot wouldn't be easy, though, without some changes to the HW design.
 
Many people were finding it hard to believe that Apple would use a Tick Tock update cycle for their chips.
It's silly to extrapolate Apple's ultimate plans from two data points.
In particular Apple has been screwed over by
- covid, then
- China lockdowns, then
- TSMC (possibly) delaying N3 by a quarter (the exact details on the timing are unclear, as is when Apple knew about the dates).

To my eyes A15/M2 is the expected "second chip on N5" design – minor tweaks and improvements to what was already there, with a primary focus on improving energy efficiency.
A16 (and M2 Pro/Max) are the catchups, the chips that shipped on N4 (very mildly improved N5) because N3 was not ready and so the real successor CPU designs (targeting many more transistors available) can't yet ship.
The only unclear issue was whether M2 Pro/Max would use the A15 or A16 cores and that's still unclear... (The main difference between them seems to be that A16 hits higher frequencies than A15, so ???)

We can see Apple's longer term plans more clearly in the patent record. There we see recent patents for things like a new coherence protocol, new ways to distribute the memory address space over LARGE numbers of memory controllers, new NoCs (Networks on Chip), new technology for hypervisors and associated TLB handling.
These are all technologies that are of substantially more value to large chips not phone/MacBook Air-sized chips... But they are also technologies that are not easy to get right the first time...

My current guesses are that
- there remains a plan for serious CPU domination at all performance levels, from watch to high-end-desktop to cloud (inside Apple's data warehouses)
- this plan (because Apple is not dumb, unlike a certain company named I*t*l) has Plan B components at every stage, to deal with contingencies like TSMC slipping a node, or Apple itself taking longer with some aspect of a design than expected
- this plan includes a lot of "parallel" "in-hardware" testing. What I mean by this is that they don't make a big deal about this but Apple has frequently, in earlier designs, slipped in some aspect of a future design in a way that includes a fallback. For example they shipped A10 as having big+small cores (so that they could test the small core design) but in a way that could have been shipped, one way or another, if there were issues with the small core design. I think A11 had the small cores as 64b only so, again, they could test dropping 32b support in a way that was not catastrophic if they made a mistake (I forget the exact timing details but it was around there).

So my *guess* is that there is more going on internally in both A16 and M2 Max/Pro than meets the eye. While the CPUs may look like the boring old A15 CPUs (which in turn look mostly like a boring old A14, with somewhat improved indirect branch prediction and somewhat optimized sizes for things like the ROB and the number of physical registers), just running at a higher frequency, I would not be surprised if they have actually implemented some aspects of these new ideas (i.e. the new coherence protocol or the new hypervisor TLB stuff, or even the new NoCs).
If there are some problematic issues, these can be discovered in a context where it's not catastrophic (just never activate that new functionality) so that they can be updated and are ready for being debuted where they matter, on the M2 (or M3?) Ultra design.

(It's even possible, who knows?, that this is exactly what has already happened once. We got a first round of A15 and M2, some of the most advanced "being tested" functionality was found to have a flaw or two, and M2 Pro/Max were delayed a few months to update them with a fixed version of this functionality, where in turn it will be tested before the next round of Ultra.)

Remember we always need to look at the big picture. We have become so used to Apple as this unstoppable SoC-designing machine that we forget that M1 was the FIRST version of a desktop level design, and M1 Ultra the first version of a chiplet-style design. Yes, Apple got many things right this first time, but getting things right is not the same thing as getting them OPTIMAL. In particular M1 Max GPU scaling was mildly disappointing, and M1 Ultra GPU scaling was clearly disappointing (along with other weirdness like the inability to make use of the second NPU present on an M1 Ultra). Much of what I have described is part of the infrastructure to fix this, to improve this scaling across Max, Ultra (and up to the mythical Extreme). But Apple is not Intel or IBM; they haven't been building these large-scale NoCs and coherency protocols for years – hence the need (IMHO) to test their designs via A-series and M2/Pro/Max in readiness for the large ultimate targets.

It's interesting (for example) and disappointing, but not surprising, that we don't see a Metal benchmark accompanying the M2 Max CPU benchmark, which might allow us to see whether the M2 Max GPU scaling is in fact better than we saw with M1 Max.
That's what I will be looking for once these things become public...
 
M1 single core: 1766
M1 Max single core: 1795 = +1.6%

M2 single core: 1890
M2 Max single core: 2027 = +7.2%

Doesn't sound very plausible to me. Or maybe a higher clock, or 3 nm?
The M2 runs at 3.5 GHz.
This result (assuming it's legit, but it doesn't look obviously fake) is running at 3.68 GHz.
It's possible that this is a product choice, feasible given the presence of a fan.

It's also possible that while the M2 used the A15 cores (Avalanche and Blizzard) the M2 Pro/Max use the A16 cores (Everest and Sawtooth), which we know have been tweaked to run at 3.5GHz in the phone. This latter is MY guess, that the M2 Max/Pro targets N4 and is more a boosted A16 (with not just A16 cores but also GPU and NPU) than a boosted A15.
Hopefully we'll learn soon...

One (easily visible) hint might be if the new MacBook Pros offer always-on display functionality, which is new with the Display Controller in the A16. But, on the other hand, always-on display functionality is much less useful in a laptop than on a phone, so???
But what do I know? Maybe Apple have a plan to make it more common for people to leave their laptop screens up, and always showing "generally useful" functionality, like weather and news headlines, kinda like how we arrange widgets today on our iPhones???
 
Yes, if the M2 Pro/Max keeps the same bandwidth as the M1 Pro/Max then you are totally right, but some rumors say that both will have higher bandwidth:
"Apple's next-generation 14-inch and 16-inch MacBook Pro models with M2 Pro and M2 Max chips will be equipped with 'very high-bandwidth, high-speed RAM'."

Let's hope so. They may use LPDDR5X instead of LPDDR5.
 
It’s not, and I seriously wish people would stop pushing that narrative.
Apple has a history of process shrinks in their own custom silicon, and it always has happened the same way.
Take the shrink from 7 nm down to 5 nm.
First appeared in the iPhone 12 and iPad Air 4 with the "A14" chip, the least demanding, lowest-end processor in their main line-up.
Then it came to the lowest end Mac computers and iPads with the “M1”, a chip for the MacBook Air and the iPad.
Then, months and months later, it finally came to the MacBook Pros with the “M1pro/max”.
And then, another seven months after that, it finally came to a top-of-the-line desktop computer with the “M1ultra”.
And it more than likely will come to the Mac Pro next year.
People expecting Apple to jump to 3 nm first on the best, top-of-the-line products are getting it totally backwards.
It’ll come to the “A17”, then the “M3”, then the “M3pro/max”, then the “M3ultra/extreme or whatever it’ll be called”.
This is nothing new.
That's not true. For example A10 was on 16nm, while A10X was on 10nm.

I think Apple are essentially opportunistic and flexible in how they deal with these issues. Their designs appear to be at a fairly high level, so that they can be "recompiled" to a new process if appropriate. The question, then, of whether it's the phone chip, the iPad/low-end portable chip, the mid-range chip or the high-end chip that goes first on any process is more an issue of when the process is ready vs. what Apple needs for its schedule (the iPhone has to hit September, everything else is flexible).

HOWEVER (certainly right now; this could change in the future), as I described in my earlier long post, Apple is constrained in a different way: they want to move to a lot of new technology required to scale a large design across multiple chiplets (new coherence protocol, new NoC, new mapping of address space to memory controllers, ...). The largest designs cannot roll out until this stuff works, and I imagine for cost control this stuff is being tested on the smaller chips (where, if it fails, there's no catastrophe, there's a fallback). So that imposes an artificial limit on the ordering (small chips first to test, then the largest chips); but that may not remain the case once the large chip designs come into their own.

I personally expect that the M2 Pro and Max are on N4 (unlike the M2, which is on N5) and would not be surprised if the M2 Ultra is on N3 (even though it doesn't make optimal use of it, being perhaps just a slightly tweaked M2 Max design). And once ALL THIS stuff is out of the way (N3 works, all the new Ultra infrastructure is in place) I expect a new CPU core (I have ideas, oh yes, for how to use the extra transistors...) that will be a welcome jump in performance from the somewhat lethargic A14 to A15 to A16 boosts of the past two years. At that point the ordering of which cores come out first may well change, as Apple can afford to try putting out M-series designs as soon as TSMC is ready, whereas A-series designs have to wait until TSMC can produce iPhone volumes (cf. A10X vs A10).
 
If you haven't read Name99's post (edit: posts, starting with #84) carefully, go back and read it again. More intelligence in there than the entire rest of this thread so far by 100x. There's a bit more to say about this though.

Scaling in the Max, and especially in the Ultra, is extremely disappointing. Not necessarily surprising - as name99 says, this is Apple's first attempt. And not necessarily making them a bad value - they're still pretty fantastic in some ways. But you can clearly see where Apple hits diminishing returns, and sometimes even a wall, way earlier than they should. There is no question that this is something they needed real-world experience with, and so they went out and got it. And they paid for it, by overengineering to make up for deficiencies. For example, the ridiculous and still unparalleled bandwidth between the two component chips in the Ultra - they aren't beginning to make efficient use of it, but there's enough that the chip still performs.

While it's interesting to see them gradually scale up their clocks - which is no simple achievement, with the crazily wide designs they use - what's much more interesting is to see how well and how quickly they can scale up multicore efficiency, primarily in CPU and GPU. (It's not clear if NPU scaling is an issue, or if it takes more learning than what they need to get GPUs right.) There are also related problems that need solutions - the "uncore" can consume more than half the power of even large multicore Intel and AMD chips, and if they can't do better, this will really spike their ability to bring their advantage of power efficiency to large core count designs. And there are issues somewhat unique to the platform, like figuring out how to do large memory without giving up the advantages they have now from having a single shared close memory, though that may well wait until the M3 or whatever goes into their AS Pro platform.

The recent benchmark suggests that the M2 Max does indeed improve on scaling, much as the M2 does - which is to say some, but not a lot. And that in turn suggests again that name99 is right about this as well - the M2 was not a big redesign of the M1, and in particular (to use the Intel terminology) the uncore is not a big redesign, just a bit of picking of low-hanging fruit.

The M2 was overall quite disappointing in many ways (though my M2 Air is the best machine I've ever owned, and I've never regretted owning it). But it's quite understandable given the massive shocks to the economic system over the last couple of years. I expect that the M2 Pro and Max (and Ultra, if any) will have quite similar profiles, though perhaps there will be somewhat stronger payoff on whatever low-hanging fruit they did manage to grab for multicore performance. The big question is if they can make really strong strides with the M3, and if they can do it reasonably soon - hopefully no more than a year from when the M2 was introduced, and I wouldn't mind if they made my Air obsolete even faster than that.
 
Let's hope so. They may use LPDDR5X instead of LPDDR5.

As far as CPU performance is concerned, the bandwidth limits you see are not even close to DRAM limits.
The CPU bandwidth limits are essentially grounded in how fast an L2 can accept data from the NoC. That's why we see each P cluster can accept only 100GB/s, but the Max design can deliver 400GB/s as a whole.

Should the CPU cluster require more bandwidth (unclear this is a good energy/performance tradeoff) this is easily fixed in multiple ways.
(One of the more interesting, and less obvious, is to change the clustering: instead of 4 cores that share L2 TLB and L2 and AMX [and other things like LZ engine] we switch to a smaller mini-cluster of 3 cores that share AMX and L2, and pairs of these mini-clusters share the LZ engine and have access to each other's L2's. This gives us
- 6 rather than 4 cores as the basic "P cluster" but
- 2 AMX units per cluster
- "fast" L2 for three cores, along with slower L2 (maybe 20 cycles rather than 13 cycles for the local L2) when you go from one mini-cluster to the other.
Something like this would now give us the 100GB/s L2 to DRAM bandwidth shared across 3 rather than 4 cores, so slight boost in per-core bandwidth, along with a nicely balanced boost in AMX capacity and 1.5x cores across the line.)

Main point is, nothing is written in stone. Clustering can be reshaped. The NoC width can be widened. Many alternatives are possible, and Apple has been far more flexible than the competition in changing anything and everything across designs as new design-space options are opened up by more transistors.
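If you want to sanity-check the per-core numbers in that mini-cluster sketch (the 100 GB/s-per-cluster figure is the one above; everything else just follows from the core counts):

```python
# Per-core share of the L2 <-> NoC bandwidth, today's 4-core P cluster vs the
# hypothetical 3-core mini-cluster described above. 100 GB/s per cluster is the
# figure quoted in the post; the rest is arithmetic.

cluster_bw_gbs = 100.0

per_core_now = cluster_bw_gbs / 4     # 25 GB/s per core with 4 cores sharing
per_core_mini = cluster_bw_gbs / 3    # ~33 GB/s per core with 3 cores sharing

print(f"per-core bandwidth: {per_core_now:.0f} -> {per_core_mini:.0f} GB/s "
      f"({(per_core_mini / per_core_now - 1) * 100:+.0f}%)")

# Pairing two mini-clusters gives 6 cores and 2 AMX units per "P cluster"
# instead of 4 cores and 1 AMX unit: the 1.5x cores / 2x AMX noted above.
print(f"cores per cluster: 4 -> 6 ({6 / 4:.1f}x), AMX units per cluster: 1 -> 2")
```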
 
Oh, as far as GPU goes, it's unclear that occasional disappointing GPU performance is, per se, the result of DRAM bandwidth. The GPU is still under aggressive development (as is the GPU firmware and Metal itself) and limitations lower down the design may be, for now, more important than raw DRAM bandwidth.

Even in terms of bandwidth, a very recent set of Apple patents describe new, more sophisticated and aggressive, types of compression at multiple levels of the GPU stack (very simple compression in GPU L1, fancier in L2, fanciest in SLC). This sort of technology has the possibility to reduce GPU bandwidth demands on DRAM substantially.
My guess is this, plus LPDDR5X, should be fine for the immediate future.
 
While it's interesting to see them gradually scale up their clocks - which is no simple achievement, with the crazily wide designs they use - what's much more interesting is to see how well and how quickly they can scale up multicore efficiency, primarily in CPU and GPU. (It's not clear if NPU scaling is an issue, or if it takes more learning than what they need to get GPUs right.) There are also related problems that need solutions - the "uncore" can consume more than half the power of even large multicore Intel and AMD chips, and if they can't do better, this will really spike their ability to bring their advantage of power efficiency to large core count designs. And there are issues somewhat unique to the platform, like figuring out how to do large memory without giving up the advantages they have now from having a single shared close memory, though that may well wait until the M3 or whatever goes into their AS Pro platform.

Apple's work on the uncore is REALLY interesting. I describe the evolution in volume 3 of my series at https://github.com/name99-org/AArch64-Explore
One thing that's clear is that they have devoted vastly more effort than their competition to getting this to run at low energy; but, as we agree, they now have to grow this up for Ultra-class designs.

As for the NPU, I have some sympathy for them.
This is something you have to hide behind an API, given how fast the space is moving. But what API? I can't blame them for wanting to avoid PyTorch (which is more than a little, uh, non-robust) or Tensorflow (over which they have little control). The current scheme of having an Apple API first, together with doing their best to ensure PyTorch and Tensorflow make optimal use of their HW, seems about the best reasonable option; but that will be an endless story of tweaking API (and third party code in PyTorch and Tensorflow) to match their HW. Sucks, but what can you do?
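As a concrete illustration of that "Apple API first, plus best-effort PyTorch/TensorFlow support" situation: on recent PyTorch builds you can opt into Apple's Metal (MPS) backend roughly as below. Note this exercises the GPU through Metal; reaching the NPU still generally means going through Core ML rather than PyTorch. A minimal sketch, not a claim about how Apple's internal plumbing works:

```python
import torch

# Prefer Apple's Metal Performance Shaders (MPS) backend when it's available.
# This runs on the GPU via Metal; the NPU is (for now) reached through Core ML
# and Apple's own frameworks, not through PyTorch.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

model = torch.nn.Linear(1024, 1024).to(device)
x = torch.randn(64, 1024, device=device)

with torch.no_grad():
    y = model(x)

print(y.shape, y.device)
```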

In terms of the NPU hardware, that's more interesting. No-one knows what will be the optimal AI HW in five years! You can see how nVidia have gone through adding fp16 then bf16 and int4 and int1, then removing int4 and int1 but adding int8, as they keep trying to track what it looks like future AI will want. Apple has done somewhat similar flailing, redesigning what AMX can do.
In terms of their NPU specifically, that seems (to my eyes anyway) very definitely to have begun life as a convolution engine. There was a point where A-series chips had convolution engines in both the ISP (to handle vision tasks like finding faces in an image) and in the display controller (to provide better, non-linear, upscaling). It seems like that idea of a convolution engine was extracted from both these locations and consolidated as a single, larger, piece of HW that could become a generic "NPU-lite", and (at least as far as the patent trail goes) that's still where it is. So it can handle convolutions (which is much of anything visual, to be fair) and pooling and the obvious non-linear lookups (both easy to add).
But if you look at more mature NPU hardware it includes a lot of machinery for shaping dataflow, and for handling sparsity; and Apple's NPU so far seems to be missing both of these. I'm guessing the newest NPU hardware also has ways to accelerate all the coolest new ideas like Attention, but I do not know the space well.
I expect Apple's NPU is picking up this sort of functionality (handling more data shapes, sparsity) with every iteration; just like AMX did (for AMX this happened a few years ago so the patents have become public; for NPU, not yet). But everything takes time! Time to design the hardware, time to write the firmware and add the APIs. Things like the recent PyTorch updates should remind us of just how much can still be done (and needs to be done) in HW. (Similarly, in fact, even in the much more mature space of Apple's GPU.)
In a way Apple operates in a different space from nV and Intel. Both of those have a PR pipeline where things are announced, then ship a year later, then are actually usable (FW and drivers actually work) a year later still. Meanwhile Apple announces nothing until the SW is ready at WWDC; so it always looks like they're lagging 18 months to two years behind nV and Intel, but the lag is, in reality, not nearly as bad. (It is still present, but it improves substantially with each new year's HW and SW.)
 
Hope that Apple will boost the frequencies on the Mac Studio and Mac Pro; 3.6 GHz is far behind Intel's turbo boost.
Why? Blind frequency boosting is the DUMBEST way to speed up your CPU. It's not just the energy overhead; it ties you far too tightly to the (largely uncontrollable) process timetable, and makes your design crazy fragile, so that you live in terror of adding new (but good) ideas because something unpredictable may break.

Much better would be to increase the IPC, and that is perfectly feasible. My guess is the relevant design is already essentially complete, it's just been waiting for N3 because it was designed assuming many more transistors are available than for N5/N4.
 
I wonder if whoever is doing these benchmarks at Apple realizes these scores are publicly posted? I can imagine that a few internal emails have gone out trying to figure out who did this. Management probably isn’t thrilled. Haha

Unclear. For the past few years genuine leaks (as opposed to jackasses making up stuff) have
- only appeared close to when the product is announced, and

- been very limited in exactly what they test. So, for example, they only give the single-core results, or single and multi-core, but not, e.g., the Metal results or the ML results. That's strange, no? If you were some low-level Apple intern with your hands on a new iPhone, firing up an internet-connected GB5 to see how this new baby performs, why would you NOT also test the GPU and NPU...

It seems like Apple more or less sanction CAREFULLY LIMITED "leaks" of GB5 results near product announcement, to build up excitement. But they limit exactly what info is released via GB5 so that the thunder isn't completely stolen – we're excited that there's a new M2 Max, but we will watch the keynote anyway because so many details remain unclear. Not just the GPU and NPU performance, but even whether this is an N5 vs N4 chip, and based on the A15 or A16.
 
I was asked to move our discussion of this from elsewhere over to this thread. So:

The 2027 SC score of this 3.68 GHz M2 variant is about what we'd expect based on increased clock alone vs. the production M2: GB gives an average SC score of 1899 for the production M2 in the 13" Pro; extrapolating gives us 1899 x 3.68/3.49 = 2002, which is within normal GB variation of 2027.
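Spelled out in code, with the figures as quoted above (the 1899 average at 3.49 GHz for the production M2, and the 3.68 GHz / 2027 leak):

```python
# Naive clock-only extrapolation of the production M2 single-core score to 3.68 GHz.
base_score, base_clock_ghz = 1899, 3.49      # GB5 average for the 13" Pro M2
leaked_clock_ghz, leaked_score = 3.68, 2027  # the leaked M2 Max result

predicted = base_score * leaked_clock_ghz / base_clock_ghz
gap = (leaked_score / predicted - 1) * 100

print(f"predicted: {predicted:.0f}, reported: {leaked_score}, gap: {gap:+.1f}%")
# ~2002 predicted vs 2027 reported, i.e. within a percent or two,
# which is inside normal GB5 run-to-run variation.
```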

Plus the fact that we're seeing these clock speed variations (3.54 GHz last week) suggests (assuming these scores are legit) that these are preproduction machines used for testing.
 
Safari has gotten at least 30% faster in the last 12 months. I remember Speedometer was at 276 points last year and now it's at 375 points on the same machine.

The M2 is over 400 points, but I'm guessing it could be heading toward 450 points with the new scores on the M2 Pro/Max.
For what I do every day, Safari is much faster than Chrome. I'm a long-time Chrome user and frankly it's been very disappointing lately. Considering moving to Safari as my daily browser.
 
3.7 GHz (M2 Max) / 3.2 GHz (M1 Max) ≈ +16%.

So it's a very small "upgrade", since it comes mostly from overclocking. This M2 Max is going to run hotter and draw more energy under stress.

If you needed M2 Max performance, you could have gotten it with a simple overclock of the M1 Max if Apple allowed it through firmware settings, rather than buying a whole new laptop.
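Checking that framing against the leaked numbers (clocks and scores as quoted earlier in the thread; 3.2 GHz for the M1 Max is the commonly cited figure):

```python
# Does the M1 Max -> M2 Max single-core gain track the clock bump?
m1_max_clock, m2_max_clock = 3.2, 3.68   # GHz (3.68 from the leaked result)
m1_max_sc, m2_max_sc = 1795, 2027        # GB5 single-core scores quoted above

clock_gain = m2_max_clock / m1_max_clock - 1   # ~15%
score_gain = m2_max_sc / m1_max_sc - 1         # ~13%

print(f"clock: {clock_gain:+.0%}, single-core score: {score_gain:+.0%}")
# The score gain is roughly in line with (slightly below) the clock increase,
# consistent with the "mostly from clock" reading above.
```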

The GPU in the M2 is significantly more performant than the M1's. When translated into the higher number of cores the Pro/Max has, it'll be pretty impressive. These CPU benchmarks aren't telling the whole story about the potential performance of the SoC as a whole.
 
The power consumption will certainly not be as good as the M1 Max
You don't consume power, you consume energy.
If you don't know the difference between power, energy, and the energy-delay product, and which matters for which purpose, then you really shouldn't be making (silly) comments like this.
I strongly disagree with the way you criticized this poster, since it's condescending in tone while also being incorrect. There's nothing wrong with the phrase "power consumption". It just means energy consumption per unit time. Indeed, it's commonly used in that way as a term of art. Here are just a few examples:


[Attached screenshots showing "power consumption" used as a standard term in several technical sources.]
 