Intel Downplays Apple's M1 Chip With 'Carefully Crafted' Benchmarks

dmccloud · Feb 23, 2021

pdoherty said:
Can we, really? Why can’t we equally presume we’re already seeing the primary benefits of SOC in this chip? It’s not something new since they’ve been doing it for some time on the iPads. And adding cores won’t make much difference. Possibly higher clockspeeds or more RAM might help but are we really expecting anything huge? If so why?

iOS/iPad OS still do not do true multitasking, whereas Mac OS has done so for decades at this point. That reason alone makes it somewhat pointless to use the iPad as an indicator of where the M-series might go in the future. The iPad also has significant physical and thermal constraints that limit how far Apple could push the SoC in those devices. The Mac does not have those same limitations, especially in larger models such as the 16" MBP, iMac, and Mac Pro. The ARM ISA lends itself to scalability, and Apple's proprietary architecture actually ramps up the scalability factor. It wouldn't be hard to create an SoC with an 8/4 or 8/8 CPU core coupled with a 16 core GPU, since the underlying building blocks are already present in their designs. Apple could also go even wider with decoding units in future versions of the M-series processors, since the fixed instruction length makes it simple to widen the highway even with OOE being used by the system. That would increase the IPC (instructions per cycle) count, which is a far better indicator of actual performance than clock speeds have been.

leman · Feb 23, 2021

pdoherty said:
Can we, really? Why can’t we equally presume we’re already seeing the primary benefits of SOC in this chip? It’s not something new since they’ve been doing it for some time on the iPads. And adding cores won’t make much difference. Possibly higher clockspeeds or more RAM might help but are we really expecting anything huge? If so why?

Why do yo think that adding cores won't make much difference? The M1 is essentially a quad-core CPU chip (disregarding efficiency cores for obvious reasons) and a GPU with 1024 shading units (comparable to low-end dGPUs), which makes it entry level compared to the rest of the market. A current 16" MacBook Pro has a CPU with 8 cores and up to 2560 shading units. Performance scales, and there is no reason why Apple Silicon wouldn't scale just as well if they add more cores (in fact, it will probably scale better since it is more power efficient and doesn't need to be downclocked to increase the core count). Memory-wise, we will certainly see wider memory interfaces with more bandwidth (i.e. multi-channel RAM, with high number of channels).

pdoherty · Feb 23, 2021

cmaier said:
Because this chip was aimed at the low end machines, and easily scales in core count, frequency, and bus bandwidth.

Source for that?

cmaier · Feb 23, 2021

pdoherty said:
Source for that?

I’m a CPU designer. There’s sufficient information available about its internal bus configuration and floor plan to enable me to make that determination.

pdoherty · Feb 23, 2021

dmccloud said:
To expound on this point, think of the M1 as the baseplate for a Lego set. Apple SoCs are designed for scalability in that you can easily add additional cores, RAM, and related components (such as TB3 controllers) to the baseplate and create something much more powerful and useful in the process.

My main point is that the principle benefit of the M1 is that everything is on the same bus/chip so that benefit is largely already “gotten”. Adding more cores doesn’t benefit very many workflow types or apps. And I haven’t heard much about the ability to scale frequency on these since so many items are on the same chip it may limit that.

pdoherty · Feb 23, 2021

dmccloud said:
iOS/iPad OS still do not do true multitasking, whereas Mac OS has done so for decades at this point. That reason alone makes it somewhat pointless to use the iPad as an indicator of where the M-series might go in the future. The iPad also has significant physical and thermal constraints that limit how far Apple could push the SoC in those devices. The Mac does not have those same limitations, especially in larger models such as the 16" MBP, iMac, and Mac Pro. The ARM ISA lends itself to scalability, and Apple's proprietary architecture actually ramps up the scalability factor. It wouldn't be hard to create an SoC with an 8/4 or 8/8 CPU core coupled with a 16 core GPU, since the underlying building blocks are already present in their designs. Apple could also go even wider with decoding units in future versions of the M-series processors, since the fixed instruction length makes it simple to widen the highway even with OOE being used by the system. That would increase the IPC (instructions per cycle) count, which is a far better indicator of actual performance than clock speeds have been.

Thanks for a substantive answer. These could indeed lead to increased power.

pdoherty · Feb 23, 2021

leman said:
Why do yo think that adding cores won't make much difference? The M1 is essentially a quad-core CPU chip (disregarding efficiency cores for obvious reasons) and a GPU with 1024 shading units (comparable to low-end dGPUs), which makes it entry level compared to the rest of the market. A current 16" MacBook Pro has a CPU with 8 cores and up to 2560 shading units. Performance scales, and there is no reason why Apple Silicon wouldn't scale just as well if they add more cores (in fact, it will probably scale better since it is more power efficient and doesn't need to be downclocked to increase the core count). Memory-wise, we will certainly see wider memory interfaces with more bandwidth (i.e. multi-channel RAM, with high number of channels).

The vast majority of user apps do not use functions that make multiple cores useful. They are constrained by the best that your single-core processor can do.

cmaier · Feb 23, 2021

pdoherty said:
My main point is that the principle benefit of the M1 is that everything is on the same bus/chip so that benefit is largely already “gotten”. Adding more cores doesn’t benefit very many workflow types or apps. And I haven’t heard much about the ability to scale frequency on these since so many items are on the same chip it may limit that.

No, that’s not the principle benefit of the M1. The principle benefit of the M1 is that it is extremely-wide issue, and that the physical design of the cores is incredibly power efficient (both because Arm is better than x86, and because Apple is particularly talented at CPU physical design). Not sure what “everything is on the same bus/chip” is supposed to mean, but I assume you mean the unified RAM architecture (which has nothing to do with being on the same bus/chip - it’s not even on the same chip). The RAM architecture principally provides improved performance for GPU operations, and is not relevant to the general performance of the CPUs.

chucker23n1 · Feb 23, 2021

leman said:
Why do yo think that adding cores won't make much difference? The M1 is essentially a quad-core CPU chip (disregarding efficiency cores for obvious reasons)

Well, there are diminishing returns in adding cores.

leman said:
Performance scales,

Most code can't easily be parallelized. (And with the code that can, you often want to run it on the GPU instead, because, as you say, it has many more cores.)

crazy dave · Feb 23, 2021

pdoherty said:
The vast majority of user apps do not use functions that make multiple cores useful. They are constrained by the best that your single-core processor can do.

chucker23n1 said:
Well, there are diminishing returns in adding cores.

Most code can't easily be parallelized. (And with the code that can, you often want to run it on the GPU instead, because, as you say, it has many more cores.)

Yes ... but no. For many users, it is indeed true that 4-8 threads is often more than enough and the performance on a single thread (and it's power usage) are the most important feature. But there is actually a reason Apple sold Mac Pro, iMac Pro, and other high core count computers and why AMD (and even Intel although with more difficulty) are pushing to higher and higher core counts in consumer level machines. It isn't *just* marketing (though some of it is).

Most games are defined by single threaded performance, but this is becoming less true with close-to-metal graphics APIs that also make it easier to do multithreading on the CPU and a few games rely on this heavily. But in general, there are a good number of highly utilized applications that make heavy use of multithreading on the CPU (or may be multiprocess!) but are not great fits for the GPU (or they would've been ported already). Things which are not such great fits to the GPU include anything with heavy branching within threads and algorithms with lots of random access memory patterns (also if you don't want use a workstation GPU, 64-bit+ precision math). GPUs are getting better but even with advancements there are a number of applications that are run by a large number of people that are best used on a large number of CPU cores. It's the number of people that use these programs rather than the number of such programs that is important.

chucker23n1 · Feb 23, 2021

crazy dave said:
But there is actually a reason Apple sold Mac Pro, iMac Pro,

Sure, but that’s extremely niche.

crazy dave said:
and other high core count computers and why AMD (and even Intel although with more difficulty) are pushing to higher and higher core counts in consumer level machines. It isn't *just* marketing (though some of it is).

Most games are defined by single threaded performance, but this is becoming less true with close-to-metal graphics APIs that also make it easier to do multithreading on the CPU and a few games rely on this heavily.

Right.

crazy dave said:
But in general, there are a good number of highly utilized applications that make heavy use of multithreading on the CPU (or may be multiprocess!) but are not great fits for the GPU (or they would've been ported already). Things which are not such great fits to the GPU include anything with heavy branching within threads and algorithms with lots of random access memory patterns (also if you don't want use a workstation GPU, 64-bit+ precision math). GPUs are getting better but even with advancements there are a number of applications that are run by a large number of people that are best used on a large number of CPU cores. It's the number of people that use these programs rather than the number of such programs that is important.

I was, to be clear, mostly answering for general-purpose apps. Things like CRUD. The CPU (rather than I/O) is already rarely the bottleneck there, and when it is, it’s hard to paralellize the algorithm.

crazy dave said:
It's the number of people that use these programs rather than the number of such programs that is important.

Yeah, and my impression is that the number of people who actually benefit from a Threadripper frankly isn’t that high.

So, while I fully expect a higher-end MBP to have more cores (8+8? 12+4?), trickling that down to the M4 or whatever MacBook Air isn’t as helpful for that audience as it sounds.

pdoherty · Feb 23, 2021

Most CPU manufacturers are scaling to multiple cores, not because it’s a huge benefit, but because they have nowhere else to go (not being able to run up clockspeeds like the old days allowed).

crazy dave · Feb 23, 2021

chucker23n1 said:
Sure, but that’s extremely niche.

Right.

I was, to be clear, mostly answering for general-purpose apps. Things like CRUD. The CPU (rather than I/O) is already rarely the bottleneck there, and when it is, it’s hard to paralellize the algorithm.

Yeah, and my impression is that the number of people who actually benefit from a Threadripper frankly isn’t that high.

So, while I fully expect a higher-end MBP to have more cores (8+8? 12+4?), trickling that down to the M4 or whatever MacBook Air isn’t as helpful for that audience as it sounds.

No I agree the typical MacBook Air user wouldn’t benefit that much and there are a number of users who think they benefit but don’t (marketing). And I agree that full iMac Pro and Mac Pro market (threadrippers too) is relatively small (though lucrative). The biggest market (reportedly by far) are the airs and the small/weaker pros.

But there’s a big in-between market that is still consumer/prosumer but does make use of large CPU core counts when they need it. The point I (and a couple of others here) wanted to stress is that increasing core count to service that market is still pretty important and the firestorm cores should be *fantastic* at it.

M1X is supposedly 8+4 firestorm/icestorm.

pdoherty said:
Most CPU manufacturers are scaling to multiple cores, not because it’s a huge benefit, but because they have nowhere else to go (not being able to run up clockspeeds like the old days allowed).

There’s definitely truth to this but higher core counts are just as definitely beneficial for a number of important applications and *may* become more so rather than less for certain applications, especially games. Further, firestorm-style cores shouldn’t hit this frequency limit for quite a long time and to be fair x86 cores are finding more creative ways to squeeze out more IPC - even Intel at long last is doing so.

cmaier · Feb 23, 2021

pdoherty said:
Most CPU manufacturers are scaling to multiple cores, not because it’s a huge benefit, but because they have nowhere else to go (not being able to run up clockspeeds like the old days allowed).

The truth is there is a sweet spot. Power consumption increases linearly with clock frequency (at least if you can raise the clock without raising voltage, otherwise it increases by voltage squared as well), so you have a choice to make. If I have the ability to double power consumption, should I double the number of cores, or should I double the clock frequency? Turns out, it depends on the workload.

Of course, doubling the frequency means all that extra power dissipation is happening in a smaller die, and heat removal is a function of die area, so if I double the frequency I may need to increase the die size and spread things out as best I can to allow thermal spreading. But if I do that, the distance between transistors increases, meaning the time-of-flight (6ps/mm) on wires increases, and also the capacitive loading on the source/drains, which tends to slow things down. So I may have to increase the voltage to compensate. But that generates more heat. Etc. etc.

So, yes, doubling the number of cores can be a more elegant solution. And it turns out that for most real workloads, somewhere around 6-8 cores is the sweet spot, with decrease return on investment as you go beyond that. Of course, some workloads are not parallelizable, but these tend to fall into two buckets. (1) stuff that runs fast enough as it is, and doesn’t benefit from increased clock speed very much. For example, interactive apps that can already process faster than their inputs can be generated. (2) stuff (like engineering or science software, encodes, etc.) where more speed is always better. This stuff tends to be comparatively rare, but for those situations you are better off with a design where high single-thread performance was the goal.

M1 is in a pretty nice sweet spot, with very high single-thread performance/watt, plus a reasonable number of cores. There is still plenty of room to (1) raise clock speed, because the power dissipation is so low and (2) add more cores (returns will diminish if they go beyond another 4 or so high performance cores, but even more cores than that would be helpful for certain workloads, e.g. Mac Pro situations).

But, by the way, none of this is because the CPU manufacturers hit a wall. This was ALWAYS the trade off. The issue was, back before, say, the mid 1990s, the number of transistors that could be squeezed on a die was too low to enable much in the way of multicore designs, so the only choice we had back then was higher frequency! So you sort of have it backwards.

leman · Feb 24, 2021

pdoherty said:
The vast majority of user apps do not use functions that make multiple cores useful. They are constrained by the best that your single-core processor can do.

But are we talking about "most code"? Maybe I misunderstood, but I though you were skeptical about Apple being able to deliver more powerful processors? Now, I doubt that their "bigger chips" will be that much faster for browsing (some minor frequency bumps notwithstanding), but there is demand for large multi-core system in all kind of usage scenarios. Content creation, software development, data analysis — scaling of the core count has been a traditional way to improve the performance of such applications.

And of course, there is the GPU and graphics, which is a massively parallel workload itself and will scale with the number of cores.

So yeah, more CPU and GPU cores, more memory bandwidth = more processing capability, faster Macs.

chucker23n1 · Feb 24, 2021

leman said:
But are we talking about "most code"?

IME, most code won't benefit from that many cores. It benefits more from techniques like Turbo Boost (temporarily increasing the clock, or shutting down several cores and then increasing it even further, possibly for fractions of a second), so maybe Apple is looking into something like that.

leman said:
Content creation,

What kind of content? Video, for example, is typically GPU-accelerated now.

leman said:
software development,

It depends — Swift does indeed seem to eat cores for breakfast when compiling. For, say, .NET, IME, more than a few cores doesn't help much at all. (I presume the same goes for Java.)

leman said:
data analysis

Yes, though even there, for a lot of data, precision doesn't matter in the aggregate, and you consider offloading that to the CPU.

leman said:
So yeah, more CPU and GPU cores, more memory bandwidth = more processing capability, faster Macs.

More is always better (if you don't count thermals and other cost), yes.

leman · Feb 24, 2021

chucker23n1 said:
IME, most code won't benefit from that many cores. It benefits more from techniques like Turbo Boost (temporarily increasing the clock, or shutting down several cores and then increasing it even further, possibly for fractions of a second), so maybe Apple is looking into something like that.

Apple has used dynamic overclocking for years now. They simply don't need the huge boost range (unlike Intel or AMD chips) because of their power-efficiency. E.g where Intel needs to boost it's clocks from 2ghz to 5ghz to go from "power-efficient" to "fast", Apple only needs to boost from 2.5ghz to 3ghz.

chucker23n1 said:
What kind of content? Video, for example, is typically GPU-accelerated now.

Sure, but photo and video editing still heavily uses multithreading. Of course, I completely agree with you that GPU performance is more important nowadays, and that these applications won't endlessly benefit from increasing CPU cores. It's not a server workload.

chucker23n1 said:
It depends — Swift does indeed seem to eat cores for breakfast when compiling. For, say, .NET, IME, more than a few cores doesn't help much at all. (I presume the same goes for Java.)

Anything that can do parallel builds will benefit from more cores, especially on larger projects. I don't see the frameworks you mention as an exception.

chucker23n1 said:
Yes, though even there, for a lot of data, precision doesn't matter in the aggregate, and you consider offloading that to the CPU.

Depends on what you do. I have some large R scripts that can absolutely trash as many CPU cores as you can give me. And sure, I could make them much faster and more efficient by rewriting the entire thing in a lower-level language and using the GPU, but I am not going to spend a month or two of development effort on a single research paper of which I am not even a first author

crazy dave · Feb 24, 2021

cmaier said:
(2) stuff (like engineering or science software, encodes, etc.) where more speed is always better. This stuff tends to be comparatively rare, but for those situations you are better off with a design where high single-thread performance was the goal.

Speaking from the science end of this, many simulation algorithms are single threaded but benefit from multiple cores as you need to run many, many simulations (eg monte Carlo or at least different parameters) so you run them as multiprocess. Even though this is technically embarrassingly parallel the algorithm is often still such a poor fit for GPUs that even when many are run in parallel, the CPU is still better. Of course if you have access to one, then obviously a cluster is where you really want to run such a thing. But running smaller jobs and development are nice to have at your fingertips.

leman said:
Depends on what you do. I have some large R scripts that can absolutely trash as many CPU cores as you can give me. And sure, I could make them much faster and more efficient by rewriting the entire thing in a lower-level language and using the GPU, but I am not going to spend a month or two of development effort on a single research paper of which I am not even a first author

That’s also a factor. 🤪

chucker23n1 · Feb 24, 2021

leman said:
Apple has used dynamic overclocking for years now.

Have they?

leman said:
Sure, but photo and video editing still heavily uses multithreading. Of course, I completely agree with you that GPU performance is more important nowadays, and that these applications won't endlessly benefit from increasing CPU cores. It's not a server workload.

Yeah.

leman said:
Anything that can do parallel builds will benefit from more cores, especially on larger projects. I don't see the frameworks you mention as an exception.

Sure, though the thing about parallel builds is that they often have dependencies. If you have, say, a GUI front-end and a web front-end, and they both depend on a shared library, then they must first wait for the library to build, and then each of those can build in a separate core.

(It appears Swift is better at multithreading a build within one project, which is tricky.)

leman said:
Depends on what you do. I have some large R scripts that can absolutely trash as many CPU cores as you can give me. And sure, I could make them much faster and more efficient by rewriting the entire thing in a lower-level language and using the GPU, but I am not going to spend a month or two of development effort on a single research paper of which I am not even a first author

Fair enough!

Maximara · Feb 24, 2021

pdoherty said:
My main point is that the principle benefit of the M1 is that everything is on the same bus/chip so that benefit is largely already “gotten”. Adding more cores doesn’t benefit very many workflow types or apps. And I haven’t heard much about the ability to scale frequency on these since so many items are on the same chip it may limit that.

And yet that has been Intel's solution for a long while - just throw more cores on the chip.

leman · Feb 24, 2021

chucker23n1 said:
Have they?

If I remember correctly, even the earliest Apple architectures (like the A6) have dynamic clock scaling. The more recent CPUs definitely have it, and the turbo boost levels are known empirically. Example for A11 and A12: https://www.anandtech.com/show/13392/the-iphone-xs-xs-max-review-unveiling-the-silicon-secrets/3

chucker23n1 · Feb 24, 2021

leman said:
If I remember correctly, even the earliest Apple architectures (like the A6) have dynamic clock scaling. The more recent CPUs definitely have it, and the turbo boost levels are known empirically. Example for A11 and A12: https://www.anandtech.com/show/13392/the-iphone-xs-xs-max-review-unveiling-the-silicon-secrets/3

"This is just a 5% boost in frequency in ST applications."

Interesting!

chucker23n1 · Feb 24, 2021

Maximara said:
And yet that has been Intel's solution for a long while - just throw more cores on the chip.

Hm? If anything, a criticism of recent Intel chips is that they have too few cores, e.g. compared to AMD.

leman · Feb 24, 2021

chucker23n1 said:
"This is just a 5% boost in frequency in ST applications."

Interesting!

That's all you need if your max performance power consumption is lower than Intel's power consumption at base frequency

To be honest, I believe that Apple is holding the single-threaded performance of M1 a bit, I expect the larger Macs to have slightly higher clocks (maybe around 3.5-3.8ghz). That would increase the max per-core power consumption somewhere around 10-15 watts, which is still very good in comparison, but at the same time it will outperform anything Intel or AMD can offer for the next two years.

chucker23n1 · Feb 24, 2021

leman said:
That's all you need if your max performance power consumption is lower than Intel's power consumption at base frequency

To be honest, I believe that Apple is holding the single-threaded performance of M1 a bit, I expect the larger Macs to have slightly higher clocks (maybe around 3.5-3.8ghz). That would increase the max per-core power consumption somewhere around 10-15 watts, which is still very good in comparison, but at the same time it will outperform anything Intel or AMD can offer for the next two years.

Hm, the M1 at 3.8 GHz vs. 3.2 GHz, scaling linearly, would score about 2,000 at Geekbench. Rocket Lake-S seems to be scoring around 1,900. Now, on the one hand, that's at much higher power consumption, but OTOH, there are a few years to go. It'll be interesting to see what Alder Lake (which adds heterogenous cores) is like.

Intel Downplays Apple's M1 Chip With 'Carefully Crafted' Benchmarks

macrumors 68040

macrumors Core

macrumors 68000

Suspended

macrumors 68000

macrumors 68000

macrumors 68000

Suspended

macrumors G3

macrumors 68000

macrumors G3

macrumors 68000

macrumors 68000

Suspended

macrumors Core

macrumors G3

macrumors Core

macrumors 68000

macrumors G3

macrumors 68000

macrumors Core

macrumors G3

macrumors G3

macrumors Core

macrumors G3

Our Staff