It's an insulting mischaracterization to say these serious users are simply looking for benchmark bragging rights. The GB scores are just shorthand for overall performance.

And it's been shown a number of times that AS finishes the GB "benchmark" in the multicore test faster than it can spin up all its cores - hence why it "looks" like there's poor scaling.

So, I guess the GB "benchmark" is indicative of the type of performance you would see "in the real world" only if those "real world" tasks also finished before AS spun up all its cores.

The other view that can be taken is that it's not an ideal benchmark for showing AS's true potential performance, as its "tests" were introduced before AS hit the market. Newer Intel CPUs don't show this discrepancy, of course, because they run on essentially the same architecture as the older versions.
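
To illustrate the general effect (a minimal Python sketch with made-up work sizes, not Geekbench's actual methodology): fixed startup costs, such as spawning workers or waiting for cores to ramp, eat a larger share of a short task, so apparent multicore scaling looks worse the shorter the workload.

```python
# Minimal sketch: apparent multicore scaling vs. workload length.
# Work sizes are made up; this is NOT Geekbench's methodology.
import time
from concurrent.futures import ProcessPoolExecutor

WORKERS = 4

def burn(n):
    """CPU-bound busy work."""
    acc = 0
    for i in range(n):
        acc += i * i
    return acc

def apparent_scaling(n):
    t0 = time.perf_counter()
    burn(n * WORKERS)                       # all the work on one core
    serial = time.perf_counter() - t0

    t0 = time.perf_counter()
    with ProcessPoolExecutor(WORKERS) as ex:
        list(ex.map(burn, [n] * WORKERS))   # same total work, spread out
    parallel = time.perf_counter() - t0
    return serial / parallel

if __name__ == "__main__":
    for n in (100_000, 5_000_000):          # short task vs. long task
        print(f"work={n:>9,}  apparent scaling: {apparent_scaling(n):.1f}x")
```

On the short run, the fixed pool-spawn cost dominates and scaling looks far worse than the hardware deserves; the long run approaches the core count.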
 
My interchange with that other poster was entirely about single-core CPU performance. Your criticisms are thus not relevant to what I wrote. It sounds like you didn't bother to read my comments before disagreeing with them.

As to the broader question of whether GB is (unlike Cinebench) a reasonable benchmark for cross-platform CPU performance comparisons of Intel, AMD, and AS, this article by Ram Srinivasan of NUVIA indicates excellent correlation between GB and SPEC 2006 INT when comparing Intel and AS (they didn't test SPEC FP), as well as excellent correlation when comparing AMD and Intel:

[Attachment: correlation chart from the NUVIA article]

The only subscore with which you might see an issue is the crypto subscore, if you are comparing against AVX-enabled chips (which the Ryzen 7 is, but AS and the 13th-gen Intel Core are not).
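
For anyone who wants to sanity-check a correlation claim like this against published per-chip scores, the computation itself is trivial; a sketch with invented placeholder numbers, not the article's data:

```python
# Sketch: how closely do two benchmarks agree across a set of chips?
# All scores below are invented placeholders, not the article's data.
from statistics import correlation  # Python 3.10+

gb5_sc   = [1100, 1350, 1600, 1900]  # hypothetical Geekbench 5 single-core
spec_int = [6.1, 7.4, 8.9, 10.5]     # hypothetical SPEC 2006 INT scores

r = correlation(gb5_sc, spec_int)    # Pearson's r
print(f"r = {r:.3f}")                # near 1.0 = the benchmarks track each other
```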
 
That’s the issue with these benchmarks. All they show is how well a system runs Geekbench. It doesn’t tell me how good my Final Cut Pro experience will be. For example, my 2019 i9 iMac beat my base M1 Mac mini in Geekbench, but that M1 Mac mini smoked my iMac in Final Cut Pro.

And how do we know it’s testing properly? As you said, there is one known issue. Could there be more?
 
GeekBench is an adequate test of the performance of CORES and CACHE/DRAM.
It is not (and does not pretend to be) a test of other features of the SoC, whether NoC, synchronization, GPU, media engines, or a variety of other things that are relevant to, e.g., Final Cut Pro.

That doesn't make it useless for people who understand what it is doing and how to interpret it; it just makes it useless for YOU.
 
Point is, with rising energy costs and wanting to do something good for the environment while staying roughly as productive, these chips matter. They’ll let you do the same work with much less power consumption; if everyone did this, we’d be doing the planet a favour…
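
To put rough numbers on that claim (every figure below is a hypothetical placeholder; substitute your own wattages and tariff):

```python
# Back-of-envelope energy savings; every number is a hypothetical placeholder.
old_watts, new_watts = 150, 40   # average draw under load, old vs. new machine
hours_per_year = 8 * 220         # 8 h/day, 220 working days
price_per_kwh = 0.35             # local electricity tariff

saved_kwh = (old_watts - new_watts) * hours_per_year / 1000
print(f"~{saved_kwh:.0f} kWh/year saved, "
      f"~{saved_kwh * price_per_kwh:.2f}/year at that tariff")
```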
 
As I said, all it does is tell me how well a system runs Geekbench. I don’t run that all day every day.

And as has been mentioned, it isn’t even programmed to fully utilize the M1/M2 SoC. It doesn’t stress it enough to show accurate results. So again, we need to factor in how well the program is optimized. Too many variables here. Thus - it just shows how well a computer runs Geekbench. Nothing more than that.
 
I think there's a middle ground between the two extremes of saying that GB is useless except for measuring GB, and that GB will tell you how fast your computer will run a given task. Neither are correct.

Someone more familiar with GB can correct me but, as I understand it, GB is written to run without penalty on AS, but does not take advantage of any of the speciality coprocessors Apple offers. So suppose you have a dozen CPU-bound single-threaded programs that are well-written for both Windows and AS, but that do not take advantage of any special optimizations; most programs I use probably fall into that category. For those, the GB SC CPU scores should serve as a good rough measure of the average performance differences you will see.*

I.e., simply put, GB doesn't tell you how fast your (CPU-bound ST) program will run under AS vs Intel. Rather (unless your program uses special optimizations), it tells you about what the relative speed should be if it's properly written for both platforms.

Thus, to the extent you see a CPU-bound ST program significantly under-performing on AS relative to Intel (relative to the difference in GB SC performance), as is the case with Mathematica, it tells you that it's not yet well-written for AS.

*There's also the complication of the extent to which both GB and your programs use vectorized instructions for Intel vs. AS, but I can't speak to that in detail.
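
As a concrete illustration of that logic (all scores and runtimes below are invented): if GB SC is a good proxy, the expected speedup of a well-ported CPU-bound ST task is just the ratio of the scores, and a large shortfall flags a porting problem rather than a hardware one.

```python
# Sketch of the reasoning above; all scores and runtimes are invented.
gb_sc_intel, gb_sc_as = 1300, 1900          # hypothetical GB5 single-core scores
expected_speedup = gb_sc_as / gb_sc_intel   # ~1.46x if the port is good

intel_secs, as_secs = 100.0, 95.0           # hypothetical measured task runtimes
actual_speedup = intel_secs / as_secs       # ~1.05x

if actual_speedup < 0.8 * expected_speedup:  # well short of the prediction
    print("Underperforms the GB SC prediction: likely not yet well-ported to AS")
```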
 

Is there any reason SPEC2006 was used in preference to SPEC2017? Why wasn't FP compared?

Anandtech has an interesting breakdown of the M1 Max which includes SPEC2017 - https://www.anandtech.com/show/17024/apple-m1-max-performance-review/5


I suspect it's because of the finding "In the SPECfp suite, the M1 Max is in its own category of silicon with no comparison in the market. It completely demolishes any laptop contender, showcasing 2.2x performance of the second-best laptop chip. The M1 Max even manages to outperform the 16-core 5950X – a chip whose package power is at 142W, with rest of system even quite above that. It’s an absolutely absurd comparison and a situation we haven’t seen the likes of."
 
I don't know why the author didn't also look at SPEC2017. It could just be the author didn't have access to that program when he did the AS comparison, since he uses SPEC2017 elsewhere—or it could be he didn't want to bother with the additional comparison, since AS wasn't the focus of this article.

The more interesting question, I think, is why he didn't compare FP. You might wish to email the author and ask. It would be interesting to do the same comparison for FP.
 
SPEC2017 uses a lot more memory than SPEC2006, so it has only recently been practical to run it on phones. If you want to compare backwards more than a few years, or including low-end phones, SPEC2006 is more practical.

As for FP, Nuvia's discussion was by an architect for architects. Getting a higher FP number is substantially easier than getting a higher integer number; it just costs area.
Single-threaded integer performance is the gold standard for showing that your CPU is world-class; it's the hardest thing to improve. Concentrating on it and boasting about it show Nuvia is serious.
(The corollary also holds. Every six months some company pops up, yesterday it was Ventana, to claim that they are the new CPU champions. Weeding out the frauds is very simple -- if they do not LEAD with their single-threaded integer performance, they are frauds...)
 

Thanks for the reply, I appreciate your time. My reply was more related to the fact that Nuvia claim GB scores correlate closely with SPEC2006, and the discussion was around MacBooks/desktops, not phones. I was curious why they specifically chose SPECint2006 and not 2017 (which has been out for a while and is easily run on laptops and desktops) and why SPECfp2017 wasn't used as well.

I'm guessing the later suites didn't show the correlation they were hoping for, especially in FP, where AS is particularly good. It's also interesting that SPECint/fp 2017 shows almost perfect MT scaling, which isn't borne out in the GB benchmark, as has been shown a number of times.
 
Well, you would guess wrong, as any scan of Anandtech will show. (They have a few years of history tracking SPEC2006, SPEC2017, and GB5 results for the successive iPhones, and you can use whatever regression SW you like to see the lines.)

As for scaling, I have no idea what numbers you are referring to. If you look at AndreiF's numbers ( https://www.anandtech.com/show/17024/apple-m1-max-performance-review ) for M1, you will see about 4.5x scaling for the M1's (4+4) cores. This is basically identical to the 4.4x scaling you see for GB5.

In both cases, in a perfect world, you might hope for around 5 to 5.2x scaling (E-core at around 1/4 to 1/3rd of a P-core) and we get that sort of order of magnitude, but with some inefficiency. I haven't investigated the issue closely enough to claim which of the obvious likely candidates causes the bulk of the slowdown (sharing caches? congestion on the NoC? congestion at the memory controller? slightly reduced clocks as more cores are powered up?) but it's what one expects – you see exactly the same sort of slowdowns (by about the same sort of factor, for the same sorts of reasons) on Intel at this number of cores.
(For Intel SMT gives you approx the equivalent of about 1/4 to 1/3 of a core, so it's somewhat the same, though obviously the analogies are imperfect.
If you want to throw in an Alder Lake design it gets more complicated, but since AndreiF left, there aren't any SPEC results for Alder Lake on anandtech anyway...)
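
For what it's worth, the "around 5 to 5.2x" figure falls straight out of the core mix, using the 1/4-to-1/3 E-core ratio above (a quick check, not a measurement):

```python
# Quick check of the ideal-scaling figure for a 4P + 4E M1,
# using the 1/4-to-1/3 E-core ratio from the text above.
p_cores, e_cores = 4, 4
for e_ratio in (1/4, 1/3):                # E-core throughput vs. a P-core
    ideal = p_cores + e_cores * e_ratio   # 5.00x .. 5.33x
    print(f"E-core at {e_ratio:.2f}x: ideal {ideal:.2f}x, "
          f"observed ~4.5x -> {4.5 / ideal:.0%} of ideal")
```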

Apple's scaling will probably get better (as I've said repeatedly, M1 was their first effort where scaling really matters, and there are a number of expected improvements that we'll see in the next generation or two; these will be more important at the high end, but will probably help a little even at this lower level).
 

I'm always a bit bummed that we don't see more points of comparison. How does this compare, for example, to Alder Lake-U and Alder Lake-H (given that Apple's wattage is somewhere in the middle of those)?

(I know; those weren't out yet.)

Maybe when M2 Max comes out (assuming that even exists; I still see a bit of a chance that they'll skip it in favor of M3 Max), we'll see a comparison to Raptor Lake.
 
Given that AndreiF is no longer at AnandTech, and no-one there seems interested in his style of reviewing, this seems unlikely :-(


Chips and Cheese is the best alternative these days to AnandTech reviews, but suffers from
- lack of money (so they never review anything new unless someone buys it or lends it; MAYBE this will change and, e.g., Apple will lend them review units for a month?)

- much more interest in x86 than ARM. Andrei was an x86 fan, but was also honest enough with himself to see where ARM and Apple had strengths, and to investigate those in detail

- much less interest in "alternative" (i.e., do things differently from Intel) micro-architectures. AndreiF would notice an unusual performance result in ARM or Apple, think "that's funny", and try to track down what was really going on, if necessary writing new benchmarks.
Chips and Cheese may notice an unexpected result, but then they'll do zero investigation of why. Compare with the way Phoronix is basically useless for UNDERSTANDING anything (SW or HW) because all Michael does is report results; zero analysis, zero tracking down and understanding of anomalous performance.
 