MacBook Pro 16-inch M1 Max Stockfish Benchmarks

Leifi · Dec 8, 2021

cmaier said:
yeah, they suck except their numbers are better than the Ryzen 9 4900H, after only a couple of weeks (vs. years) of optimizing, with more optimizations already identified but not yet implemented.

You’re correct other than that.

Better than the 4900H? LMAO. The M1 does less than half the nps of even a 4800, with M1-specific optimizations. The M1 is really the emperor's new clothes in terms of CPU...

cmaier · Dec 8, 2021

Leifi said:
Better than the 4900H? LMAO. The M1 does less than half the nps of even a 4800, with M1-specific optimizations. The M1 is really the emperor's new clothes in terms of CPU...

What are you talking about? Leman posted the actual numbers, and it’s 8% better.

Leifi · Dec 8, 2021

Sanpete said:
I like benchmarks. Since I don't play games on my computer (much less games where the machine does the thinking for me), gaming benchmarks are only a curiosity, though I've sometimes done them for people who have a more direct interest in them. People who grew up on Earth and understand insults aren't the best way to ask for help.

For popular computer games, the RTX 3080 generally smokes Apple chips, mainly because it's the kind of hardware the games are written for. I made that plain in a general way in my first post in this thread. It's such a long thread, you probably missed it.

Laptops with chips like the 3080 are also useful as space heaters and white noise generators, whereas this one generally remains at most merely warm under load, is too quiet to be useful, and doesn't run twice as well plugged in as on battery.

So what you are saying is that you only care about gaming performance benches, but you would rather have sucky gaming performance than fans and shorter battery stamina?

So which gaming benchmark do you find relevant then if those are the only benchmarks you care about?

JimmyjamesEU · Dec 8, 2021

cmaier said:
What are you talking about? Leman posted the actual numbers, and it’s 8% better.

You’re surprised that someone who avoids the majority of tests showing the M1 as a great performer also avoids the result showing the M1 performing well in his last-hope benchmark?

Leifi · Dec 8, 2021

cmaier said:
What are you talking about? Leman posted the actual numbers, and it’s 8% better.

What are you blabbering about? Show me those benches.

The 4900H with 16 threads does 30+ Mnps, AVX2-optimized at full power on Linux. Benches on the M1 with optimizations seem to be around 10-15 Mnps on 8 threads...

jeanlain · Dec 8, 2021

cmaier said:
What are you talking about? Leman posted the actual numbers, and it’s 8% better.

I'm not sure what numbers to believe. On the other thread, Sopel says that the M1 is only as fast as an old Intel 6700K, using the latest patch.

jdb8167 · Dec 8, 2021

jeanlain said:
I'm not sure what numbers to believe. On the other thread, Sopel says that the M1 is only as fast as an old Intel 6700K, using the latest patch.

Different SoCs. @leman is talking about the M1 Pro/Max. Given the title on the thread, that’s appropriate.

jeanlain · Dec 8, 2021

jdb8167 said:
Different SoCs. @leman is talking about the M1 Pro/Max. Given the title on the thread, that’s appropriate.

Yes, but if the M1 matches an old 4-core CPU, I don't see how the M1 Pro/Max, which has the same per-core performance, can match a modern 8-core CPU.

That said, the numbers appear to come from http://ipmanchess.yolasite.com/amd--intel-chess-bench-stockfish.php where the 6700K does very well for an 8-thread CPU.

Not sure how the 4900H is supposed to yield >30 Mnps. Only desktop CPUs achieve this score.

Sanpete · Dec 8, 2021

Leifi said:
So what you are saying is that you only care about gaming performance benches, but you would rather have sucky gaming performance than fans and shorter battery stamina?

So which gaming benchmark do you find relevant then if those are the only benchmarks you care about?

Wow. I'm afraid English has completely failed us again. If you catch up, let me know.

leman · Dec 8, 2021

Leifi said:
The 4900H with 16 threads does 30+ Mnps, AVX2-optimized at full power on Linux. Benches on the M1 with optimizations seem to be around 10-15 Mnps on 8 threads...

The 4900H reaches 15 Mnps according to http://ipmanchess.yolasite.com/amd--intel-chess-bench-stockfish.php

Where did you get your 30 Mnps from? That's enthusiast desktop level.

jeanlain · Dec 8, 2021

leman said:
The 4900H reaches 15 Mnps according to http://ipmanchess.yolasite.com/amd--intel-chess-bench-stockfish.php

Where did you get your 30 Mnps from? That's enthusiast desktop level.

Note that Leifi said "full power". I suspect some weird unconstrained setup in which the 4900H would let you fry an egg on the palm rest.

Leifi · Dec 8, 2021

cmaier said:
What are you talking about? Leman posted the actual numbers, and it’s 8% better.

I looked at leman's post.

To compare benches it is important that they are run under similar conditions 🙂

1) The ipman test is most likely a Windows machine running the downloaded exe, not optimized for the machine.
2) The "M1" figures cited by leman are for the "M1 MAX" running on all P+E cores, and it still doesn't get more than 16 Mnps.
3) A 59xxH or HS does 30 Mnps without compiling, large pages, or any optimization: just download and run.

P.S. Of course, 4800 etc. values will also vary a lot based on model, memory, etc.

cmaier · Dec 8, 2021

jeanlain said:
Yes, but if the M1 matches an old 4-core CPU, I don't see how the M1 Pro/Max, which has the same per-core performance, can match a modern 8-core CPU.

More memory bandwidth.

leman · Dec 8, 2021

Leifi said:
3) A 59xxH or HS does 30 Mnps without compiling, large pages, or any optimization: just download and run.

And where do you get these numbers from?

Krevnik · Dec 8, 2021

Leifi said:
I looked at leman's post.

To compare benches it is important that they are run under similar conditions 🙂

1) The ipman test is most likely a Windows machine running the downloaded exe, not optimized for the machine.
2) The "M1" figures cited by leman are for the "M1 MAX" running on all P+E cores, and it still doesn't get more than 16 Mnps.
3) A 59xxH or HS does 30 Mnps without compiling, large pages, or any optimization: just download and run.

P.S. Of course, 4800 etc. values will also vary a lot based on model, memory, etc.

I agree the conditions are important, but it's also important to point out that the figures posted (since I was the one who ran the test) were produced using the compile instructions given on the project's GitHub page. Nothing about the machine was "tweaked" from stock Apple to try to skew things. I have better things to do with my time, as the whole experiment was done because I had a couple of spare minutes to run things. A whopping 5 minutes was spent collecting those numbers. In this instance, the goal was to try out a new set of optimizations, added just yesterday, that improves NEON SIMD support across ARM. Even the Raspberry Pi got a boost from the changes.

I don't even know what you are getting at with "not optimized for the machine". You don't optimize code for specific machines, but for architectures. The optimizations we are talking about are SIMD optimizations to accelerate certain types of math. Those are available quite broadly, and part of the puzzle as to what is going on with Stockfish in particular. Especially since ARM and x64 have different SIMD engines, which require different implementations to take advantage of.

I'm not sure how using the P+E cores is any different than letting Alder Lake use all its cores.

Leifi · Dec 8, 2021

leman said:
And where do you get these numbers from?

I have access to laptops with the 5800 and 5900, and also some 5950s. I have run bench 1024 16 26 on those with various versions and networks of SF.

JimmyjamesEU · Dec 8, 2021

https://forums.macrumors.com/thread...mance-on-m1-by-up-to-80.2326552/post-30683532

Nothing to see here, just one of the devs of stockfish saying the M1 cores perform on par with high end x86 chips.

I'm sure others know more with their secret benchmarks.

Leifi · Dec 8, 2021

Krevnik said:
I agree the conditions are important, but it's also important to point out that the figures posted (since I was the one who ran the test) were produced using the compile instructions given on the project's GitHub page. Nothing about the machine was "tweaked" from stock Apple to try to skew things. I have better things to do with my time, as the whole experiment was done because I had a couple of spare minutes to run things. A whopping 5 minutes was spent collecting those numbers. In this instance, the goal was to try out a new set of optimizations, added just yesterday, that improves NEON SIMD support across ARM. Even the Raspberry Pi got a boost from the changes.

I don't even know what you are getting at with "not optimized for the machine". You don't optimize code for specific machines, but for architectures. The optimizations we are talking about are SIMD optimizations to accelerate certain types of math. Those are available quite broadly, and part of the puzzle as to what is going on with Stockfish in particular. Especially since ARM and x64 have different SIMD engines, which require different implementations to take advantage of.

I'm not sure how using the P+E cores is any different than letting Alder Lake use all its cores.

I really appreciate the time you spent. I have been bit*ing a lot in various threads that no one was willing to spend 5 minutes to bench 🙂

I am just saying that many of the ipman uploads are from people who sent him their benches. Those binaries may or may not have been compiled on the machine that ran the bench, and various OSes and settings (wattage, clocking, energy modes, etc.) were used overall.

If you download an exe, it has been compiled for some generic target like AVX2 or BMI. If you compile it yourself, you get the best build for your machine's architecture and can choose better or worse compilers and options. Linux builds usually run better than Windows builds, etc. I do not have a 4900H CPU to test; I am relying on what people have reported to me from running benches on various laptops with the 4800 and 4900. I do, however, have access to 59xx versions.
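The point above is that a prebuilt binary targets a generic instruction-set level, while compiling yourself lets you pick the level that matches your CPU. A minimal Python sketch of that choice, under stated assumptions: the helper `pick_arch` is hypothetical, and the target names are the `ARCH` values I believe Stockfish's Makefile accepts around version 14.1 (`apple-silicon`, `armv8`, `x86-64-bmi2`, `x86-64-avx2`, `x86-64`, `general-64`). Running `make help` in the `src` directory should list the exact set for your version.

```python
import platform

def pick_arch(machine: str, system: str, cpu_flags: set[str]) -> str:
    """Hypothetical helper: map a host description to a Stockfish ARCH build target.

    machine/system look like the values from platform.machine()/platform.system();
    cpu_flags is a set of lowercase feature names (e.g. parsed from /proc/cpuinfo).
    Target names are assumptions; confirm them with `make help`.
    """
    machine = machine.lower()
    if machine in ("arm64", "aarch64"):
        # 64-bit ARM: NEON is baseline; macOS on ARM gets the Apple Silicon target
        return "apple-silicon" if system == "Darwin" else "armv8"
    if machine in ("x86_64", "amd64"):
        # Prefer the most specific x86-64 level the CPU reports
        if {"avx2", "bmi2"} <= cpu_flags:
            return "x86-64-bmi2"
        if "avx2" in cpu_flags:
            return "x86-64-avx2"
        return "x86-64"
    # Unknown architecture: fall back to a portable, unoptimized build
    return "general-64"

if __name__ == "__main__":
    # CPU flags are not detected here; pass them in yourself on x86-64 hosts
    print(pick_arch(platform.machine(), platform.system(), set()))
```

For example, `pick_arch("arm64", "Darwin", set())` returns `"apple-silicon"`, and a laptop reporting both `avx2` and `bmi2` maps to `"x86-64-bmi2"`; the result would then be passed to a build such as `make -j build ARCH=...`.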

To your question on Alder Lake: I agree completely. To compare benches, the CPU should be used to its fullest, with all P+E cores. The problem is that many benches seem to misrepresent other CPUs by not using the optimal number of threads for each CPU. I expect the laptops now showing up with 14-core Alder Lake to crush both the 50xx and the M1 Max in SF benches.

leman · Dec 8, 2021

Leifi said:
I have access to laptops with the 5800 and 5900, and also some 5950s. I have run bench 1024 16 26 on those with various versions and networks of SF.

I just find it very unlikely that your mobile Zen 3 performs on par with a desktop 12700K. Are you sure that your quoted 30 Mnodes/sec is not the 5950X? That would make sense to me core-wise.

Leifi · Dec 8, 2021

leman said:
I just find it very unlikely that your mobile Zen 3 performs on par with a desktop 12700K. Are you sure that your quoted 30 Mnodes/sec is not the 5950X? That would make sense to me core-wise.

Yeah. I just ran a test on a laptop here (5900HX, 8-core).

Using Stockfish 14.1 AVX2, I get:

AMD Ryzen 9 5900HX with Radeon Graphics, 3301 Mhz, 8 Core(s), 16 Logical Processor(s)
stockfish_14.1_avx.exe bench 1024 16 24
===========================
Total time (ms) : 66452
Nodes searched : 2126455531
Nodes/second : 31999873

That compares pretty favorably to Krevnik's bench:

M1 MAX -
/stockfish bench 1024 10 26 default depth nnue
===========================
Total time (ms) : 76914
Nodes searched : 1192240595
Nodes/second : 15500956
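The Nodes/second line in both outputs appears to be nodes searched divided by elapsed time. A minimal Python sketch, assuming that formula (node count times 1000, integer-divided by the total time in milliseconds), recomputes both figures and their ratio. Keep in mind the two runs are not like-for-like: the 5900HX run used 16 threads at depth 24, while the M1 Max run used 10 threads at depth 26, so the ratio is only a rough indicator.

```python
# Sketch: recompute Nodes/second from the two `bench` outputs quoted above.
# Assumption: nps = nodes * 1000 / elapsed_ms, truncated to an integer.

def nodes_per_second(nodes: int, total_time_ms: int) -> int:
    """Integer nodes per second from a node count and a time in milliseconds."""
    return nodes * 1000 // total_time_ms

# Figures copied from the posted outputs (different thread counts and depths!)
ryzen_5900hx = nodes_per_second(2_126_455_531, 66_452)  # bench 1024 16 24
m1_max = nodes_per_second(1_192_240_595, 76_914)        # bench 1024 10 26

ratio = ryzen_5900hx / m1_max
print(f"5900HX: {ryzen_5900hx:,} nps")
print(f"M1 Max: {m1_max:,} nps")
print(f"ratio:  {ratio:.2f}x")
```

Recomputed this way, the M1 Max figure matches the posted 15500956 exactly and the 5900HX figure lands within one node/second of the posted 31999873 (presumably a timing or rounding detail in how the program reports it), for a ratio of roughly 2.06x.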

Leifi · Dec 8, 2021

leman said:
I just find it very unlikely that your mobile Zen 3 performs on par with a desktop 12700K.

Maybe SF needs further optimizations for alder-lake 🙃

JimmyjamesEU · Dec 8, 2021

Leifi said:
Maybe SF needs further optimizations for alder-lake 🙃

Impossible. You can't just expect optimisation after all these years of devs working on x86 platforms. That might be possible on a new platform, but not on Alder Lake. At most I would expect 0.0001% improvement possible. It's to be expected for these over-hyped and over-heated chips.

Leifi · Dec 8, 2021

JimmyjamesEU said:
Impossible. You can't just expect optimisation after all these years of devs working on x86 platforms. That might be possible on a new platform, but not on Alder Lake. At most I would expect 0.0001% improvement possible. It's to be expected for these over-hyped and over-heated chips.

<irony>No, at least 2x-3x performance can be gained if only the programmers make some effort and use Intel's native libs.</irony>

JimmyjamesEU · Dec 8, 2021

Leifi said:
<irony>No, at least 2x-3x performance can be gained if only the programmers make some effort and use Intel's native libs.</irony>

Possible if optimisation is just starting and little work has been done. Absolutely impossible on a platform that’s been around for decades and has had years of work done to improve performance.

Leifi · Dec 8, 2021

JimmyjamesEU said:
Possible if optimisation is just starting and little work has been done. Absolutely impossible on a platform that’s been around for decades and has had years of work done to improve performance.

Alder Lake is newer than Apple Silicon. I think you are confusing CPU architecture with opcodes. You could apply the same reasoning to the M1: that it is the same as a 1987 Acorn and can't be optimized any further 🙂
