Yeah, they suck, except their numbers are better than the Ryzen 9 4900H's, after only a couple of weeks (vs. years) of optimizing, with more optimizations already identified but not yet implemented.

You’re correct other than that.

Better than the 4900H? LMAO. The M1 does less than half the nps of even a 4800, and that's with the M1-specific optimizations. The M1 is really the emperor's new clothes in terms of CPU...
 
I like benchmarks. Since I don't play games on my computer (much less games where the machine does the thinking for me), gaming benchmarks are only a curiosity, though I've sometimes done them for people who have a more direct interest in them. People who grew up on Earth understand that insults aren't the best way to ask for help.

For popular computer games, the RTX 3080 generally smokes Apple chips, mainly because it's the kind of hardware the games are written for. I made that plain in a general way in my first post in this thread. It's such a long thread, you probably missed it.

Laptops with chips like the 3080 are also useful as space heaters and white noise generators, whereas this one generally remains at most merely warm under load, is too quiet to be useful, and doesn't run twice as well plugged in as on battery.

So what you are saying is that you only care about gaming performance benches, but you would rather have sucky gaming performance than fans and shorter battery stamina?

So which gaming benchmark do you find relevant then if those are the only benchmarks you care about?
 
What are you talking about? Leman posted the actual numbers, and it’s 8% better.

What are you blabbering about... show me those benches..

The 4900H with 16 threads does 30+ Mnps, AVX2-optimized at full power on Linux. Benches on the M1 with optimizations seem to be around 10-15 Mnps on 8 threads...
 
What are you talking about? Leman posted the actual numbers, and it’s 8% better.
I'm not sure what numbers to believe. On the other thread, Sopel says that the M1 is only as fast as an old intel 6700k, using the latest patch.
 
Different SoCs. @leman is talking about the M1 Pro/Max. Given the title on the thread, that’s appropriate.
 
Yes, but if the M1 matches an old 4-core CPU, I don't see how the M1 Pro/Max, which has the same per-core performance, can match a modern 8-core CPU.

That said, the numbers appear to come from http://ipmanchess.yolasite.com/amd--intel-chess-bench-stockfish.php where the 6700K does very well for an 8-thread CPU.

Not sure how the 4900H is supposed to yield >30 Mnps. Only desktop CPUs achieve this score.
 
So what you are saying is that you only care about gaming performance benches, but you would rather have sucky gaming performance than fans and shorter battery stamina?

So which gaming benchmark do you find relevant then if those are the only benchmarks you care about?
Wow. I'm afraid English has completely failed us again. If you catch up, let me know.
 
What are you talking about? Leman posted the actual numbers, and it’s 8% better.

I looked at Leman's post.

To compare benches, it is important that they are run under similar conditions :)

1) The ipman test is most likely a Windows machine running the downloaded exe, not optimized for the machine.
2) The "M1" figures cited by Leman are for the "M1 Max" running on all P+E cores, and it still doesn't get more than 16 Mnps.
3) A 59xxH or HS does 30 Mnps without even compiling, using large pages, or optimizing; just download and run.

P.S.: Of course, values for the 4800 etc. will also vary a lot based on model, memory, and so on.
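
One way to check the "similar conditions" point is to compare the bench command lines themselves. The sketch below assumes that Stockfish 14.x reads its bench arguments in the order hash size (MB), threads, limit, position file, limit type, eval type, and that omitted arguments default to 16, 1, 13, default, depth, mixed. The helper names `parse_bench` and `differing_settings` are hypothetical, and the two command lines are ones quoted in this thread.

```python
# Assumed argument order and defaults for Stockfish 14.x `bench`
# (an assumption; check against the build you actually run).
BENCH_FIELDS = ["hash_mb", "threads", "limit", "fen_file", "limit_type", "eval_type"]
BENCH_DEFAULTS = ["16", "1", "13", "default", "depth", "mixed"]


def parse_bench(command):
    """Map the arguments after `bench` to named settings, filling in defaults."""
    tokens = command.split()
    args = tokens[tokens.index("bench") + 1:]
    values = args + BENCH_DEFAULTS[len(args):]
    return dict(zip(BENCH_FIELDS, values))


def differing_settings(cmd_a, cmd_b):
    """Return only the settings whose values differ between two bench commands."""
    a, b = parse_bench(cmd_a), parse_bench(cmd_b)
    return {key: (a[key], b[key]) for key in BENCH_FIELDS if a[key] != b[key]}


diff = differing_settings(
    "stockfish_14.1_avx.exe bench 1024 16 24",
    "/stockfish bench 1024 10 26 default depth nnue",
)
for setting, (left, right) in diff.items():
    print(f"{setting}: {left} vs {right}")
```

Any setting this prints is a difference in conditions that should be kept in mind before comparing the resulting Nodes/second figures directly.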
 
I agree the conditions are important, but it's also important to point out that the figures posted (since I was the one who ran the test) were obtained using the compile instructions given on the project's GitHub page. Nothing about the machine was "tweaked" from stock Apple to try to skew things. I have better things to do with my time, as the whole experiment was done because I had a couple of spare minutes to run things. A whopping 5 minutes was spent collecting those numbers. In this instance, the goal was to try out a new set of optimizations added just yesterday that improves NEON SIMD support for ARM all up. Even the Raspberry Pi got a boost from the changes.

I don't even know what you are getting at with "not optimized for the machine". You don't optimize code for specific machines, but for architectures. The optimizations we are talking about are SIMD optimizations to accelerate certain types of math. Those are available quite broadly, and part of the puzzle as to what is going on with Stockfish in particular. Especially since ARM and x64 have different SIMD engines, which require different implementations to take advantage of.

I'm not sure how using the P+E cores is any different than letting Alder Lake use all its cores.
 
I really appreciate the time you spent. I have been bit*ing a lot in various threads that no one was willing to spend 5 minutes to bench :)

I am just saying that many of the ipman uploads come from people who sent him their benches, which may or may not even have been compiled on the machine that ran the bench, and with varying OS and settings, wattage, clocks, energy modes, etc.

If you download an exe, it has been compiled for some generic target like AVX2 or BMI. If you compile it yourself, you get the best build for your machine's architecture, and you can use better or worse compilers and options. Linux builds usually run better than Windows builds, etc. I do not have a 4900H CPU to test. I am relying on what people have reported to me from running benches on various laptops with the 4800 and 4900. I do, however, have access to 59xx versions.

To your question on Alder Lake: I agree completely. To compare benches, the CPU should be used to its fullest, with all P+E cores. The problem is that many benches seem to want to misrepresent other CPUs by not using the optimal number of threads for each CPU. I expect the laptops now showing up with 14-core Alder Lake to crush both the 50xx and the M1 Max in SF benches.
 
I have access to laptops with the 5800 and 5900, and also some 5950s. I have run bench 1024 16 26 on those with various versions and networks of SF.

I just find it very unlikely that your mobile Zen 3 performs on par with a desktop 12700K. Are you sure that your quoted 30 Mnodes/sec is not the 5950X? That would make sense to me core-wise.
 
Yeah. I just ran a test on a laptop here (5900HX, 8-core).

Using 14.1 st avx2, I get:

AMD Ryzen 9 5900HX with Radeon Graphics, 3301 Mhz, 8 Core(s), 16 Logical Processor(s)
stockfish_14.1_avx.exe bench 1024 16 24
===========================
Total time (ms) : 66452
Nodes searched : 2126455531
Nodes/second : 31999873

That compares pretty favorably to Krevnick's bench:

M1 MAX -
/stockfish bench 1024 10 26 default depth nnue
===========================
Total time (ms) : 76914
Nodes searched : 1192240595
Nodes/second : 15500956
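
The Nodes/second lines can be rechecked from the other two lines of each output, as nodes searched divided by elapsed seconds. Here is a rough sketch in Python using only the figures above. The per-thread split is an illustrative addition (thread counts are taken from the second bench argument, 16 and 10), and because the two runs used different depths (24 vs. 26) and thread counts, this is a raw throughput comparison rather than a like-for-like one.

```python
def nodes_per_second(nodes, total_ms):
    """Throughput as nodes searched per second of wall-clock time."""
    return nodes * 1000 / total_ms


# Figures copied from the two bench outputs quoted above.
runs = {
    "Ryzen 9 5900HX": {"nodes": 2126455531, "ms": 66452, "threads": 16},
    "M1 Max": {"nodes": 1192240595, "ms": 76914, "threads": 10},
}

for name, run in runs.items():
    run["nps"] = nodes_per_second(run["nodes"], run["ms"])
    # Per software thread, not per physical core.
    run["nps_per_thread"] = run["nps"] / run["threads"]
    print(f"{name}: {run['nps'] / 1e6:.2f} Mnps total, "
          f"{run['nps_per_thread'] / 1e6:.2f} Mnps per thread")

total_ratio = runs["Ryzen 9 5900HX"]["nps"] / runs["M1 Max"]["nps"]
per_thread_ratio = (runs["Ryzen 9 5900HX"]["nps_per_thread"]
                    / runs["M1 Max"]["nps_per_thread"])
print(f"Total ratio: {total_ratio:.2f}x, per-thread ratio: {per_thread_ratio:.2f}x")
```

The gap in total throughput is larger than the per-thread gap, since the 5900HX run used 16 threads against 10 on the M1 Max.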
 
Maybe SF needs further optimizations for Alder Lake?
Impossible. You can't just expect optimisation after all these years of devs working on x86 platforms. That might be possible on a new platform, but not on Alder Lake. At most I would expect 0.0001% improvement possible. It's to be expected for these over-hyped and over-heated chips.
 
<irony>No, at least 2x-3x the performance can be gained if the programmers just make some effort and use Intel's native libs.</irony>
 
Possible if optimisation is just starting, and little work has been done. Absolutely impossible on a platform that’s been around for decades, and has had years of work done to improve performance.
 
Alder Lake is newer than Apple Silicon. I think you are confusing CPU architecture with opcodes. You could apply the same reasoning to the M1: that it is the same as a 1987 Acorn and can't be optimized any further :)
 