Even if it is true that it's optimised for x86 instructions (which is an ongoing debate), it still represents real-world performance, since 90% of global apps & games are developed and programmed for the x86 architecture at their core (and then some are ported to ARM).
Not really. Just because a lot of code is developed and tested on x86 doesn't mean that it is particularly optimized for x86. Most code does not use any architecture-specific features and is written in higher-level languages that only make some basic assumptions about the target CPU. For most general-purpose software there is not much difference between x86 and ARM64: both have the same basic data sizes and alignments and use the same basic execution principles with instructions, threads, memory organisation etc.
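To illustrate the point about data sizes and alignments, here's a minimal C sketch (assuming a typical LP64 platform, which covers x86-64 and ARM64 on macOS and Linux alike) that prints the same values regardless of which of the two architectures you compile it for:

```c
#include <stdio.h>
#include <stdalign.h>

// Prints identical values when compiled for x86-64 or ARM64 on an LP64
// platform -- portable code sees the same basic ABI on both architectures.
int main(void) {
    printf("sizeof(int)    = %zu, alignof(int)    = %zu\n", sizeof(int),    alignof(int));
    printf("sizeof(long)   = %zu, alignof(long)   = %zu\n", sizeof(long),   alignof(long));
    printf("sizeof(double) = %zu, alignof(double) = %zu\n", sizeof(double), alignof(double));
    printf("sizeof(void*)  = %zu, alignof(void*)  = %zu\n", sizeof(void *), alignof(void *));
    return 0;
}
```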
The following is very simplified of course, but there are three general cases where developers start looking at low-level CPU features. The first is when you need to do something very specific and your CPU of interest offers specialized features for doing it better. Examples are arbitrary-precision integers, cryptography operations, stuff like that. If your CPU has special instructions for that, you can make use of them, but of course they won't work on a different CPU. That different CPU might instead offer a different kind of instruction to solve the problem, so if you want to support it you'd need to write a new algorithm specifically for that CPU. The second case is when you are doing a lot of specialized calculations, and I mean A LOT of them. Then you might want to make use of the high-performance computing features your CPU offers, e.g. vector instructions and similar. This applies, for example, to various mathematical libraries, ray-tracing frameworks like Embree (which Cinebench uses), etc. And finally, the third case is various low-level libraries that require an intimate understanding of the CPU architecture to even work correctly (e.g. multi-threading primitives).
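To make the first two cases concrete, here's a minimal sketch of what architecture-specific vector code typically looks like: the same four-float addition written once with x86 SSE intrinsics and once with ARM NEON intrinsics, selected at compile time. This is an illustrative pattern, not code from Embree or any particular library:

```c
#include <stdio.h>

#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>   // x86 SSE intrinsics

// Add four floats at once using an SSE vector register.
static void add4(const float *a, const float *b, float *out) {
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}
#elif defined(__aarch64__)
#include <arm_neon.h>    // ARM NEON intrinsics

// Same operation, NEON flavour: different types, different instruction names.
static void add4(const float *a, const float *b, float *out) {
    float32x4_t va = vld1q_f32(a);
    float32x4_t vb = vld1q_f32(b);
    vst1q_f32(out, vaddq_f32(va, vb));
}
#else
// Portable scalar fallback for any other architecture.
static void add4(const float *a, const float *b, float *out) {
    for (int i = 0; i < 4; i++) out[i] = a[i] + b[i];
}
#endif

int main(void) {
    float a[4] = {1, 2, 3, 4}, b[4] = {10, 20, 30, 40}, c[4];
    add4(a, b, c);
    printf("%g %g %g %g\n", c[0], c[1], c[2], c[3]);
    return 0;
}
```

A library like Embree contains thousands of lines in this style (mostly on the x86 side), which is exactly why porting it to a new ISA is tedious work.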
Programs that fall into those cases are either quite rare or use third-party libraries, many of which are fully optimized for Apple Silicon by now. The performance will depend on the domain. As I've said, Apple is generally at a disadvantage when it comes to throughput-oriented SIMD code, so if that's the kind of application you care about, a newer x86 CPU might be a better choice for you.
1) I think it would be interesting to put some numbers to this. In particular, if we take GB5 as a CPU benchmark that is relatively neutral with respect to platform, then, based on GB5 scores for the M1 and M2, and GB5 + CBR23 scores for the latest Intel and AMD CPUs, we could compute what the equivalent CBR23 score should be for the M1 and M2. From their actual scores we could then see whether there is a consistent correction factor that could be applied (or, alternatively, simply see what the actual % penalty is).
Oh, absolutely. Shouldn't be too difficult. I think I've even pulled some numbers in some other thread showing that the CB23 advantage x86 has does not show itself in GB5 or SPEC2017. But it's late here and I'm not in the mood to go digging now. I'll have a look sometime later if I have time, or maybe one of the nice contributors here will feel inspired
🙂
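For what it's worth, the arithmetic itself is trivial; the real work is collecting fair scores. A minimal sketch of the computation, where all the numbers are hypothetical placeholders (not real benchmark results, substitute actual published GB5 and CBR23 multi-core scores):

```c
#include <stdio.h>

// Predict the CBR23 score a chip "should" get from its GB5 score,
// using an x86 reference chip to anchor the CBR23-per-GB5-point ratio.
static double expected_cb23(double gb5_chip, double gb5_ref, double cb23_ref) {
    return gb5_chip * (cb23_ref / gb5_ref);
}

int main(void) {
    // Hypothetical placeholder scores -- fill in real published results.
    double gb5_x86  = 24000.0;  // GB5 multi-core, x86 reference CPU
    double cb23_x86 = 30000.0;  // CBR23 multi-core, same x86 CPU
    double gb5_m2   = 15000.0;  // GB5 multi-core, M2
    double cb23_m2  = 15000.0;  // CBR23 multi-core, M2 (actual measured)

    double predicted = expected_cb23(gb5_m2, gb5_x86, cb23_x86);
    // A factor > 1 means CBR23 under-reports the chip relative to its
    // GB5 standing; a consistent factor across chips would be the "penalty".
    printf("predicted CBR23: %.0f, correction factor: %.2f\n",
           predicted, predicted / cb23_m2);
    return 0;
}
```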
2) Any thoughts on @Xiao_Xi's post about Embree?
If Apple implemented a dedicated NEON path for Embree they might be able to get another 10-15%, who knows. But what's the incentive? This is hard, tedious work and not very useful. Apple's effort for ray tracing is the GPU, and that's where they are investing the bulk of their resources. As I've said, I doubt that Apple will outperform a modern x86 CPU in many SIMD-based workloads. They'd have to sacrifice energy efficiency for that and would get very little in return. It makes much more sense to use dedicated coprocessors for high-performance compute, such as AMX or the GPU.
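As a concrete illustration of that last point: Apple doesn't expose AMX directly, but the Accelerate framework is the supported route to it, and on Apple Silicon its vDSP/BLAS routines are reported to dispatch to AMX where that helps. A minimal sketch (macOS only, link with `-framework Accelerate`):

```c
#include <stdio.h>
#include <Accelerate/Accelerate.h>  // vDSP, BLAS, etc.

int main(void) {
    float a[4] = {1, 2, 3, 4};
    float b[4] = {10, 20, 30, 40};
    float c[4];

    // Element-wise vector add; Accelerate picks the best backend
    // (NEON, AMX, ...) for the hardware it runs on, so the caller
    // never writes architecture-specific code at all.
    vDSP_vadd(a, 1, b, 1, c, 1, 4);

    printf("%g %g %g %g\n", c[0], c[1], c[2], c[3]);
    return 0;
}
```

That's the trade-off in a nutshell: let the library route the work to the coprocessor instead of hand-tuning SIMD paths per ISA.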