Oops, a lot of critique. No problem. So what can I do? Perhaps something very simple, regarding Rosetta 2, which I think gets a bit too much credit in this thread. Fine, but I hesitate to agree 100%.

I tested the output of two different development systems for the Mac.
The results are app1 and app2. Both increment a counter as fast as they can for 4 seconds, under comparable conditions: similar interrupt masks, a single thread, no GUI interaction, everything run from a terminal shell. Each app prints the final value of its counter after the 4 seconds.

Originally this was used to test code generation for Intel processors between two similar but different dev systems, i.e. two different code generators. Later it was reused to see how the results hold up under Rosetta 2.

Higher number is better.
tested on 2020 M1 MBP Ventura 13.2 with Rosetta2
app1 76956
app2 5264808

tested on 2018 i7 Mini Ventura 13.2 (without Rosetta2)
app1 15988000
app2 16352180

Brilliant, quite a result for Rosetta 2... :(
This is the real world; can't help it.

Sure, AS native versions would certainly give us superior results, but that was not what we tested or wanted to see.

Apple never said Rosetta 2 was flawless, and I certainly don't blame them. It's just that we produce code which doesn't go well with Rosetta 2.
Since you didn't provide source code for this bad benchmark, I wrote my own based on your description.

C:
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <sys/time.h>

unsigned long x;

/* Report the count and quit when the timer (or Ctrl-C) fires. */
void signal_handler(int sig) {
    printf("%lu increments\n", x);
    exit(0);
}

int main(int argc, char *argv[]) {
    signal(SIGALRM, &signal_handler);
    signal(SIGINT, &signal_handler);

    /* Arm a real-time timer that delivers SIGALRM after 4 seconds. */
    struct itimerval fourseconds;
    fourseconds.it_value.tv_sec = 4;
    fourseconds.it_value.tv_usec = 0;
    fourseconds.it_interval = fourseconds.it_value;
    setitimer(ITIMER_REAL, &fourseconds, NULL);

    /* Spin on the counter until the signal arrives. */
    x = 0;
    while (1) x++;

    return 0;
}

Save this code as 'badbench.c'. Compile with:

clang badbench.c -target arm64-apple-darwin -o badbench.arm64
clang badbench.c -target x86_64-apple-darwin -o badbench.x86_64

If anyone wants to take part, the only prerequisite is having Xcode (or its Command Line Tools) installed, so that clang, Apple's C compiler, is available.
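
If you're not sure which slice you're actually running, macOS's stock tools can tell you. Something like this should work on an Apple Silicon Mac (assuming Rosetta 2 is already installed; if not, softwareupdate --install-rosetta adds it):

% file badbench.arm64 badbench.x86_64      # confirm each binary's architecture
% ./badbench.x86_64                        # on Apple Silicon, this is translated by Rosetta 2
% arch -x86_64 ./badbench.x86_64           # same thing, requesting the x86_64 slice explicitly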

M4 running macOS Sequoia 15.5:
% ./badbench.arm64
16304148129 increments
% ./badbench.x86_64
15657018371 increments


2013 Retina MacBook Pro 15" i7-4850HQ running macOS Big Sur 11.7.10 (last version that can run on a 2013 without OCLP assistance):
% ./badbench.x86_64
2294131307 increments


M4's native and Rosetta results are identical within the margin of error from run to run. This is not a surprising result since a loop that does nothing but increment one integer variable should be an extremely easy thing for Rosetta to translate with no instruction count expansion.
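
If you're curious, you can inspect what Rosetta actually has to translate. Disassembling both binaries (otool ships with Xcode's command line tools) should show that the hot loop is nothing but an increment and a branch on either architecture:

% otool -tv badbench.x86_64
% otool -tv badbench.arm64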

The old Intel MBP is less than a factor of 2 slower in clock speed (2.3 GHz with a turbo limit of 3.5 GHz, vs the M4's ~4.4 GHz), yet it runs the benchmark at about 1/7 the speed. I doubt that ratio would hold in real programs; I'd expect the i7 to be closer. But then, this really is a bad artificial benchmark.

It's still not clear to me what your complaint about Rosetta is, and your attempt at criticizing its performance is very suspect as far as I'm concerned. Like @dmccloud says - if you're going to claim well known things about Rosetta are all wrong, post code we can run to independently verify what you say.
 
Basically it comes down to practice.

Apple has had literally DECADES of experience moving its code bases from one processor architecture to the next. The Macintosh was originally on Motorola's 68000 family, then PowerPC, then x86, and now ARM. Undoubtedly they have internally prototyped moves to other architectures to explore possible advantages and disadvantages: DEC Alpha, Intel's Itanium and i860/i960, MIPS, and now RISC-V have probably all had versions of Mac operating systems running at some point. Apple might be the best in the industry at this.

I'd add that the drive underlying all of this is Apple's desire to control its own destiny, for better or for worse. On the vital CPU front, they had to endure both dependence on and being humiliated/hamstrung by Motorola (during the 68K and early PPC eras), IBM (during the later PPC era), and Intel (during the x86 era). We've also seen it in Apple pushing its own solutions (whether developed in-house or acquired) for its market-critical apps over the years, from Xcode to iLife/iWork to Logic Pro and beyond.

In Apple's ideal world, it is beholden to no outside supplier for the hardware or software it needs to achieve its market goals. It doesn't need to beg Motorola, IBM, or Intel for faster or more power-efficient CPUs. It doesn't need to beg Microsoft to keep developing Office, as it did in the '90s.

The development of Apple Silicon is the culmination of this. Heck, I've read news articles from 2023 saying that Apple would start producing its own LCD screens for devices like the Apple Watch in 2024.

At this rate, I imagine Apple will start producing its own memory and solid state storage chips in the not too distant future.
 
Anyone who believes that incrementing a counter for four seconds is a good real-world performance test simply doesn't know what they're doing.

In the best-case scenario a four second test of any sort would only test burst performance rather than sustained performance. People who are using systems for gaming, photo/video editing, code compilation, CAD, etc. are not working in four second bursts, so such a test can't even be indicative of a real-world workflow scenario.
 
In the best-case scenario a four second test of any sort would only test burst performance rather than sustained performance. People who are using systems for gaming, photo/video editing, code compilation, CAD, etc. are not working in four second bursts, so such a test can't even be indicative of a real-world workflow scenario.
The issues with the idea go far beyond that.

For example, if you compile the code I posted with any level of optimization turned on, clang will figure out that you're not doing anything real with the counter's value, and will optimize away the counter increment to improve performance, causing the program to always report 0 iterations.
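
For completeness, here's a minimal sketch of how you'd sidestep that particular trap (my variant, not the earlier code verbatim): declaring the counter volatile forces clang to keep every load and store, even at -O2.

C:
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <sys/time.h>

/* volatile: the compiler must perform every increment through memory,
   so the loop body can't be optimized away at -O2. Of course that also
   changes what you're measuring, which is rather the point. */
volatile unsigned long x;

void signal_handler(int sig) {
    printf("%lu increments\n", x);
    exit(0);
}

int main(void) {
    signal(SIGALRM, &signal_handler);

    /* one-shot 4-second real-time timer; it_interval stays zeroed */
    struct itimerval fourseconds = { .it_value = { .tv_sec = 4 } };
    setitimer(ITIMER_REAL, &fourseconds, NULL);

    while (1) x++;
}

This version should report a nonzero count at any optimization level, which is the most you can say for it.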

This is not a newly discovered problem. Dhrystone MIPS was a synthetic benchmark created in the 1980s which was similar to the counter increment test, though at least it was a suite of many simple tests instead of a single one. Dhrystone's assumption was that if you could measure the speeds of many common programming constructs (loops, procedure calls, variable assignments and so on), and weighted the results by how frequently each construct was used in real programs, you'd be indirectly measuring the performance of essentially all programs.

Dhrystone has been discredited as a serious benchmark for high performance computers for a very long time. Just like the counter incrementer, lots of it can be optimized away, so you have to defeat compiler optimizations to measure anything. But that's not the worst problem. The best analogy I can think of is that the Dhrystone approach is like measuring some properties of a heap of sawdust, then declaring you know how strong the solid wood was before it was reduced to dust. The way those simple program elements interact with each other in a meaningful program matters; you don't learn enough by measuring them in isolation. You need to measure a real program that does real work of the kind you're interested in measuring the performance of. Nothing less will do.

Speaking of, here are Geekbench scores for an M4 Max, both native and Rosetta.

[attached screenshot: Geekbench 6 results, M4 Max, native arm64 vs. x86_64 under Rosetta 2]

While it looks pretty brutal at first glance, looking through the individual tests shows some are quite close. The ones that aren't close are the tests the GB6 documentation describes as using advanced SIMD features, such as the M4's SME. We can disregard those, because Rosetta doesn't attempt to provide full x86 SIMD features or performance.

If we want to focus on something that tests what Rosetta is supposed to be good at, a C compiler is a great test of general integer performance. Fortuitously, one of GB6's subtests is the Clang C compiler. The ratio between native and emulated Clang scores is 1.4:1 (native faster), which is a pretty good ratio for Rosetta if you ask me. Note that I measured essentially 1:1 (perfection) with the counter increment benchmark; that's a perfect illustration of why it's a poor benchmark of Rosetta, since it doesn't predict the performance penalty in real-world programs at all.
 
By the way, this does mean Rosetta 2 leaves some potential performance on the table. An optimization pass that was permitted to blur the boundaries between translated x86 instructions would improve performance, but Apple chose to prioritize observability and formal correctness over best possible performance.
 