Well, even ~$200 Intel and AMD CPUs handily outperform the best M2 models at this point, so nothing new under the sun. But as @galad writes: comparing apples with oranges.

 
IMG_0042.png


Turns out when you stop caring about power consumption you can do a lot.

It’s frankly both pathetic and embarrassing to be pulling that much power, and it’s a sign of how bad Intel’s arch is, but hey, if you want to be 6% faster than AMD’s fastest at the cost of an extra 200W, you can do it.
 
Intel i9:
single = 3096
multi = 21734


M3 Max:
single: 2971
multi: 20785

I just did a search for M3 Max on Primate's website. Here's the start of the results--the most I could squeeze into a screenshot on my MBP's screen. Just by eye, the SC scores average >100 points higher than the lowball score Zest28 posted.

Note also that this is a comparison of one of Intel's most power-hungry desktop processors to a laptop processor.
1699070163791.png
 
Well, even ~$200 Intel and AMD CPUs handily outperform the best M2 models at this point, so nothing new under the sun. But as @galad writes: comparing apples with oranges.

The problem with these multi-core scores is that they don't account for multi-core tasks that actually have interdependencies, which is something that was addressed in GB6. For instance, here we also have the old $1,000 64-core AMD EPYC 7702 (released Aug 2019) outperforming the newer $5,000 32-core AMD EPYC 75F3 (released Mar 2021). So if you have a bunch of separate tasks that you can run on separate cores, and just need to throw a lot of cores at the problem, this benchmark may be useful--but only if the limited types of tasks it incorporates correspond to the types of tasks you are doing yourself.

GB is probably a much better general benchmark, since it correlates well with those from SPEC (Standard Performance Evaluation Corporation), which are an industry standard.

1699140356032.png
 
Geekbench 6 takes it too far in the other direction, because it assumes that the computer is doing only one thing at a time. That may be adequate for consumer devices, but it doesn't really reflect the way higher-end computers are often used.

A better benchmark would run a few copies of the multi-core benchmark in parallel, and perhaps also do the subtasks in different order in each copy. Then it would report the sum of multi-core scores as the true multi-core score.
 
If we think about it, GB6 is designed for multiple threads to cooperate and finish a common task. If a CPU architecture does well in GB6, does it follow that the same architecture will do just as well at completing multiple unrelated tasks?

For massively parallel workloads whose tasks are unrelated to one another, my understanding is that performance is mainly a function of how fast the CPU can fetch and process memory, how many useful threads can be in flight at once, and how good the CPU's IPC is. The main advantage of Apple's SoCs is a larger pipe to main memory, and the pipe gets larger the higher you go (M, M Pro, M Max, M Ultra).

The main disadvantage is that Apple's CPU cores run at much lower frequencies than the CPUs they're being compared against. The positive is that single-thread GB6 results for the M3 show that, even at a 2GHz deficit, it trades blows with those CPUs.
 
If you have a 16-core processor, GB5 runs 16 independent tasks with 1 core each. GB6 does the opposite, running just one task that is allowed to use all 16 cores. I was suggesting something in the middle, such as 4 independent tasks with 4 cores each. The scaling would likely be somewhere between what you see in GB5 and GB6.
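
To put rough numbers on "somewhere between", here is a minimal sketch. It assumes each task scales according to Amdahl's law with a made-up parallel fraction of 0.9; that model, the fraction, and the function names are my own assumptions for illustration, not anything Geekbench publishes. It also ignores memory bandwidth, cache contention, and clock throttling.

```python
def speedup(cores: int, parallel_fraction: float) -> float:
    """Amdahl's-law speedup of a single task running on `cores` cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def aggregate_throughput(total_cores: int, tasks: int,
                         parallel_fraction: float) -> float:
    """Combined throughput of `tasks` independent tasks that split the
    cores evenly, in units of "one task on one core"."""
    if total_cores % tasks != 0:
        raise ValueError("tasks must divide total_cores evenly")
    cores_per_task = total_cores // tasks
    return tasks * speedup(cores_per_task, parallel_fraction)


if __name__ == "__main__":
    TOTAL_CORES = 16
    PARALLEL_FRACTION = 0.9  # hypothetical value, chosen only for illustration
    # 16 x 1 is the GB5-style split, 1 x 16 the GB6-style split, 4 x 4 the middle.
    for tasks in (16, 4, 1):
        cores_per_task = TOTAL_CORES // tasks
        total = aggregate_throughput(TOTAL_CORES, tasks, PARALLEL_FRACTION)
        print(f"{tasks:2d} tasks x {cores_per_task:2d} cores: {total:.2f}")
```

Under these assumptions the 4 x 4 split lands between the two extremes, and the gap between the extremes widens as the parallel fraction drops.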
 
But my argument then still stands correct? If a CPU architecture does well with completing a shared task, does it follow that it will work just as well with the other extreme and anything that is in between as you have suggested?

Edit: I would add that designing a CPU architecture to complete a shared task well would be harder than designing one that only does well at completing unrelated tasks.
 
So you are comparing what would be Intel’s A series Ultra chip competitor to what would be their i7 equivalent? That needs a big rig, and lots of power, etc? Ok
 
$700 CPU + $500 GPU + $1,000 other components + gym membership to increase strength to carry your desktop/monitor combo around.
Not to mention free heating. I am struggling to detect any heat emanating from my M1 MBA on my laptop even after consecutive hours on zoom. Practically freezing. 😕
 
But my argument then still stands correct? If a CPU architecture does well with completing a shared task, does it follow that it will work just as well with the other extreme and anything that is in between as you have suggested?

Edit: I would add that designing a CPU architecture to complete a shared task well would be harder than designing one that only does well at completing unrelated tasks.
Yes, if device A does well on GB6 MC, it would also do well on GB5 MC. But this doesn't account for comparative assessment.

Let's consider a toy example. Suppose you want to do mostly separate simultaneous SC tasks (no significant interdependencies). And let's suppose you want to know whether device A or B would perform better. Let's make up some arbitrary scores:

GB6 MC:
A = 20,000
B = 15,000

By your argument, you don't need to consider GB5, since you know A will do well there also. But not so fast. Let's suppose we have this on GB5:

GB5 MC:
A = 28,000
B = 38,000

Even though our very good GB6 score for A predicts it will (as you say) do well on GB5, that doesn't tell us whether B might do even better. And in our simple toy example, B would be the better-performing processor for the use case I described.
 
But my argument then still stands correct? If a CPU architecture does well with completing a shared task, does it follow that it will work just as well with the other extreme and anything that is in between as you have suggested?

Edit: I would add that designing a CPU architecture to complete a shared task well would be harder than designing one that only does well at completing unrelated tasks.
It's usually more about the task itself than the hardware. Most tasks parallelize well up to a certain number of cores but get diminishing returns from further parallelization. In many consumer tasks, the threshold can be as low as 2-4 cores. There may be a limited number of independent subtasks that you can run in parallel, or there may be sequential parts that don't benefit from any parallelization. In the kind of bioinformatics stuff I do at work, you can usually parallelize over 16-32 cores before you start running into I/O or synchronization bottlenecks.
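
The effect of sequential parts is what Amdahl's law describes. A quick sketch of the diminishing returns, assuming a made-up task where 95% of the work parallelizes perfectly (the 95% figure and the function name are illustrative assumptions, not measurements of any real workload):

```python
def amdahl_speedup(cores: int, parallel_fraction: float) -> float:
    """Speedup over one core when only `parallel_fraction` of the work
    can be spread across cores and the rest stays sequential."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


if __name__ == "__main__":
    PARALLEL_FRACTION = 0.95  # hypothetical; real tasks vary widely
    previous = None
    for cores in (1, 2, 4, 8, 16, 32, 64):
        s = amdahl_speedup(cores, PARALLEL_FRACTION)
        efficiency = s / cores  # share of ideal linear scaling achieved
        gain = "" if previous is None else f"  (x{s / previous:.2f} vs previous row)"
        print(f"{cores:3d} cores: speedup {s:5.2f}, efficiency {efficiency:4.0%}{gain}")
        previous = s
```

Each doubling of the core count buys less than the one before it, and with a 5% sequential part the speedup can never exceed 20x however many cores you add. I/O and synchronization costs, which this model leaves out, only make the curve flatten sooner.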
 
Geekbench 6 takes it too far in the other direction, because it assumes that the computer is doing only one thing at a time. That may be adequate for consumer devices, but it doesn't really reflect the way higher-end computers are often used.

A better benchmark would run a few copies of the multi-core benchmark in parallel, and perhaps also do the subtasks in different order in each copy. Then it would report the sum of multi-core scores as the true multi-core score.
That's a reasonable point. And it also raises an interesting question: just how far does GB6 take it? I.e., are all its MC tasks ones with significant interdependencies, or is there a mix of those and embarrassingly parallel tasks? And of the ones in the former category, how challenging are their interdependencies compared with those in the most-used MC apps?

Regardless, I'd say whether GB6 takes it too far is very user-dependent. Perhaps it would be better to leave GB6 and GB5 as-is, and let users decide what percentage of their use case is represented by each. Then you could create your own custom score as x% * GB5 + (100-x)% * GB6.
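
As a rough sketch of that custom score, here it is applied to the made-up A and B numbers from my toy example earlier in the thread. The function and variable names are just for illustration, and the scores are arbitrary, not real benchmark results:

```python
def blended_score(gb5_mc: float, gb6_mc: float, gb5_percent: float) -> float:
    """Custom score: gb5_percent% of the GB5 MC score plus the
    remaining percentage of the GB6 MC score."""
    if not 0 <= gb5_percent <= 100:
        raise ValueError("gb5_percent must be between 0 and 100")
    weight = gb5_percent / 100.0
    return weight * gb5_mc + (1.0 - weight) * gb6_mc


# Arbitrary toy scores, not measurements of real devices.
TOY_SCORES = {
    "A": {"gb5_mc": 28_000, "gb6_mc": 20_000},
    "B": {"gb5_mc": 38_000, "gb6_mc": 15_000},
}

if __name__ == "__main__":
    for gb5_percent in (0, 25, 50, 75, 100):
        results = {
            name: blended_score(s["gb5_mc"], s["gb6_mc"], gb5_percent)
            for name, s in TOY_SCORES.items()
        }
        winner = max(results, key=results.get)
        print(f"x = {gb5_percent:3d}%: A = {results['A']:.0f}, "
              f"B = {results['B']:.0f} -> {winner}")
```

With these toy numbers A comes out ahead when only a quarter of your workload looks like GB5, and B comes out ahead once it reaches half, so the answer really does depend on the x you pick.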

Of course, the best thing would be to get your hands on the devices you want to compare, and run your own custom benchmarks--ones you've created to reflect your own workflow. [Or create benchmarks you can send to colleagues who have the devices you're considering, and ask them to run them....]
 
But my argument then still stands correct? If a CPU architecture does well with completing a shared task, does it follow that it will work just as well with the other extreme and anything that is in between as you have suggested?

Edit: I would add that designing a CPU architecture to complete a shared task well would be harder than designing one that only does well at completing unrelated tasks.
Doing a task in parallel depends more on fast, efficient memory management, along with an IPC that can take advantage of the APIs.

Single-core benchmarks are important for individualized tasks more so than necessarily splitting a single task in parallel. The more equipped and faster it is, the better it will perform at individual tasks.
 
Intel i9:
single = 3096
multi = 21734


M3 Max:
single: 2971
multi: 20785

Why do you have to compare them all the time? It is clear that Intel and AMD with the most powerful graphics cards will always be better than M chips with their various suffixes, no matter how ultra, how maxi, or how hyper.
 
Well, I surely hope that Intel's high-end stove-like gaming desktop CPU will outperform Apple's laptop computer. Actually, this is starting to get a bit embarrassing for Intel. 5-10x more energy for the same performance? Yikes.
Custom air and water cooling solves all high temp problems.
Performance in what? Figures and pretty graphs are just arbitrary units; working programs are another matter, and they're where you need to start. Don't mix the two up.
 
Why do you have to compare them all the time? It is clear that Intel and AMD with the most powerful graphics cards will always be better than M chips with their various suffixes, no matter how ultra, how maxi, or how hyper.

Never say never. Apple is currently a leader in GPU scheduling and hardware utilization, and they are moving forward at a tremendous speed. M3 Max should catch up to the 7900 XT in production raytracing. Nvidia will still be ahead for the foreseeable future, but who knows what future innovations will bring.
 
The key word here is "scheduling", and Nvidia produces the top-end graphics cards.
The 7900 XT is the worst graphics card.
In addition, Nvidia has significant aces up its sleeve to surprise customers.
 