Usually, for any very high-end configuration, you either know why you need it or you don't really need it.

Threadrippers, Mx Ultras, whatever Intel calls their large core count CPUs... they're all the same in this respect.
 
Question from one who doesn't know: how do I know whether my specific usage would significantly benefit from the much higher multi-core CPU scores of an M3 Ultra if single-core scores are more or less the same between an M4 Max and an M3 Ultra (or close enough so as to not make much difference)? My thing is primarily audio DSP: looooots and lots of simultaneous real-time audio DSP processes with very low latency and high sample rates (buffer sizes of 64 samples and lower at a 96kHz sample rate, usually 24-bit resolution, but sometimes 32-bit). Basically music production, but with an emphasis on complex real-time generative processes running a lot of signals in both series and parallel. I'm very into programming the front end, but I'm no expert on the back end. My current software setup is a combo of the Reason DAW with VST2/VST3 plugins and native Rack Extensions, coupled with Max/MSP running custom DSP processes written by me, plus maybe 10% other weird stuff, piping audio signals between them all.
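One way to frame the question is as a deadline budget: at the settings above (64-sample buffers at 96kHz), every audio callback has well under a millisecond to finish. A rough sketch, where the chain costs and counts are purely hypothetical placeholders:

```python
# Back-of-envelope real-time DSP budget for 64-sample buffers at 96 kHz.
# The chain costs and counts below are made-up illustration values.

SAMPLE_RATE = 96_000      # Hz
BUFFER_SIZE = 64          # samples per audio callback

# Hard deadline: every callback must finish before the next buffer is due.
deadline_ms = BUFFER_SIZE / SAMPLE_RATE * 1_000
print(f"per-callback deadline: {deadline_ms:.3f} ms")   # ~0.667 ms

# Many cores help only with work that can run in parallel; a long serial
# chain is bounded by single-core speed, not by core count.
serial_chain_ms = 0.5     # hypothetical cost of your longest series chain
parallel_chains = 32      # hypothetical independent generative voices
per_chain_ms = 0.05       # hypothetical cost of each parallel voice

cores_needed = (parallel_chains * per_chain_ms) / (deadline_ms - serial_chain_ms)
print(f"rough cores needed for the parallel work: {cores_needed:.1f}")
```

The takeaway: if your longest series chain eats most of the ~0.667 ms budget, single-core speed (where the M4 Max wins) matters more; if the load is mostly independent parallel voices, the Ultra's extra cores can pay off.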
 
It will be very interesting to see the performance comparison of 80c GPU M3U vs 40c GPU M4M. I'm guessing the difference won't be 2x, but more like 1.5x.

In other words, so far the M3U looks like it was made mainly for LLMs. That's why it was released now, and why they didn't wait for an M4U. Hence the unexpected 512GB RAM option. In any other case, people will probably find better performance at a better price from the maxed-out M4 Max variant.
 
That is to be expected, as GB6 is not a good benchmark for many-core CPUs. Conversely, if you are not using workstation-like workloads that benefit from this many CPU (or GPU) cores, an Ultra would be pointless for you.

I do expect the M3 Ultra to hold its own against other workstation machines such as Xeons and Threadrippers.
 
Not unexpected. If they do make an M5 Ultra chip for the Mac Pro, though, it could be utterly ridiculous in its performance. I expect GPU benchmarks will show the M3 Ultra to be significantly more powerful given its doubled core count.
Also, really, you should be using 0.5x, as even to me 1.5x reads as one and a half times faster.
 
I've been waiting a few years for the mac studio update, I'm extremely disappointed.

As it stands, for most tasks the M4 Max will be faster, and even the MacBook Air...

That leaves the very specialized tasks that will actually be able to take advantage of it over the M4 Max (and we're not talking about photography, or even the majority of audio production needs...).

The big problem is that the single core is the one under the most pressure, so you may find yourself in a situation where the bottleneck is the main core, a classic in video games, for example.

I'm really skeptical about all these responses mentioning AI, it's a collective hallucination.

The unified RAM makes it possible to run large models, but the power of the GPU (even for the Ultra) is far inferior to Nvidia's. In terms of pure power, the M3 Ultra should be a little above a 5070 Ti. When it comes to AI, however, there's a world of difference (keep in mind that the M3 has slower AI cores):

[screenshot attachment omitted]


So if you're serious about the AI world and want to train models, the M3 Ultra is not the right choice... unless the only thing you're doing is running a gigantic model for fun, just long enough to make a tweet (because in my opinion the number of tokens per second is going to be too low to be usable). Note that the QwQ-32B model has just been released, with performance equivalent to DeepSeek-R1 671B, so the "I need 512GB of RAM to run DeepSeek..." routine is a thing of the past. I'm taking bets that the future lies in smaller models.

The big interest for the Ultra that I see is:
- 3D rendering, because when you render with the GPU (which is very fast), you're limited by VRAM, and unified RAM makes all the difference.
- If you need more than 128GB of RAM (e.g. for a big orchestral audio template).
- Video editing (with the dual chip), etc.

And all this is going to be handicapped because we're starting from the M3, the M4 having received numerous improvements.

Do you have any other use cases? I'm curious to learn more. And do you think that, without the RAM limit, the M4 Max would be inferior?
 
Yeah, this is why I ended up selling the M1 Ultra and then the M2 Ultra, and I won't be buying any more Ultras; the next-gen Max chip kept coming out soon after. So really this comes down to whether you need more memory, higher memory bandwidth, or more GPU cores. Software engineers should just stick with the M4 Max unless they have a specific use case for the extra AI prowess.
We can still hope that an M4 Ultra will come with the Mac Pro. There may be apps that benefit more from the extra cores. But we will see soon in real-life tests.
 


The first alleged benchmark result for Apple's new M3 Ultra chip has surfaced in the Geekbench 6 database tonight, allowing for more performance comparisons. The high-end chip is available in the new Mac Studio, introduced earlier this week.


Apple said the M3 Ultra chip is the "highest-performing chip it has ever created," and the unverified benchmark result seems to confirm that. In the single result, the 32-core M3 Ultra chip achieved a multi-core CPU score of 27,749, which makes it around 8% faster than the 16-core M4 Max chip that previously held the performance record. The result also reveals that the M3 Ultra chip is up to 30% faster than the 24-core M2 Ultra chip.

As expected, the M4 Max chip tops the M3 Ultra chip in terms of single-core CPU performance by nearly 20%, according to the result.

We now await additional M3 Ultra benchmark results to see if these scores are accurate, as they seem to be on the lower side compared to what was expected. For example, Apple advertised the M3 Ultra chip as being up to 1.5x faster than the M2 Ultra chip, so that 30% increase mentioned above should seemingly be closer to the 50% mark. Apple never said how the M3 Ultra chip's performance compares to the M4 Max chip, though.
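The article's percentages can be sanity-checked from the scores quoted in this thread, taking 25,647 as the 16-core M4 Max multi-core figure (the number a poster's chart cites later in the thread):

```python
# Sanity-checking the article's percentage claims from quoted GB6 scores.
m3_ultra = 27_749   # 32-core M3 Ultra multi-core (this benchmark result)
m4_max   = 25_647   # 16-core M4 Max multi-core (figure cited in the thread)
m2_ultra = 21_371   # 24-core M2 Ultra multi-core

vs_m4_max   = m3_ultra / m4_max - 1
vs_m2_ultra = m3_ultra / m2_ultra - 1
print(f"vs M4 Max:   +{vs_m4_max:.1%}")     # +8.2%, the "around 8%" above
print(f"vs M2 Ultra: +{vs_m2_ultra:.1%}")   # +29.8%, the "up to 30%" above
```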

As always, real-world performance may vary somewhat, but synthetic benchmark tools like Geekbench 6 provide a useful baseline for comparisons.

Watch this space, as we would not be surprised if additional Geekbench 6 results for the M3 Ultra chip end up having higher performance scores.

The benchmark was spotted by @jimmyjames_tech and shared by Vadim Yuryev.

Article Link: M3 Ultra Chip Isn't Much Faster Than M4 Max in First Benchmark Result
Anyway, the Ultra should have double the video encoders, so it should cut render times in half?
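Only if the render is entirely encode-bound; otherwise Amdahl's law caps the gain. A quick sketch (the 80% encode fraction is a made-up example):

```python
# Amdahl's law: doubling the encoders only speeds up the encode portion.
def speedup(parallel_fraction: float, factor: float) -> float:
    """Overall speedup when only `parallel_fraction` of the job scales by `factor`."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / factor)

# If (hypothetically) 80% of a render is hardware encode and the rest is
# CPU/GPU work, doubling the encoders gives:
print(speedup(0.8, 2))   # ~1.67x overall, not 2x
# Only a 100% encode-bound render halves outright:
print(speedup(1.0, 2))   # 2.0x
```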
 
Geekbench seems to have trouble with many-core CPUs.

It's a deliberate design decision in 6.x compared to Geekbench 5. They want to more accurately portray how the CPU will behave in real-world scenarios.

In this case, though, people should only buy the M3 Ultra — or the Threadripper, for that matter — if they really have heavily parallelized workloads.
 
Wow, seems like the M3 Ultra is a controversial upgrade. Only 8% faster in multi-core performance compared to the M4 Max, and almost 20% SLOWER than the M4 Max in single-core performance. So, practically, the $2,000 extra for the "fastest Mac ever" goes toward GPU performance... and the memory potential (which has to be paid upfront, selling your kidney first, due to the all-in-one, anti-professional architecture and logic).

Pretty lame Apple.
 
We can still hope that an M4 Ultra will come with the Mac Pro. There may be apps that benefit more from the extra cores. But we will see soon in real-life tests.

Apple told the French media that the M4 doesn't have the UltraFusion interconnect needed to make an Ultra, and that not every generation is destined to get an Ultra version. Since they managed to introduce Thunderbolt 5 with the M3 generation, they released the M3 Ultra.

I find the Mac Pro useless (insofar as everything is soldered and Thunderbolt 5 has monstrous speed). If everything has to be soldered anyway, the Mac Studio is perfect.

It remains to be seen whether the Mac Studio will get an M5 Ultra in a year's time, or whether we'll have to wait two years for an M6 Ultra.
 
Unfortunately, what you write about AI/LLMs is assumption and headline-reading and has nothing to do with reality; this is not the right information (anyone reading this should ignore it).
QwQ-32B is not revolutionary in any way. It is an evolution that performs better than models with the same number of parameters in some synthetic benchmarks, and in some applications (like coding) it approaches non-distilled models, but it is not equivalent to non-distilled models, not even close. With long context it gets lost just like any other 32B model.
So if you're serious about the AI world and want to train models, the M3 Ultra is not the right choice... unless the only thing you're doing is running a gigantic model for fun, just long enough to make a tweet (because in my opinion the number of tokens per second is going to be too low to be usable).
Once again, this is just plain wrong. Yes, the M3 Ultra's RAM is not as fast as Nvidia's (the memory bandwidth is comparatively low), but its size is still extremely unique and will have its place. The expected speed for a full node is 40 tokens/sec, which is plenty.
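That ~40 tokens/sec figure is roughly what a bandwidth-bound estimate gives, assuming the M3 Ultra's advertised ~819GB/s memory bandwidth, DeepSeek-R1's ~37B active parameters per token (it's an MoE model), and 4-bit quantization; these are public specs and assumptions, and real throughput will be somewhat lower:

```python
# Bandwidth-bound decode estimate: each generated token must stream the
# model's *active* weights through memory at least once.
bandwidth_gb_s  = 819      # M3 Ultra advertised memory bandwidth (GB/s)
active_params   = 37e9     # DeepSeek-R1 active parameters per token (MoE)
bytes_per_param = 0.5      # 4-bit quantization

gb_per_token = active_params * bytes_per_param / 1e9   # ~18.5 GB per token
tok_per_s = bandwidth_gb_s / gb_per_token
print(f"theoretical upper bound: ~{tok_per_s:.0f} tokens/sec")   # ~44
```

This is also why the sparse/MoE structure matters: a dense 671B model would need to stream all 671B parameters per token, cutting this estimate by more than an order of magnitude.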

You can read more about it with more technical details here:
 
There was hope that, with the PSU TDP increasing from 370W to 480W, the Studio could perhaps clock the M3 Ultra a bit higher to give it some advantage over the M3 Max inside a 16-inch MacBook Pro. But the 4.05GHz in the GB6 benchmark record above means there is no clock difference at all.

And then the roughly 1.4x multiplier of the M3 Ultra's multi-core score over the M3 Max means even the scaling efficiency remains at the same ~70% as in the M2 Ultra generation.

So yes, everything is within perfect prediction, which is the disappointing part. But at least there is no regression.
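The scaling-efficiency figure follows directly from the GB6 multi-core scores quoted in this thread (the Max scores are the 12-core M2 Max and 16-core M3 Max figures from the chart a poster shares below; both generations land in the rough 70% ballpark):

```python
# Ultra-vs-Max scaling efficiency: an Ultra is two Max dies fused together,
# so perfect scaling would exactly double the multi-core score.
def scaling_efficiency(ultra_score: int, max_score: int) -> float:
    return ultra_score / (2 * max_score)

m2_gen = scaling_efficiency(21_371, 14_876)   # M2 Ultra vs 12-core M2 Max
m3_gen = scaling_efficiency(27_749, 20_949)   # M3 Ultra vs 16-core M3 Max
print(f"M2 gen: {m2_gen:.0%}, M3 gen: {m3_gen:.0%}")   # 72% and 66%
```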

I just rechecked the numbers on multi-core CPU, and you are right on. The 24-core M2 Ultra was 1.432 times as fast as the 12-core M2 Max. The 76-GPU-core M2 Ultra was 1.528 times as fast as the 38-GPU-core M2 Max.

That means for the 28-core M3 Ultra, the GB6 multi-core score could be around 27,136, while the 14-core M4 Max multi-core score is 23,051. The article above showed an even lower score of 27,749 for the 32-core M3 Ultra (compared to my estimate below of 29,998).

Using these efficiency numbers, I am going to use the Geekbench Mac benchmark chart to roughly estimate what the numbers might look like. Individual scores may be faster, but this Mac benchmark chart might be an average of all scores.


Code:
Model            | SingleCore | MultiCore | Metal
---------------------------------------------------
M2 Max 12/38       2,804        14,876      145,677
M2 Ultra 24/76     2,777        21,371      222,582
M3 Max 14/30       3,130        18,951      125,498
M3 Max 16/40       3,131        20,949      155,679
M3 Ultra 28/60     3,131        27,136      203,345
M3 Ultra 32/80     3,131        29,998      237,877
M4 Pro 12/16       3,829        20,192       97,281
M4 Pro 14/20       3,826        22,337      110,100
M4 Max 14/32       3,925        23,051      159,070
M4 Max 16/40       3,921        25,647      187,460
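The M3 Ultra multi-core estimates above can be reproduced directly from the M3 Max rows using the M2-generation Ultra/Max CPU multiplier quoted earlier:

```python
# Reproducing the M3 Ultra multi-core estimates from the M3 Max rows
# using the M2-generation Ultra/Max multiplier (1.432) quoted above.
CPU_MULT = 1.432

m3_max_multi = {"14/30": 18_951, "16/40": 20_949}   # from the chart

for cores, score in m3_max_multi.items():
    print(f"M3 Ultra (doubled {cores} Max): ~{round(score * CPU_MULT):,}")
# -> ~27,138 and ~29,999, matching the 27,136 / 29,998 rows within rounding
```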


The one thing that stands out in these benchmarks is how the GPU power has not really increased all that much generation over generation when comparing M2 and M3.

The other thing that stands out in regards to the new Mac Studio, is how close the M4 Max 16/40 might be to the M3 Ultra 28/60.

Apple upgraded the M3 Ultra to Thunderbolt 5, so maybe they have some more tricks up their sleeve and their efficiency when going from Max to Ultra will be improved as well. We will have to wait and see...
 
The Geekbench browser uses the Mac identifier to separate scores for each Mac. But in the M1 and M2 generations, Macs with a binned chip share the same identifier as the full chip; in GB this can only be discovered by going into each individual result page and looking at the core count, etc. The score out front is an average of all entries, so yes, they mix everything up regardless of binned vs. full chip, and RAM amount too while we're at it. Luckily, in the M1 and M2 generations the CPU cores are never disabled on binned chips, so it is just the GPU scores that are averaged between binned and full.

But with the M3 Pro and M3 Max, Apple somehow started using different identifiers for binned SKUs, in which case the GB front page lists them as separate entries. You can also search by identifier to filter down to pages of results from just the binned chips.

The Apple Silicon GPU core has not yet received a generational upgrade in terms of architecture or design. They have instead added RT cores, or changed the caching methods to improve bandwidth efficiency, but these don't move the needle much. The improvement in GPU performance comes mostly from shrinking the process node to fit more cores on the die.
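The mixing described above is easy to picture: scores grouped by identifier get averaged, so identifiers shared by binned and full chips blend the two. The identifiers and Metal scores below are made up for illustration:

```python
# Illustration of how a front-page average mixes binned and full chips
# that share one Mac identifier. Identifiers and scores are hypothetical.
from collections import defaultdict
from statistics import mean

results = [
    ("MacA,1", 222_000),  # hypothetical full-GPU chip
    ("MacA,1", 180_000),  # hypothetical binned GPU chip: same identifier!
    ("MacB,2", 156_000),  # later gen: binned SKU gets its own identifier
]

by_id = defaultdict(list)
for mac_id, metal in results:
    by_id[mac_id].append(metal)

for mac_id, scores in by_id.items():
    # "MacA,1" averages full and binned together: 201,000
    print(mac_id, mean(scores))
```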
 
QwQ-32B is not revolutionary in any way, it is an evolution that behaves better than models with the same number of parameters in some synthetic benchmarks and in some applications (like coding)

You're right, I compared a specialized model (since this is often the area highlighted to justify running a model locally for privacy) with a generalist frontier model, which is unfair.

Once again, this is just plain wrong. Yes, the M3 Ultra's RAM is not as fast as Nvidia's (the memory bandwidth is comparatively low), but its size is still extremely unique and will have its place. The expected speed for a full node is 40 tokens/sec, which is plenty.

You're right again. I read the post and indeed, for running huge sparse models (which is a niche) like DeepSeek (MoE or modular routing), only a fraction of the parameters is activated per token, which reduces the demand on memory bandwidth and raw compute. The M3 Ultra's large unified memory enables it to load massive models at a lower cost.

However, for running dense or hybrid models, and most importantly for training (the most important thing for an AI developer), its performance falls short. In these cases, the lower memory bandwidth compared to dedicated GPUs makes it less suitable.


My point was that presenting the M3 Ultra as the ultimate AI machine is misleading. Except for a small niche, it is not well-suited for this purpose compared to other use cases.
 
Maybe it was a binned chip that had been accidentally removed from the bin and put in the machine.

Apple seems to like using binned chips, so to all semiconductor fab sites: please put locks on your waste bins, as Apple is taking those chips and putting them in devices.
 
Yeah, it's important to see full benchmarks before pulling the trigger on this. So far an 8% improvement over the Max isn't too impressive unless it's the memory speed you value.
I honestly think this was a bit of a late scramble to put something together for the LLM space, which has become popular, with Nvidia releasing Project Digits. Apple probably had this relatively underwhelming M3 Ultra on the back burner but knew it was something that could be configured with massive amounts of RAM.


This is going to be one of the only systems (if not the only one) with effectively over 500GB of VRAM to run an LLM. Project Digits is $3,000 with only 128GB.
 
The odd thing is that it's nowhere near twice as fast as an M3 Max. Any explanation for that?
 
There's still a chance that the initial results are fake. We know the M3 process was costly and the M3 didn't have Thunderbolt 5, so they might have used a new process revision that could improve performance (and, while it seems unlikely, they could also boost the frequencies in a Mac Studio) to compensate and narrow the gap with the M4.

In any case, if the information is accurate, the fully equipped M4 Max offers the best value for money for most use cases.
 
  • Like
Reactions: ThomasPicard
Something about Geekbench seems off... we have lower GPU scores than on the M2 Ultra. Maybe they have to update it?
 