Thanks for your informative post. I am also looking at running models around the same size, since as they get larger the performance deteriorates too quickly. Can you give an estimate of how many tokens/sec you were getting out of your M4 Max running the same size model?

I have been on the fence about getting a base Ultra or the same spec M4 Max as you (128GB/1TB SSD). When comparing an M3 Max and an M4 Pro with far fewer GPU cores, the M4 Pro was keeping up pretty well on MLX, and I was wondering if it could be the Armv9/SME difference.

I'm actually away right now, and when I get back I will be pulling the trigger on a Mac Studio - but I can't get over the fact that the M4 Max in that configuration is so close to the M3 Ultra. Any other year, I would go for the Ultra - but this year it's quite the dilemma.

If it helps, with my M3 Ultra I get 24.34 tok/sec with qwen2.5-32b-instruct Q5_K_M GGUF.

MLX would be faster but I find it less accurate.

If there is a particular model you want tested let me know.
 
The M3 Ultra also ran much cooler under load.
The full copper heatsink surely helps... It's also one of my reasons not to drop my M2 Ultra for the M4 Max.
I am practically at 50-75% load all of the time when I am working. Temps are in the 60-62 Celsius range. I know that at about 65 the fans will ramp up, and at higher fan speeds (around 2500 rpm) they become audible.
 
For AI, the M3 Ultra is the way to go. I ordered both the M4 Max and M3 Ultra and sent back the M4 Max.

The M4's better single core performance made no discernible difference in daily use.

The M3 Ultra also ran much cooler under load.
The issue with the M3 Ultra version has always been about investment vs. value - For paying what’s asked for the upgrade, the machine SHOULD have better single-core performance. For that price, M3 Ultra owners shouldn’t be having to justify slower speeds.

The reason why the M4 Max has been deemed by so many to be the better value is that you pay a lot less, get better single-core performance (which is actually noticeable in some programs), and get multi-core scores that, in some activities, are not that far behind the M3 Ultra due to the M4’s efficiency.
 
Thank you so much for another great post. We use pretty much the same apps (though I lean towards JetBrains for IDE) and I have pretty close to the same use case. You answered a lot of my questions and have pushed me towards the M4 Max now.

I was having a tough time deciding because with the Microcenter M3 Ultra deal for $3399, it is actually $300 cheaper than the M4 Max with 128GB/1TB SSD.

I just refreshed the page before posting and saw your last post about gaming. I ended up doing the same thing and dedicated two of my Xeon workstations into a rack setup. One of them had a GPU for gaming, but I ended up deciding to just go console for the same reasons.

Thanks again. I really appreciate all of your insights.
[Attached image: 1742485625750.png]


DeepSeek-R1-Distill-Qwen-32B-4bit

Training with LoRA on a small set of 20 questions/validations yields:

[Attached image: Untitled 4.jpg]


The Mac Studio gets hot in the top rear half, but I can't even hear the fans while air flows out the back, and once tasks complete it cools down rapidly. My previous 16-inch MBP's fans would obviously go crazy. The Mac Studio's thermals are just amazing. I just love this Mac Studio! (memory use around 39GB before finishing)

I'm also testing

Qwen2.5 32B 8-bit
[Attached image: Untitled 0.jpg]



and it's taking 90GB of memory. My M4 Max Mac Studio will hiccup (momentarily stutter/freeze) for about 1 second while loading it into memory. Performance is 13 tokens/s, nearing my "annoyance" tolerance, lol. Since macOS allocates 75% of memory to the GPU, that's 96GB for the GPU and the rest for the system. The system had to use cache until I raised the iogpu wired limit, giving the GPU 104GB of RAM with the remainder left for the system, and therefore no swap file. Let me just say that at 8-bit it's doable, but I'd stick with 4-bit for testing, etc. Again, anything greater than 32B I would not bother with on a Mac; it's way too slow! 32B 4-bit is the sweet spot.
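
For anyone wanting to check the memory split described above, here is a minimal Python sketch of the arithmetic. The 75% default share and the 104GB target are the figures from this post, not values I have verified. The sysctl key name `iogpu.wired_limit_mb` and its megabyte unit are assumptions based on how the setting is commonly described, so confirm both on your own macOS version before changing anything.

```python
# Rough arithmetic for the GPU/system memory split on a 128GB Mac.
# The 0.75 default share and the 104GB target come from the post above;
# the sysctl key name in the final print is an assumption, not verified here.

def default_gpu_limit_gb(total_ram_gb: float, fraction: float = 0.75) -> float:
    """GPU-addressable memory if the OS reserves a fixed share for the GPU."""
    return total_ram_gb * fraction

def wired_limit_mb(gpu_gb: int) -> int:
    """Convert a GPU budget in GB to MB (1024 MB per GB)."""
    return gpu_gb * 1024

total_ram_gb = 128
default_gpu_gb = default_gpu_limit_gb(total_ram_gb)  # 96.0
raised_gpu_gb = 104
system_left_gb = total_ram_gb - raised_gpu_gb        # 24
limit_mb = wired_limit_mb(raised_gpu_gb)             # 106496

print(f"default GPU share: {default_gpu_gb:.0f} GB")
print(f"raised GPU share:  {raised_gpu_gb} GB, leaving {system_left_gb} GB for the system")
print(f"value to pass (assumed key iogpu.wired_limit_mb): {limit_mb}")
```

With a 90GB model, the default 96GB share leaves very little headroom, which fits the stutter described; raising the budget to 104GB still leaves 24GB for macOS itself.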
 
It's not all that unusual for massively multicore processors to have slower individual clock speeds. Back during the Mac Pro trashcan era, the more cores you added, the slower each core was (4 cores @ 3.7GHz versus 6 cores @ 3.5GHz versus 12 cores at 2.7GHz, I think it was). Some current Intel Xeon/server-level processors are still the same, I believe.

It's entirely possible that Apple decided that a 32 core M4 Ultra simply produced too much heat and chose not to use it.

If your work truly benefits from 32 cores, then the M3 Ultra will still be faster than a 16 core M4 Max.
 

M3U is giving me 37.58 tok/sec for DeepSeek-R1-Distill-Qwen-32B-4bit.

These machines are truly remarkable when it comes to thermal performance. My Studio runs silently 95% of the time, even when under heavy load. I don't miss my old noisy hot 4090 rig at all!
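
To put the tok/sec figures quoted in this thread into wall-clock terms, here is a small Python sketch. The rates are the ones posters reported; the 500-token reply length is a hypothetical value picked purely for illustration. The three rates come from different models, quantizations and machines, so this is not a like-for-like benchmark.

```python
# Convert reported generation speeds into approximate wait times.
# REPLY_TOKENS is a hypothetical reply length, not a figure from the thread.

def seconds_for(tokens: int, tok_per_sec: float) -> float:
    """Time to generate `tokens` tokens at a steady generation rate."""
    return tokens / tok_per_sec

# Rates as reported by posters in this thread (tok/sec).
reported = {
    "M3 Ultra, qwen2.5-32b-instruct Q5_K_M GGUF": 24.34,
    "M3 Ultra, DeepSeek-R1-Distill-Qwen-32B-4bit": 37.58,
    "M4 Max, Qwen2.5 32B 8-bit": 13.0,
}

REPLY_TOKENS = 500

for label, rate in reported.items():
    print(f"{label}: {seconds_for(REPLY_TOKENS, rate):.1f} s for {REPLY_TOKENS} tokens")
```

Prompt processing time is ignored here, so real waits will be somewhat longer.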
 
Of course the M3 Ultra will still be faster in that work situation - But that’s not the question. The question is if you are getting an equivalent amount of performance in exchange for what you spend.

The pricing of the M3 Ultra is set in a way where you get less single-core performance than a model that is about half the price…and that lower-priced model is also not that far behind in many of the multi-core tasks. Apple is overcharging for the performance that you get with an M3 Ultra.
 
Yeah, perhaps. But performance per dollar is not linear. There's always a point of diminishing returns. I guess if the M3 Ultra were $500 cheaper than it is, it would have been a nice consolation for the absence of an M4 Ultra.
 
No. There’s always been a tariff. It’s just more now. Apple prices the tariff (also VAT where that’s a thing) into the cost of the machine. If the new tariff becomes more than Apple can chew it will raise the MAP.
Apple negotiated with Trump to remove the 20% China tariff, so Apple products don't have that added on. The new 34% add-on is not immune, and Apple needs to account for this when it goes into effect on April 9.
 