And frequently it's RAM, rather than M2 versus M3 versus M4, that may be the limiting factor, but we really lack good real-world data. Very few folks will be able to compare an M4 Max/128 GB against an M3 Ultra/512 GB on identical LLM tasks.

Why will few be able to do this? Because so few people will buy both of them, due to the cost?
 
Yes. It can be difficult to conceptualize what exactly the user is getting with each step up in cores and RAM in Apple's M-series SoCs. The benchmarks help, but it just seems easier to grasp the discrete GPU gains in PC-land than the gains across the M-class chips.

Thanks for the reference to the RTX 3060 estimation; that's a good approximation. I was interested in that, and also in running local LLMs.

For reference, running a base-model M2 Max Studio w/ 32GB RAM, I downloaded DeepSeek R1 70B at 43GB. My first question in the terminal was to calculate the time to travel, in years, from Earth to Jupiter, assuming a velocity of 60 miles per hour. After a half hour, I had -- literally -- two words of an answer. Two words. So plainly the base M2 Max utterly and completely choked on 70 billion parameters. Had to Control-Z that terminal.
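For what it's worth, the arithmetic being asked for is trivial; here's a quick sanity check in Python, using approximate Earth-to-Jupiter distances (the true figure varies a lot with orbital positions):

```python
# Rough check of the question posed to the model: years to travel from
# Earth to Jupiter at 60 mph. Distances are approximate; the gap ranges
# from about 365 million miles (closest) to about 601 million miles.
MPH = 60
HOURS_PER_YEAR = 24 * 365.25

for label, miles in [("closest", 365e6), ("farthest", 601e6)]:
    years = miles / MPH / HOURS_PER_YEAR
    print(f"{label}: ~{years:,.0f} years")
# closest: ~694 years
# farthest: ~1,143 years
```

So the answer the model was grinding toward is on the order of 700 to 1,100 years.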

But the M2 Max can locally run R1 32B (20GB) at a pretty good speed. It spits out answers to even complex questions with hardly any slowdown. I am so curious to know how an M4 Max with 128GB RAM will handle some of the bigger models. We'll have to wait for the tests. I hope some people here put their machines through their paces with the bigger LLMs.
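If folks do run those tests, here's a minimal sketch for collecting comparable numbers. It assumes the Ollama Python client (the 43GB download size matches Ollama's 70B tag, so that's my guess at the setup; the model name below is just an example):

```python
# Minimal sketch: time a local model's generation speed, assuming the
# Ollama Python client (pip install ollama) and an already-pulled model.
import ollama

resp = ollama.chat(
    model="deepseek-r1:32b",  # the ~20GB quant that runs well on an M2 Max
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)

# Ollama reports generation stats with durations in nanoseconds.
tok_per_sec = resp["eval_count"] / resp["eval_duration"] * 1e9
print(f"{tok_per_sec:.1f} tok/sec")
```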

I have kept these figures around from a Reddit post I saw. Tests were done on a 16-inch MBP with an M4 Max and 128GB RAM. I would assume the Mac Studio could do slightly better due to better thermals.

The performance figures below were all with MLX models.

I’m sharing this in case you were wondering what kind of throughput you might expect to get on a machine like this, e.g. if you are considering whether it’s worth buying or not (as for me, I have no regrets; I’m loving this beast). Plugged in, auto power mode, on a 16’’ MacBook model (it turns out the numbers can be different for the 14’’ one), same single short query; the resulting tok/sec numbers, as measured by LM Studio, are reported below (a sketch for reproducing this kind of measurement appears after the lists):

LLaMA 3.2 3B 4bit – 181
LLaMA 3 8B 8bit – 55
LLaMA 3.3 70B 4bit – 11.8
LLaMA 3.3 70B 8bit – 6.5
Mistral Large 123B 4bit – 6.6
Mistral Nemo 12B 4bit – 63
Mistral Nemo 12B 8bit – 36
Mistral Small 22B 4bit – 34.5
Mistral Small 22B 8bit – 19.6
Qwen2.5 14B 4bit – 50
Qwen2.5 14B 8bit – 29
Qwen2.5 32B 4bit – 24
Qwen2.5 32B 8bit – 13.5
Qwen2.5 72B 4bit – 10.9
Qwen2.5 72B 8bit – 6.2
WizardLM-2 8x22B 4bit – 19.4!! (a mixture-of-experts model, so only a fraction of its parameters are active per token, hence it outruns the dense 70B/123B models)

For comparison, here are some numbers obtained in the same setting on my other MacBook, M1 Pro with 32 GB:

Mistral Nemo 12B 4bit – 22.8
Mistral Small 22B 4bit – 12.9
Qwen2.5 32B 4bit – 8.8

Hope it’s interesting / useful.
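For anyone who wants to collect comparable numbers (this isn't how the figures above were taken; it's just a way to get your own), LM Studio exposes an OpenAI-compatible local server, so a rough end-to-end timing sketch might look like this; the model id is a placeholder for whatever you have loaded:

```python
# Rough sketch: time a single short query against LM Studio's local
# OpenAI-compatible server (default http://localhost:1234/v1).
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

start = time.perf_counter()
resp = client.chat.completions.create(
    model="llama-3.3-70b-instruct",  # placeholder; use the model you loaded
    messages=[{"role": "user", "content": "Explain KV caching in two sentences."}],
)
elapsed = time.perf_counter() - start

tokens = resp.usage.completion_tokens
print(f"{tokens} tokens in {elapsed:.1f}s -> {tokens / elapsed:.1f} tok/sec")
```

Note this times the whole request, including time-to-first-token, whereas LM Studio's displayed tok/sec counts generation only, so expect slightly lower numbers from this sketch.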


Update. Disclaimer! As pointed out by the community, I was using a relatively short context. Here is how the numbers change for the two largest models, for your reference:

I took an academic paper (the Min-P paper, in case you are curious) as an example and asked Mistral Large 2407 MLX 4bit to summarize it. I set the context to 10K. The paper + task was 9391 tokens. Time to first token was 206 seconds, throughput 6.18 tok/sec (a drop from 6.6 on a short context).

I did the same with WizardLM-2 8x22B MLX 4bit. The paper + task was 9390 tokens. Time to first token was 207 seconds, throughput 16.53 tok/sec (a drop from 19.4 on a short context).

So the main concern is TTFT (a few minutes on larger contexts, while for the shorter ones above it was always under 7 seconds). However, the throughput doesn’t degrade too badly, as you can see. Please bear this in mind. Thank you for your insightful comments.
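One more thing those figures imply: the prompt-processing (prefill) speed, which is what dominates TTFT on long prompts, falls straight out of the reported numbers:

```python
# Prefill speed implied by the long-context runs above.
runs = {
    "Mistral Large 123B 4bit": (9391, 206),  # (prompt tokens, TTFT seconds)
    "WizardLM-2 8x22B 4bit":   (9390, 207),
}
for name, (tokens, ttft) in runs.items():
    print(f"{name}: ~{tokens / ttft:.0f} tok/sec prefill")
# Both work out to roughly 45 tok/sec of prompt processing.
```

So a 10K-token prompt will always cost a few minutes of TTFT with these models on this class of hardware, regardless of how snappy generation feels afterwards.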
 
I have kept these figures around from a Reddit post I saw. Tests were done on a 16-inch MBP with an M4 Max and 128GB RAM. I would assume the Mac Studio could do slightly better due to better thermals.

The performance figures below were all with MLX models.

Thank you, this is about the best report I've seen thus far. Glad you saved it; it's super useful to anyone considering the M4 Max.
 
Why will few be able to do this? Because so few people will buy both of them, due to the cost?
Yes, cost is relevant. But even ignoring cost, few folks would buy one of each Studio configuration; not zero, but few, because each user's needs are different. Some would buy two M4 Max/128 machines, some would buy an M3 Ultra with 256 GB RAM, some would buy one M3 Ultra with 512 GB RAM, some would buy two M3 Ultras each with 256 GB RAM, etc.

And folks in that buying cohort are also no doubt wondering what will be going on with the Mac Pro. Some will hold off on their buying plans (e.g. one new M4 Max/128 box now instead of an M4 Max/128 + an M3 Ultra/512) until the next MP is announced.
 
For reference, running a base-model M2 Max Studio w/ 32GB RAM, I downloaded DeepSeek R1 70B at 43GB. ... So plainly the base M2 Max utterly and completely choked on 70 billion parameters. Had to Control-Z that terminal.
Swap will kill any LLM, because inference works by reading through (more or less) the entire model for every token generated. Even the fastest internal storage can't keep up.
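To put ballpark numbers on that: for a dense model, each generated token streams roughly the whole model through memory, so decode speed is capped near bandwidth divided by model size. A sketch (the bandwidth figures are nominal peaks; sustained real-world rates are lower):

```python
# Upper bound for dense-model decode speed: every generated token reads
# (roughly) the whole model, so tok/sec <= bandwidth / model size.
cases = [
    ("70B 4-bit from SSD swap (~7 GB/s NVMe)", 7,   43),
    ("32B 4-bit in M2 Max RAM (400 GB/s)",     400, 20),
    ("70B 4-bit in M4 Max RAM (546 GB/s)",     546, 40),
]
for label, bw_gbps, size_gb in cases:
    print(f"{label}: <= {bw_gbps / size_gb:.1f} tok/sec")
```

That works out to well under 1 tok/sec when swapping from SSD (hence two words in half an hour), about 20 tok/sec for a 32B 4-bit on an M2 Max, and about 13-14 tok/sec for a 70B 4-bit on an M4 Max, which lines up with the 11.8 measured in the Reddit figures above.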

On your original question: in a video I found on YouTube yesterday, a graph compared the M4 Max 128GB's LLM performance with the M2 Ultra, a dual RTX 4090 setup, and an A6000, I believe. The Macs were the only ones that could run the bigger models, but they lagged the Nvidia cards on the small models that could fit in the cards' limited memory.

Models larger than 70B or so are likely not going to be usefully fast. They will run, but you'll get a couple of tokens per second. So if you just want to start a run and go have lunch, then maybe you'll use the big models, but otherwise probably not.

And that is why I'll likely skip the M3 Ultra: the extra price for 256GB or 512GB still won't deliver useful speed for LLMs.

For diffusion models, however, the large number of GPU cores will likely still be useful. That said, the M4 Max with 40 GPU cores is still going to be roughly as fast as the 60-core M3 Ultra; you'd have to shell out the extra $$$ for the 80-core version.

Like others here, I'm most likely to settle for the 128GB M4 Max.

PS: Here's the video I referenced:
 
I’ve just received an update saying my Studio has been dispatched, and I have a UPS tracking link. The link shows it being in Shenzhen, China on the 10th, and it’s on the way (Chek Lap Kok, Hong Kong) as of a moment ago. The estimated delivery is shown as Thursday, 13th (by the end of the day).

Models larger than 70B or so are likely not going to be usefully fast. They will run, but you'll get a couple of tokens per second. So if you just want to start a run and go have lunch, then maybe you'll use the big models, but otherwise probably not.

And that is why I'll likely skip the M3 Ultra: the extra price for 256GB or 512GB still won't deliver useful speed for LLMs.
That was one of my deciding factors against going for the M3U with 256GB (okay, and the massive price! 🤣 ). I’ve been watching Alex Ziskind on YouTube and he’s done comparisons - including with 128GB M4 Max on a MBP. While you can load up 70B models, they’re not always going to be all that quick. Tolerable, acceptable even, but if you want speed, probably not so much.

Seems to me there’s a law of diminishing returns with Macs. More RAM will enable you to load larger models, but speed will suffer. I think it’ll probably do fine with some 34B models I’ve downloaded.

I went with the 128GB model because:

(i) I will occasionally run 70B models (just because I can);
(ii) I’ll run smaller models, alongside Stable Diffusion for image generation, and sometimes also have Parallels running Windows 11 for small projects. This, combined, is all too much for my humble 24GB MBP;
(iii) I want to run LLMs with larger context lengths (I’d like it to summarise full scenes/chapters of the story I’m writing without me having to break them into smaller “acts”; see the context-memory arithmetic sketched after this list);
(iv) I’ve been playing about with ComfyUI for video generation with WAN and, on my 24GB MBP, it sucks up swap space like it’s going out of fashion - like it’s trying to use 60GB (RAM + 40GB swap) or something. It does work on the MBP, but it takes over a full day for a 5s clip. I’d like to continue experimenting with things like this without tying up my entire machine for a day each time.

I think 128GB RAM will help towards all of the above.
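On point (iii), here's the rough arithmetic for why long contexts eat RAM, assuming Llama-3.3-70B-style shapes (80 layers, 8 KV heads via GQA, head dim 128, and an fp16 cache):

```python
# Rough KV-cache size per context length, assuming Llama-3.3-70B-style
# shapes: 80 layers, 8 KV heads (GQA), head_dim 128, fp16 (2-byte) cache.
layers, kv_heads, head_dim, bytes_per = 80, 8, 128, 2
per_token = 2 * layers * kv_heads * head_dim * bytes_per  # x2 for K and V

for ctx in (8_192, 32_768, 131_072):
    print(f"{ctx:>7} tokens -> {per_token * ctx / 2**30:.1f} GB KV cache")
#    8192 tokens -> 2.5 GB KV cache
#   32768 tokens -> 10.0 GB KV cache
#  131072 tokens -> 40.0 GB KV cache
```

On a 24GB machine the cache for a really long context spills into swap; with 128GB there's room for a 70B 4-bit model plus a long context, with headroom to spare.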
 
Yes, me too....
 
I'm wondering, with all this tariff nonsense, could it cost you more to pre-order and have it sent direct to you from China instead of buying it locally in the US once it's stocked locally? Do you have to pay a customs fee now?
 
No. There’s always been a tariff; it’s just more now. Apple prices the tariff (also VAT, where that’s a thing) into the cost of the machine. If the new tariff becomes more than Apple can chew, it will raise the MAP.
 
Changed again ...

On the Way
Departed from Facility
Dubai Airport, United Arab Emirates

I’ve been trying to figure out what monitor to buy to go with it. I’ve been using my old 1080p monitors with my MBP and, well, they’re not good (thankfully the MBP has its own screen, so the monitors are just for things I merely glance at). If they ever update it, I’m sorely tempted by an ASD - but it’s well out of my budget right now (that all went on the Studio).

And there seem to be a lot of differing opinions as to what makes a good monitor: how many aren’t great “because they’re 4K, not 5K”, or flicker, or have scaling issues with the Mac. It’s all a bit of a nightmare.

So I took the plunge and bought the BenQ PD2705U 27” 4K Monitor for Mac from Amazon. It’s just arrived and it is, of course, so much better than my 1080p monitors. I’m a simple guy, and I don’t know what all the other issues are with monitors and Macs, but this one seems to work. It has a handy built-in KVM, so I can use it with the Studio and my MBP without too many unnecessary cables and boxes lying around.
 
I’ve been watching Alex Ziskind on YouTube and he’s done comparisons - including with 128GB M4 Max on a MBP. While you can load up 70B models, they’re not always going to be all that quick. Tolerable, acceptable even, but if you want speed, probably not so much.
Ziskind is one of the more prolific "content creators" for LLMs on Mac, but even he's in the hawking-brands business.

Regardless, I've concluded that what I wish to do locally still is not possible. At least not without buying an 8-bay H200 rack, which besides the cost (around $333k) is noisy, can heat a whole house, and costs $$ in monthly electricity.

Yet I will still be able to make good use of the 32B models, I think.

And I also do diffusion models, or try to on my old Intel iMac, and while the 80-core M3 Ultra is ideal, the 40-core M4 Max will likely work well.

Also for AI and audio, which fortunately is not as demanding as video.
 
Apparently mine is now :

On the Way
Arrived at Facility
Koeln, Germany

This’ll be one of those scenarios where it travels around the world in a few hours, and then gets stuck at the last stop for a week. 🤣
 

Do you mean Köln? I haven't seen it written with an 'e' before!


Random ask here: I'm considering buying the M4 Max, though I'm unsure of the custom specs yet. Anyone moving on from an M2 Max and planning to sell it or trade it?
 
A bunch of us are contemplating it. Still looking for solid benchmarks. 16/40 is a no-brainer; 128GB RAM is pricey. But the 16/40 128GB M4 Max looks like a machine that could last anyone a good long time.
 
My issue is compounded... I don't "need" a new M4 Studio. I currently have three rigs set up: a Win10 Pro media server for my home; a 'mega rig', which is a custom-built Ryzen 5950X, 64GB RAM, 4090 Founders; and an aging 2017 fully specced iMac.

Truly, I could get by with a Mac mini M4 Pro with a good drive and be fine. I just like toys and always wanted a Studio. I do get a partner discount from Apple, and I would have to custom-build the M4 to at least 1TB of NVMe, though the 'Apple tax' kills me: I pay $300 for a drive like the Samsung 990 Pro 4TB, yet Apple has the audacity to charge MORE for a worse/slower drive! If the M4 Studio started with 1TB at the $1,999 base price, I probably would have already ordered it.
 
I copy & pasted from the website. If it’s spelled wrong, that’s on UPS. :)
The umlaut (the two dots) above a vowel indicates adding an "e" sound after it. But since umlauted vowels are not standard ASCII keyboard characters, it is understood and accepted to type "oe" in its place whenever typing ö is not possible. So Köln and Koeln are both correct ways to type it.

PS: "Koln", meanwhile, is, well, wrong. But you can consider it an alternative English spelling, if you don't go as far as "Cologne".
 
My issue is compounded... I don't "need" a new M4 Studio. I currently have three rigs set up: a Win10 Pro media server for my home; a 'mega rig', which is a custom-built Ryzen 5950X, 64GB RAM, 4090 Founders; and an aging 2017 fully specced iMac.

Truly, I could get by with a Mac mini M4 Pro with a good drive and be fine. I just like toys and always wanted a Studio. I do get a partner discount from Apple, and I would have to custom-build the M4 to at least 1TB of NVMe, though the 'Apple tax' kills me: I pay $300 for a drive like the Samsung 990 Pro 4TB, yet Apple has the audacity to charge MORE for a worse/slower drive! If the M4 Studio started with 1TB at the $1,999 base price, I probably would have already ordered it.

Three weeks ago I ordered an M4 Pro Mac mini with 1TB and 64GB for 3,500 CAD… Thankfully I cancelled my order before it shipped, after discussing it on this forum and realizing the soon-to-come Mac Studio would be a better value. And I'm glad I did: I paid 4,050 for the M4 Max (16/40 cores, 1TB, 64GB) and it is a better value, especially considering it has a much better chip, 10Gb Ethernet, and many more ports (even though I don't really need those for now).

Not an expert, but I think if you get a base Mac mini and maybe upgrade the storage yourself, it's a good value; but as soon as you choose the M4 Pro and start speccing it up, it starts getting very close to the M4 Max Mac Studio.
 
The umlaut (the two dots) above a vowel indicates adding an "e" sound after it. But since umlauted vowels are not standard ASCII keyboard characters, it is understood and accepted to type "oe" in its place whenever typing ö is not possible. So Köln and Koeln are both correct ways to type it.

PS: "Koln", meanwhile, is, well, wrong. But you can consider it an alternative English spelling, if you don't go as far as "Cologne".
However you spell it, my Studio is now in the UK and the delivery estimate has changed to today, where it previously said tomorrow. I’ll be hoping for an early finish from work today. :)
 
Mine is out for delivery too!

Didn't even know you could get a delivery this quick; China to anywhere in less than 48 hours still sounds wild to me.
 