And frequently it's RAM, rather than M2 versus M3 versus M4, that may be the limiting factor, but we really lack good real-world data. Very few folks will be able to compare an M4 Max/128 GB against an M3 Ultra/512 GB on identical LLM tasks.

Why will few be able to do this? Because so few people will buy both of them, due to the cost?
 
Yes. It can be difficult to conceptualize what exactly the user is getting with each step up in cores and RAM in Apple's M-series SoCs. The benchmarks help, but it just seems easier to grasp the discrete GPU gains in PC-land than the gains across the M-class chips.

Thanks for the reference to the RTX 3060 estimation; that's a good approximation. I was interested in that, and also in running local LLMs.

For reference, running a base-model M2 Max Studio w/ 32GB RAM, I downloaded DeepSeek R1 70B at 43GB. My first question in the terminal was to calculate the time to travel, in years, from Earth to Jupiter, assuming a velocity of 60 miles per hour. After a half hour, I had -- literally -- two words of an answer. Two words. So plainly the base M2 Max utterly and completely choked on 70 billion parameters. Had to Control-Z that terminal.
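For what it's worth, the arithmetic being asked for is trivial; here's a quick sanity check in Python, using approximate Earth-to-Jupiter distances (the true figure varies a lot with orbital positions):

```python
# Rough check of the question posed to the model: years to travel from
# Earth to Jupiter at 60 mph. Distances are approximate; the gap ranges
# from about 365 million miles (closest) to about 601 million miles.
MPH = 60
HOURS_PER_YEAR = 24 * 365.25

for label, miles in [("closest", 365e6), ("farthest", 601e6)]:
    years = miles / MPH / HOURS_PER_YEAR
    print(f"{label}: ~{years:,.0f} years")
# closest: ~694 years
# farthest: ~1,143 years
```

So the answer the model was grinding toward is on the order of 700 to 1,100 years.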

But the M2 Max can locally run R1 32B (20GB) at a pretty good speed. It spits out answers to even complex questions with hardly any slowdown. I am so curious to know how an M4 Max with 128GB RAM will handle some of the bigger models. We'll have to wait for the tests. I hope some people here put their machines through their paces with the bigger LLMs.
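If folks do run those tests, here's a minimal sketch for collecting comparable numbers. It assumes the Ollama Python client (the 43GB download size matches Ollama's 70B tag, so that's my guess at the setup; the model name below is just an example):

```python
# Minimal sketch: time a local model's generation speed, assuming the
# Ollama Python client (pip install ollama) and an already-pulled model.
import ollama

resp = ollama.chat(
    model="deepseek-r1:32b",  # the ~20GB quant that runs well on an M2 Max
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)

# Ollama reports generation stats with durations in nanoseconds.
tok_per_sec = resp["eval_count"] / resp["eval_duration"] * 1e9
print(f"{tok_per_sec:.1f} tok/sec")
```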

I have kept these figures around from a Reddit post I saw. Tests were done on a 16-inch MBP with an M4 Max and 128GB RAM. I would assume the Mac Studio could do slightly better due to better thermals.

The performance figures below were all with MLX models.

I’m sharing this in case you were wondering what kind of throughput you might expect to get on a machine like this, e.g. if you are considering whether it’s worth buying or not (as for me, I have no regrets; I’m loving this beast). Plugged in, auto power mode, on a 16’’ MacBook model (it turns out the numbers can be different for the 14’’ one), same single short query; the resulting tok/sec numbers, as measured by LM Studio, are reported below (a sketch for reproducing this kind of measurement appears after the lists):

LLaMA 3.2 3B 4bit – 181
LLaMA 3 8B 8bit – 55
LLaMA 3.3 70B 4bit – 11.8
LLaMA 3.3 70B 8bit – 6.5
Mistral Large 123B 4bit – 6.6
Mistral Nemo 12B 4bit – 63
Mistral Nemo 12B 8bit – 36
Mistral Small 22B 4bit – 34.5
Mistral Small 22B 8bit – 19.6
Qwen2.5 14B 4bit – 50
Qwen2.5 14B 8bit – 29
Qwen2.5 32B 4bit – 24
Qwen2.5 32B 8bit – 13.5
Qwen2.5 72B 4bit – 10.9
Qwen2.5 72B 8bit – 6.2
WizardLM-2 8x22B 4bit – 19.4!! (a mixture-of-experts model, so only a fraction of its parameters are active per token, hence it outruns the dense 70B/123B models)

For comparison, here are some numbers obtained in the same setting on my other MacBook, M1 Pro with 32 GB:

Mistral Nemo 12B 4bit – 22.8
Mistral Small 22B 4bit – 12.9
Qwen2.5 32B 4bit – 8.8

Hope it’s interesting / useful.
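For anyone who wants to collect comparable numbers (this isn't how the figures above were taken; it's just a way to get your own), LM Studio exposes an OpenAI-compatible local server, so a rough end-to-end timing sketch might look like this; the model id is a placeholder for whatever you have loaded:

```python
# Rough sketch: time a single short query against LM Studio's local
# OpenAI-compatible server (default http://localhost:1234/v1).
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

start = time.perf_counter()
resp = client.chat.completions.create(
    model="llama-3.3-70b-instruct",  # placeholder; use the model you loaded
    messages=[{"role": "user", "content": "Explain KV caching in two sentences."}],
)
elapsed = time.perf_counter() - start

tokens = resp.usage.completion_tokens
print(f"{tokens} tokens in {elapsed:.1f}s -> {tokens / elapsed:.1f} tok/sec")
```

Note this times the whole request, including time-to-first-token, whereas LM Studio's displayed tok/sec counts generation only, so expect slightly lower numbers from this sketch.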


Update. Disclaimer! As pointed out by the community, I was using a relatively short context. Here is how the numbers change for the two largest models, for your reference:

I took an academic paper (the Min-P paper, in case you are curious) as an example and asked Mistral Large 2407 MLX 4bit to summarize it. I set the context to 10K. The paper + task was 9391 tokens. Time to first token was 206 seconds, throughput 6.18 tok/sec (a drop from 6.6 on a short context).

I did the same with WizardLM-2 8x22B MLX 4bit. The paper + task was 9390 tokens. Time to first token was 207 seconds, throughput 16.53 tok/sec (a drop from 19.4 on a short context).

So the main concern is TTFT (a few minutes on larger contexts, while for the shorter ones above it was always under 7 seconds). However, the throughput doesn’t degrade too badly, as you can see. Please bear this in mind. Thank you for your insightful comments.
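One more thing those figures imply: the prompt-processing (prefill) speed, which is what dominates TTFT on long prompts, falls straight out of the reported numbers:

```python
# Prefill speed implied by the long-context runs above.
runs = {
    "Mistral Large 123B 4bit": (9391, 206),  # (prompt tokens, TTFT seconds)
    "WizardLM-2 8x22B 4bit":   (9390, 207),
}
for name, (tokens, ttft) in runs.items():
    print(f"{name}: ~{tokens / ttft:.0f} tok/sec prefill")
# Both work out to roughly 45 tok/sec of prompt processing.
```

So a 10K-token prompt will always cost a few minutes of TTFT with these models on this class of hardware, regardless of how snappy generation feels afterwards.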
 
I have kept these figures around from a Reddit post I saw. Tests were done on a 16-inch MBP with an M4 Max and 128GB RAM. I would assume the Mac Studio could do slightly better due to better thermals.

The performance figures below were all with MLX models.

Thank you, this is about the best report I've seen thus far. Glad you saved it; it's super useful to anyone considering the M4 Max.
 
Why will few be able to do this? Because so few people will buy both of them, due to the cost?
Yes, cost is relevant. But even ignoring cost, few folks would buy one of each Studio configuration; not zero, but few, because each user's needs are different. Some would buy two M4 Max/128 machines, some would buy an M3 Ultra with 256 GB RAM, some would buy one M3 Ultra with 512 GB RAM, some would buy two M3 Ultras each with 256 GB RAM, etc.

And folks in that buying cohort are also no doubt wondering what will be going on with the Mac Pro. Some will hold off on their buying plans (e.g. one new M4 Max/128 box now instead of an M4 Max/128 + an M3 Ultra/512) until the next MP is announced.
 
For reference, running a base-model M2 Max Studio w/ 32GB RAM, I downloaded DeepSeek R1 70B at 43GB. ... So plainly the base M2 Max utterly and completely choked on 70 billion parameters. Had to Control-Z that terminal.
Swap will kill any LLM, because inference works by reading through (more or less) the entire model for every token generated. Even the fastest internal storage can't keep up.
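To put ballpark numbers on that: for a dense model, each generated token streams roughly the whole model through memory, so decode speed is capped near bandwidth divided by model size. A sketch (the bandwidth figures are nominal peaks; sustained real-world rates are lower):

```python
# Upper bound for dense-model decode speed: every generated token reads
# (roughly) the whole model, so tok/sec <= bandwidth / model size.
cases = [
    ("70B 4-bit from SSD swap (~7 GB/s NVMe)", 7,   43),
    ("32B 4-bit in M2 Max RAM (400 GB/s)",     400, 20),
    ("70B 4-bit in M4 Max RAM (546 GB/s)",     546, 40),
]
for label, bw_gbps, size_gb in cases:
    print(f"{label}: <= {bw_gbps / size_gb:.1f} tok/sec")
```

That works out to well under 1 tok/sec when swapping from SSD (hence two words in half an hour), about 20 tok/sec for a 32B 4-bit on an M2 Max, and about 13-14 tok/sec for a 70B 4-bit on an M4 Max, which lines up with the 11.8 measured in the Reddit figures above.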

On your original question: in a video I found on YouTube yesterday, a graph compared the M4 Max 128GB's LLM performance with the M2 Ultra, a dual RTX 4090 setup, and an A6000, I believe. The Macs were the only ones that could run the bigger models, but they lagged the Nvidia cards on the small models that could fit in the cards' limited memory.

Models larger than 70B or so are likely not going to be usefully fast. They will run, but you'll get a couple of tokens per second. So if you just want to start a run and go have lunch, then maybe you'll use the big models, but otherwise probably not.

And that is why I'll likely skip the M3 Ultra: the extra price for 256GB or 512GB still won't deliver useful speed for LLMs.

For diffusion models, however, the large number of GPU cores will likely still be useful. That said, the M4 Max with 40 GPU cores is still going to be roughly as fast as the 60-core M3 Ultra; you'd have to shell out the extra $$$ for the 80-core version.

Like others here, I'm most likely to settle for the 128GB M4 Max.

PS: Here's the video I referenced:
 
I’ve just received an update saying my Studio has been dispatched, and I have a UPS tracking link. The link shows it being in Shenzhen, China on the 10th, and it’s on the way (Chek Lap Kok, Hong Kong) as of a moment ago. The estimated delivery is shown as Thursday, 13th (by the end of the day).

Models larger than 70B or so are likely not going to be usefully fast. They will run, but you'll get a couple of tokens per second. So if you just want to start a run and go have lunch, then maybe you'll use the big models, but otherwise probably not.

And that is why I'll likely skip the M3 Ultra: the extra price for 256GB or 512GB still won't deliver useful speed for LLMs.
That was one of my deciding factors against going for the M3U with 256GB (okay, and the massive price! 🤣 ). I’ve been watching Alex Ziskind on YouTube and he’s done comparisons - including with 128GB M4 Max on a MBP. While you can load up 70B models, they’re not always going to be all that quick. Tolerable, acceptable even, but if you want speed, probably not so much.

Seems to me there’s a law of diminishing returns with Macs. More RAM will enable you to load larger models, but speed will suffer. I think it’ll probably do fine with some 34B models I’ve downloaded.

I went with the 128GB model because:

(i) I will occasionally run 70B models (just because I can);
(ii) I’ll run smaller models, alongside Stable Diffusion for image generation, and sometimes also have Parallels running Windows 11 for small projects. This, combined, is all too much for my humble 24GB MBP;
(iii) I want to run LLMs with larger context lengths (I’d like it to summarise full scenes/chapters of the story I’m writing without me having to break them into smaller “acts”; see the context-memory arithmetic sketched after this list);
(iv) I’ve been playing about with ComfyUI for video generation with WAN and, on my 24GB MBP, it sucks up swap space like it’s going out of fashion - like it’s trying to use 60GB (RAM + 40GB swap) or something. It does work on the MBP, but it takes over a full day for a 5s clip. I’d like to continue experimenting with things like this without tying up my entire machine for a day each time.

I think 128GB RAM will help towards all of the above.
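On point (iii), here's the rough arithmetic for why long contexts eat RAM, assuming Llama-3.3-70B-style shapes (80 layers, 8 KV heads via GQA, head dim 128, and an fp16 cache):

```python
# Rough KV-cache size per context length, assuming Llama-3.3-70B-style
# shapes: 80 layers, 8 KV heads (GQA), head_dim 128, fp16 (2-byte) cache.
layers, kv_heads, head_dim, bytes_per = 80, 8, 128, 2
per_token = 2 * layers * kv_heads * head_dim * bytes_per  # x2 for K and V

for ctx in (8_192, 32_768, 131_072):
    print(f"{ctx:>7} tokens -> {per_token * ctx / 2**30:.1f} GB KV cache")
#    8192 tokens -> 2.5 GB KV cache
#   32768 tokens -> 10.0 GB KV cache
#  131072 tokens -> 40.0 GB KV cache
```

On a 24GB machine the cache for a really long context spills into swap; with 128GB there's room for a 70B 4-bit model plus a long context, with headroom to spare.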
 
Yes, me too....
 
I'm wondering, with all this tariff nonsense, could it cost you more to pre-order and have it sent direct to you from China instead of buying it locally in the US once it's stocked locally? Do you have to pay a customs fee now?
 
No. There’s always been a tariff; it’s just more now. Apple prices the tariff (also VAT, where that’s a thing) into the cost of the machine. If the new tariff becomes more than Apple can chew, it will raise the MAP.
 
Changed again ...

On the Way
Departed from Facility
Dubai Airport, United Arab Emirates

I’ve been trying to figure out what monitor to buy to go with it. I’ve been using my old 1080p monitors with my MBP and, well, they’re not good (thankfully the MBP has its own screen, so the monitors are just for things I merely glance at). If they ever update it, I’m sorely tempted by an ASD - but it’s well out of my budget right now (that all went on the Studio).

And there seem to be a lot of differing opinions as to what makes a good monitor: how many aren’t great “because they’re 4K, not 5K”, or flicker, or have scaling issues with the Mac. It’s all a bit of a nightmare.

So I took the plunge and bought the BenQ PD2705U 27” 4K Monitor for Mac from Amazon. It’s just arrived and it is, of course, so much better than my 1080p monitors. I’m a simple guy, and I don’t know what all the other issues are with monitors and Macs, but this one seems to work. It has a handy built-in KVM, so I can use it with the Studio and my MBP without too many unnecessary cables and boxes lying around.
 
I’ve been watching Alex Ziskind on YouTube and he’s done comparisons - including with 128GB M4 Max on a MBP. While you can load up 70B models, they’re not always going to be all that quick. Tolerable, acceptable even, but if you want speed, probably not so much.
Ziskind is one of the more prolific "content creators" for LLMs on Mac, but even he's in the hawking-brands business.

Regardless, I've concluded that what I wish to do locally still is not possible. At least not without buying an 8-bay H200 rack, which besides the cost (around $333k) is noisy, can heat a whole house, and costs $$ in monthly electricity.

Yet I will still be able to make good use of the 32B models, I think.

And I also do diffusion models, or try to on my old Intel iMac, and while the 80-core M3 Ultra is ideal, the 40-core M4 Max will likely work well.

Also for AI and audio, which fortunately is not as demanding as video.
 
Apparently mine is now :

On the Way
Arrived at Facility
Koeln, Germany

This’ll be one of those scenarios where it travels around the world in a few hours, and then gets stuck at the last stop for a week. 🤣
 

Do you mean Köln? I haven't seen it written with an 'e' before!


Random ask here: I'm considering buying the M4 Max, though I'm unsure of the custom specs yet. Anyone moving on from an M2 Max and planning to sell it or trade it?
 
A bunch of us are contemplating it. Still looking for solid benchmarks. 16/40 is a no-brainer; 128GB RAM is pricey. But the 16/40 128GB M4 Max looks like a machine that could last anyone a good long time.
 
My issue is compounded... I don't "need" a new M4 Studio. I currently have three rigs set up: a Win10 Pro media server for my home; a 'mega rig', which is a custom-built Ryzen 5950X, 64GB RAM, 4090 Founders; and an aging 2017 fully specced iMac.

Truly, I could get by with a Mac mini M4 Pro with a good drive and be fine. I just like toys and always wanted a Studio. I do get a partner discount from Apple, and I would have to custom-build the M4 to at least 1TB of NVMe, though the 'Apple tax' kills me: I pay $300 for a drive like the Samsung 990 Pro 4TB, yet Apple has the audacity to charge MORE for a worse/slower drive! If the M4 Studio started with 1TB at the $1,999 base price, I probably would have already ordered it.
 
I copy & pasted from the website. If it’s spelled wrong, that’s on UPS. :)
The umlaut (the two dots) above a vowel indicates adding an "e" sound after it. But since umlauted vowels are not standard ASCII keyboard characters, it is understood and accepted to type "oe" in its place whenever typing ö is not possible. So Köln and Koeln are both correct ways to type it.

PS: "Koln", meanwhile, is, well, wrong. But you can consider it an alternative English spelling, if you don't go as far as "Cologne".
 
My issue is compounded... I don't "need" a new M4 Studio. I currently have three rigs set up: a Win10 Pro media server for my home; a 'mega rig', which is a custom-built Ryzen 5950X, 64GB RAM, 4090 Founders; and an aging 2017 fully specced iMac.

Truly, I could get by with a Mac mini M4 Pro with a good drive and be fine. I just like toys and always wanted a Studio. I do get a partner discount from Apple, and I would have to custom-build the M4 to at least 1TB of NVMe, though the 'Apple tax' kills me: I pay $300 for a drive like the Samsung 990 Pro 4TB, yet Apple has the audacity to charge MORE for a worse/slower drive! If the M4 Studio started with 1TB at the $1,999 base price, I probably would have already ordered it.

Three weeks ago I ordered an M4 Pro Mac mini with 1TB and 64GB for 3,500 CAD… Thankfully I cancelled my order before it shipped, after discussing it on this forum and realizing the soon-to-come Mac Studio would be a better value. And I'm glad I did: I paid 4,050 for the M4 Max (16/40 cores, 1TB, 64GB) and it is a better value, especially considering it has a much better chip, 10Gb Ethernet, and many more ports (even though I don't really need those for now).

Not an expert, but I think if you get a base Mac mini and maybe upgrade the storage yourself, it's a good value; but as soon as you choose the M4 Pro and start speccing it up, it starts getting very close to the M4 Max Mac Studio.
 
The umlaut (the two dots) above a vowel indicates adding an "e" sound after it. But since umlauted vowels are not standard ASCII keyboard characters, it is understood and accepted to type "oe" in its place whenever typing ö is not possible. So Köln and Koeln are both correct ways to type it.

PS: "Koln", meanwhile, is, well, wrong. But you can consider it an alternative English spelling, if you don't go as far as "Cologne".
However you spell it, my Studio is now in the UK and the delivery estimate has changed to today, where it previously said tomorrow. I’ll be hoping for an early finish from work today. :)
 
Mine is out for delivery too!

Didn't even know you could get a delivery this quick; China to anywhere in less than 48 hours still sounds wild to me.
 