
Admiral

Original poster
I splurged and got the M4 Max Mac Studio with 128GB RAM. It arrived today, and after hours of waiting to migrate away from the M2 Mac Studio, it's finally up and running. While dicking around with LM Studio and Llama 3.3 70B, I idly set Handbrake to work on a software AV1 encode of Avengers: Endgame today, expecting to see a 70% performance improvement over the same encode on the M2 Max. Instead, I am seeing about double the framerate on the encode.

100% in two years is not too shabby.

This machine is astonishing. I'm really hoping an M4 Ultra or equivalent is not released this year, because I am planning to buy a car.
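
For anyone wanting to reproduce a comparison like this, here's a minimal sketch that times the same software AV1 (SVT-AV1) encode from Python. It assumes HandBrakeCLI is installed and on the PATH; the file paths and quality value are placeholders, not my exact settings.

```python
# Minimal sketch: time the same software AV1 (SVT-AV1) encode on two machines.
# Assumes HandBrakeCLI is installed (e.g. via Homebrew) and on the PATH;
# the paths and quality value below are placeholders, not my exact settings.
import subprocess
import time

SOURCE = "endgame.mkv"        # placeholder input
OUTPUT = "endgame-av1.mkv"    # placeholder output

start = time.monotonic()
subprocess.run(
    [
        "HandBrakeCLI",
        "-i", SOURCE,
        "-o", OUTPUT,
        "--encoder", "svt_av1",  # software AV1 encoder, no hardware assist
        "--quality", "30",       # constant-quality target
    ],
    check=True,
)
print(f"Encode took {time.monotonic() - start:.1f}s")
```

Run the identical command on both machines and compare wall time (or the average fps Handbrake reports) for the same kind of before/after number.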
 
Did your M2 Mac Studio have 128GB as well? Because, really, with 128GB everything will knock your socks off :)
 
I’ve been using Qwen3 235B A22B on my Mac Studio.

Having 128GB just blows my mind.

To be fair, even 24GB on my MBP does most of what I want (I did some Handbrake when I first got it and it made my old Surface Pro 6 look like it fell out of the Ark), but loading up a 95GB LLM on a domestic computer is just crazy.
 

Pro Tip: When buying a high-end Mac, it pays to look for the small savings. Apple discounts returned products 7-10% in the Refurb Store, and gives active-duty military and veterans a 10% discount which can be used in the Refurb Store. Pick it up at an Apple Store in a neighboring state with no sales tax and you avoid another 8-9%. It really adds up: I got the Mac Studio for about 25% less out of pocket this way.

The thousand dollars saved bought my daughter a nice MacBook Air.
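
For the curious, the stacking works out roughly like this; a quick back-of-the-envelope sketch where the exact refurb discount and tax rate are assumed figures within the ranges above:

```python
# Back-of-the-envelope: stack the refurb discount, the military discount,
# and the avoided sales tax. The exact percentages are illustrative assumptions.
list_price = 1.00
after_refurb = list_price * (1 - 0.08)       # refurb discount, ~7-10%
after_military = after_refurb * (1 - 0.10)   # 10% military/veteran discount
out_of_pocket = after_military               # no sales tax in the neighboring state

full_price_at_home = list_price * 1.0875     # ~8-9% sales tax if bought at home
savings = 1 - out_of_pocket / full_price_at_home
print(f"Out of pocket vs. buying at home: {savings:.0%} less")  # ~24%
```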
 
I splurged and got the M4 Max Mac Studio with 128GB RAM. [...] Instead, I am seeing about double the framerate on the encode.
Yeah, the first time I set my M4 Max running some Handbrake work I thought it had crapped out because it finished so fast. I was coming from an M1 Pro, so the step was huge :D
 
I splurged and got the M4 Max Mac Studio with 128GB RAM. [...] Instead, I am seeing about double the framerate on the encode.
Just FYI this is due to the hardware decoder introduced on the M3. Not sure about astonishing.
 
Just FYI this is due to the hardware decoder introduced on the M3. Not sure about astonishing.

I'm aware there is a hardware AV1 decoder on the M4 Max CPU die. I'm not sure how that decoder logic has any appreciable effect on a compute-driven software ENcode.
 
I’ve been using Qwen3 235B A22B on my Mac Studio.

What quantization are you using? I've been using IQ2_S on my MBP M4 Max 128GB and getting 10-15 tokens/sec. Wondering if I can get better t/sec out of this model.
 
What quantization are you using?
I have both 3bit MLX (96GB) and Q2_K_XL (82GB).

Tokens/sec varies depending upon how much context there is, and how far into the chat you are.

I did post some test results over at: https://forums.macrumors.com/thread...formance.2456559/?post=33899083#post-33899083

A simple prompt may output around 30t/s, but when asking it to analyse a scene from my story (around 5,000 words) with tasks, it dropped to around 23t/s. As the context window fills, though, it’ll drop lower. Once it gets too low, I’ll generally start a fresh chat window. This is true of any local LLM, though - especially ones with a context length of 128K or more. I find they become slow and unreliable as the context window goes over 50% full. If the context is necessary, I’ll tolerate down to 8t/s.

I’m not entirely convinced that there’s a lot of difference between /think and /nothink for my use case.

I also run local LLMs on my 24GB MacBook Pro (M4 Pro), such as Qwen_QwQ-32B-IQ2_S.gguf.

When the LLM is responding, the temperatures on my MBP ramp up fast and go over 100C (if it’s on my lap, it becomes uncomfortable). I wouldn’t want to do this constantly/frequently, and I’m glad I have the Studio for better thermals (even that will get pretty warm after sustained use). I don’t know how an MBP with M4Max handles the temperatures.

Generally speaking I’ll run smaller models. It’s good that we can run Qwen3 235B A22B at 3bit, and even 32B models on a 24GB MBP, but it doesn’t leave a lot of spare capacity for doing anything else at the same time - sometimes I’ll be running AI image gen, and that also takes some resources (running that with LLM slows both down). Gemma3 (and Gemma2) are generally pretty good models.
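
If anyone wants to put numbers on that slowdown, here's a rough sketch that measures tokens/sec against LM Studio's local OpenAI-compatible server. The default port (1234) and the model identifier are assumptions; adjust for whatever you have loaded, and note it lumps prompt processing into the timing, so it's only a rough figure.

```python
# Rough sketch: measure tokens/sec from LM Studio's local OpenAI-compatible
# server. Port 1234 is LM Studio's default; the model identifier is a
# placeholder. Note: the timing lumps prompt processing in with generation.
import time
import requests

URL = "http://localhost:1234/v1/chat/completions"
MODEL = "qwen3-235b-a22b"  # placeholder; use whatever identifier is loaded

payload = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "Summarise the plot of Hamlet."}],
}

start = time.monotonic()
resp = requests.post(URL, json=payload).json()
elapsed = time.monotonic() - start

usage = resp["usage"]
print(f"{usage['completion_tokens'] / elapsed:.1f} t/s "
      f"with {usage['prompt_tokens']} prompt tokens in context")
```

Re-running the same prompt as the chat grows shows the drop-off as the context window fills.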
 
@JSRinUK Looking through my chats, they are consistently at 10-15 tokens/sec with the 235B model. I don't usually go that deep in the same chat though, and I run it with max context. That's good to know about the tokens/sec and context length though.
Where did you get that 3bit MLX model? I have searched for it but can't find one that is 96GB.

I haven't tried no think since thinking has been a thing.

Yeah my MBP gets super hot and the fans run at max speed when I run LLMs or even when I am running Windows games in Crossover. It also pulls the max 140 watts, and I've even had my battery % go down while plugged in during a long LLM session. I'm not worried about it cause I got AppleCare lol.

@Admiral Have you tried Windows gaming yet on your Mac? It's extremely impressive.
 
Where did you get that 3bit MLX model? I have searched for it but can't find one that is 96GB.
I just got it within LM Studio. I looked for one I thought might push my RAM and tried it out.

It says it’s mlx-community/qwen3-235b-a22b. It also says “Memory consumption: 96.23GB”. Total downloaded size (20 parts) is a little over 102GB.

To be honest, I generally only use this model “because it’s there” (I want to push my Studio) but, in regular tasks, I’ll use something smaller - like a 27B Gemma 3 at Q8 - so that I can do other things (like AI image gen), and also because it’s faster. For my use case, there isn’t really a massive difference between the two and, sometimes, a Q8 smaller model gives me a better response than that Q3 Qwen3 model. But the difference is marginal. On my MBP (that only has 24GB), I’ll use a 12B Gemma 3 Q4 (128K max context), or Qwen QwQ 32B IQ2_S. I’ve also run a 27B Gemma Q4 on there, but it starts pushing the RAM.
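
As an aside, loading an mlx-community model outside LM Studio is only a few lines with the mlx-lm package. This is a minimal sketch; the exact 3-bit repo name below is an assumption (check mlx-community on Hugging Face for the real identifier), and at this size it needs roughly 100GB of free unified memory.

```python
# Minimal sketch using the mlx-lm package (pip install mlx-lm).
# The 3-bit repo name is an assumption -- check mlx-community on Hugging Face
# for the exact identifier. Needs roughly 100GB of free unified memory.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen3-235B-A22B-3bit")  # assumed repo name
reply = generate(
    model,
    tokenizer,
    prompt="Explain the trade-offs of 3-bit quantization in one paragraph.",
    max_tokens=256,
)
print(reply)
```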

I haven't tried no think since thinking has been a thing.
It often looks like the “thinking” is spurious to me. It’ll say it’s thinking about some things, it’ll get facts mixed up, but when it outputs its response, the facts are correct and the response doesn’t necessarily reflect the thinking.

Yeah my MBP gets super hot and the fans run at max speed when I run LLMs or even when I am running Windows games in Crossover. It also pulls the max 140 watts, and I've even had my battery % go down while plugged in during a long LLM session. I'm not worried about it cause I got AppleCare lol.
Battery life is definitely a thing. When I’m using my MacBook on battery, I use LLMs very sparingly. When not using an LLM the battery is massive, goes all day, and I barely notice it going down. When using an LLM, I’ll get about two hours at most. I have an Anker battery pack to charge the MacBook when away from the mains, and that will charge it fairly quickly, but it won’t quite make it to a full charge. So, using LLMs away from a power outlet, I’ve got about three and a half hours at most. (I also have AppleCare+. I typically get AppleCare+ for anything that’s a bit pricey.)

@Admiral Have you tried Windows gaming yet on your Mac? It's extremely impressive.
I haven’t. I have Parallels, but that’s because there are some Windows apps that I haven’t yet entirely shaken - I still have my Surface Pro 6, so I don’t really need Parallels, but having everything available on the one machine is useful to me.

I’m not a gamer as such, but I do have an interest in seeing what my Mac Studio is capable of doing, so it’s something I’ll maybe look into when I have the spare time.

I have a couple of very old Windows games kicking around somewhere, so I may try and resurrect those. I don’t think they even run on my Windows 10/11 devices so they won’t do for Parallels either.
 
You still technically owe that tax to the state you live in.

Nobody pays it though.

I live in a state with no sales taxes. When the M1 Studios came out, the three stores near me were cleaned out of M1 Ultra models. I thought that there was a shortage of them but then I looked at the stores in the neighboring state that has sales tax and they had plenty of them.

I don't think that states are going to send their state police to camp outside of Apple Stores and check license plates in the parking lots to try to catch people in the act. The neighboring state used to do that for liquor sales. They had helicopters follow people with my state's license plates from the liquor stores back into their state.
 
Nobody pays it though.
Of course nobody pays it. Businesses in this country minimize costs by not paying the taxes they owe, as well as by committing wage theft estimated to run far into the billions. In the face of all that, who is supposed to care when the average Joe drives across state lines to save a couple hundred bucks?

They had helicopters follow people from liquor stores with their license plates in my state to their state.
Good thing helicopters don't cost many hundreds of dollars to operate for every single hour they're in the air. Wait a minute!
 
Nobody pays it though.
Nope, which is why Massachusetts offers a tax-free day in August.

I don't think that states are going to send their state police to camp outside of Apple Stores
They don't, but what Mass staties do is camp out near the fireworks stores, take pictures of Mass cars, and wait for them to cross back over.
 
Nobody pays it though. [...]
I used to live in Delaware. No sales tax. People would come from the neighboring states -- Maryland, New Jersey, Pennsylvania -- just to buy stuff at the Apple Store (there's only one in the state) tax-free. I don't think anyone was declaring the merchandise when they got back to their home states. When new phones were released... watch out! People would buy multiple phones at a time, for themselves and for friends and neighbors (and also likely to flip on eBay, etc.).
 
They don't, but what Mass staties do is camp out near the fireworks stores, take pictures of Mass cars, and wait for them to cross back over.

That's a different crime, though. I saw a MapPorn chart this morning that said that MA residents spend an average of $0.01 per person on fireworks, the lowest in the country.
 
Nope, which is why Massachusetts offers a tax-free day in August.

One thing that some people do is get Best Buy in NH to price match Micro Center in Cambridge to avoid the sales tax.
 
That's a different crime, though
Oh I know, but it's silly to spend so much money on something as trivial as fireworks. It's funny that the state seems to take fireworks enforcement to an unhealthy level, yet most towns and local police forces turn a blind eye.
 
Oh I know, but it's silly to spend so much money on something as trivial as fireworks. [...]

Fireworks were easy to get in Chinatown Boston when I was a teen.

I don't know what the situation with them today is.

Costco used to sell fireworks but discontinued selling them in 2023. That tells you how mainstream fireworks are.
 