I luckily have a manual labor job, so AI won’t replace me.

The only way “AI replaces” everyone is if every business says “yeah”. That’s not the real world.

It would also mean everything becomes ****. These AI models are speedy at times, but everything they do is a big step down from an experienced professional worker. They even do things that a beginner junior worker would get fired for.
 
Just give it a go.

Install Ollama, learn how to load the models you can't simply select in the GUI, and see what happens.
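To make it concrete, here's roughly what talking to a local model looks like once Ollama is running. This is just a sketch against Ollama's local REST API; the model tag is only an example, not a recommendation, and you'd pull it first with ollama pull in Terminal.

# Minimal sketch: query a locally running Ollama server over its REST API on port 11434.
# Assumes you already ran `ollama pull qwen2.5-coder:7b` (example tag, swap in anything
# from the Ollama library, including tags that don't show up in a GUI picker).
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "qwen2.5-coder:7b"   # example tag, not a recommendation

def ask(prompt: str) -> str:
    body = json.dumps({"model": MODEL, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(OLLAMA_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(ask("Write a one-line JavaScript arrow function that squares a number."))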

Like I said earlier, I've just spent a lot of time evaluating these things, and what ChatGPT tells you might not always match reality when you try it yourself. But nothing will break your machine.

Worst case scenario is that it lags and takes forever for Ollama to produce a result. Then you know, and you try a smaller model. And once you've dropped down to the size of models your machine can handle, it's smooth sailing as far as speed goes.

Then you need to evaluate if those models actually can consistently produce good enough results for what you need. And that's a very personal type of a thing that you just have to explore yourself.

In my case I needed JavaScript code for a specific environment, and they weren't good enough to produce full solutions from broader prompts that required several multi-step pieces to fit together. But being a programmer myself, they'd still be good enough to get me individual parts of the solution more quickly. So I'll be happy working locally for some of the more sensitive parts, while I'll still use third-party/online models for more generic frameworks etc.
LM Studio is way more user friendly and has supported MLX models for over a year. Ollama is a waste of time
 
Then you need to evaluate if those models actually can consistently produce good enough results for what you need. And that's a very personal type of a thing that you just have to explore yourself.

That's the key - what works for some won't for others.

The only way “AI replaces” everyone is if every business says “yeah”. That’s not the real world.

It would also mean everything becomes ****. These AI models are speedy at times, but everything they do is a big step down from an experienced professional worker. They even do things that a beginner junior worker would get fired for.

Exactly. I am working on a local model that does very specific analysis based on a very defined data set and logic model. It is basically an entry level analyst that an experienced professional can use for some initial insight.
LM Studio is way more user friendly

How so?
 
The only way “AI replaces” everyone is if every business says “yeah”. That’s not the real world.

It would also mean everything becomes ****. These AI models are speedy at times, but everything they do is a big step down from an experienced professional worker. They even do things that a beginner junior worker would get fired for.
I am a glorified bag boy: people come in, order their livestock feed, and I pick and load it. A robot might replace me someday, but not any day soon.
 
More than 32 Gig? Does that mean 32 Gig is not enough?
It's worded badly. 32 and up would be accurate. The model is 24GB.
Somebody needs to embrace "or equal to" or "at least".
EDIT: turns out that's the exact phrasing at Ollama… so the problem isn't at @MacRumors, though it would have been nice if they'd reached out to Ollama and clarified. Ollama seems to be implying that 32GB is NOT enough, however, contra @Tim Jobs the 2nd.

From Ollama on X, replying to "More than? Why if I have a 32gb Mac mini exactly?":
"it'll work (just have to watch out for other apps taking away available memory)"

So sounds as though 32GB may be more of a "YMMV" situ.
 
LM Studio is way more user friendly and has supported MLX models for over a year. Ollama is a waste of time
That depends on the user, there are no absolutes here.

LM Studio is more user-friendly for beginners, though, while Ollama is probably more comfortable for more technical users (as in actual techies, not just people prone to slipping into "I'll let you know that I've owned a computer since before…" rants whenever they can't find a setting on their phone).
 
128GB M5 Max MBP is on the way, and I'll also buy the M5 Ultra when available. Local is huge for my research combining linked data and LLM-informed knowledge graphs. Local LLM/LRM/multi-agent setups are going to be huge, and Apple is silently winning the AI war via hardware. These are exciting times.
 
Just bought an M5 Pro 48GB machine. Might just have a go at this. Does anyone have any experience of using it locally?

Claude tells me Qwen 2.5 would be good for Xcode completion but nowhere near as good for complex requests.
I just bought an M5 Pro/48GB specifically for local LLM work. It was available at my Apple Store and I didn't want to wait a month for a custom 64GB build - although it's just $200 more!

I tested against my old M3 Pro machine: token generation speed is double (thanks to double the memory bandwidth compared to the "crippled" M3 generation), and time to first token is ~20% of the M3's. This is where the neural accelerators in the GPU cores shine.
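For anyone who wants to reproduce the comparison, here's a rough sketch of how I'd measure time to first token and throughput against a local Ollama server. Wall-clock only, and the model tag is just a placeholder, so treat the numbers as ballpark rather than a proper benchmark.

# Rough sketch: time to first token and tokens/sec from Ollama's streaming API.
# Model tag is a placeholder; numbers are wall-clock ballpark, not a rigorous benchmark.
import json
import time
import urllib.request

body = json.dumps({"model": "qwen2.5:14b", "prompt": "Explain KV caching in two sentences.",
                   "stream": True}).encode()
req = urllib.request.Request("http://localhost:11434/api/generate", data=body,
                             headers={"Content-Type": "application/json"})
start = time.time()
first = None
tokens = 0
with urllib.request.urlopen(req) as resp:
    for line in resp:                      # one JSON chunk per generated token, roughly
        chunk = json.loads(line)
        if chunk.get("response"):
            if first is None:
                first = time.time() - start
            tokens += 1
gen_time = max(time.time() - start - (first or 0.0), 1e-6)
print(f"time to first token ~{(first or 0.0):.2f}s, throughput ~{tokens / gen_time:.1f} tok/s")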

M5 is a true game changer for local AI work "on the cheap" with standard hardware (compared to other options).
 
So much for all those folks who ignore the fact that RAM needs increase every year and claim that "16 GB RAM is plenty..."

Yes even 8 GB is enough if one's intent is never to do anything but run Office apps forever.

Personally I always configure with the max RAM available, and I have never had a Mac end its (5-7 year) life cycle still with available RAM overhead. Others choose short life cycles, and that is OK too. Just do not choose to buy low RAM and then complain about the resultant short life cycle.
 
A powerful enough Mac costs thousands, and open source models like Qwen are, at least for the time being, nowhere near as potent as ChatGPT and Claude. Additionally, I am really skeptical about the "privacy" claims of these Chinese models, unless they can truly run without internet connectivity.
I see where you’re coming from, but many use cases don’t really need a big 30+ billion parameter model, plus of course over time the cost of compute power to run the more powerful models will come right down.

Also gotta say that you definitely cannot trust ANY of these cats with your privacy, no matter the nation they originate from. But local models don’t need to connect to the network, and you can monitor the network traffic to double check.
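As a concrete example of that double checking (my own habit, nothing official): on macOS you can list which network sockets the Ollama process has open while a prompt is running. For a genuinely local setup you'd expect to see only the 127.0.0.1:11434 listener and your own loopback connection, no outbound internet sockets.

# Rough sketch (macOS): show open network sockets for running "ollama" processes.
# The process name is an assumption; it may differ depending on how you installed it.
import subprocess

pids = subprocess.run(["pgrep", "-x", "ollama"], capture_output=True, text=True).stdout.split()
if not pids:
    print("no ollama process found")
for pid in pids:
    out = subprocess.run(["lsof", "-a", "-i", "-p", pid, "-n", "-P"],
                         capture_output=True, text=True).stdout
    print(f"--- PID {pid} ---")
    print(out or "(no network sockets open)")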
 
As someone who downloads and experiments with everything possible…

There is a lot of delusion in this thread. Local language models below 100 billion parameters are quite useless. Even 100 billion parameters is considered on the weak side. Fun to play with for a while, but boredom and frustration set in quickly.

So what happens is they want the next model…and then the next one…and then the next one…falsely believing their 16GB or 32GB machine will one day run the holy grail of a small yet powerful local language model.

But it doesn't happen. The models keep growing, and aside from being memory hungry, the most important thing that makes them usable is memory bandwidth.

The top 5 language models in the world are all over a trillion parameters, and what makes them useful and responsive is that they run on GPUs with over a terabyte per second of memory bandwidth.
Use case matters. Qwen 3.5 27B is absolutely usable for content summarization to help you filter content and find more signal in the noise. There are also lots of recent posts about its ability to follow instructions for agentic workflows.

I use Phi 4 (4-bit) to clean up the grammar on my MacWhisper dictations so that everything stays local on my machine.
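In case it's useful to anyone, here's a stripped-down sketch of that cleanup step. The model tag and prompt are just what I'd reach for, not a prescription; adjust to taste.

# Sketch: pass a dictated transcript through a local model for grammar cleanup,
# keeping everything on-machine via Ollama's local API. Model tag is an example only.
import json
import urllib.request

def clean_up(transcript: str, model: str = "phi4") -> str:
    prompt = ("Fix the grammar and punctuation in the following dictation. "
              "Do not change the meaning or add content.\n\n" + transcript)
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request("http://localhost:11434/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"].strip()

print(clean_up("so uh the meeting got moved to thursday i think around three"))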
 
I recently evaluated some of these models for local use, and at least in my context they just weren't nearly good enough to straight up replace what you can buy access to. It's simply not "ChatGPT or Claude, but run for free."
I'm sure some people have genuine use cases, but for me, running them locally definitely doesn't compare to the big cloud models… it's more about playing with them and learning. At least that's been my experience on an M4 Pro, although I have read that the M5s are a significant upgrade in this department.
 
May I ask why more than 32GB of RAM is required, and whether a 32GB of RAM Mac will make the cut or not?

Honestly, before the memory crisis and the war effects impact Apple, I'm trying to buy what I need now and what I think I'll need for the future… and after testing a few LLMs on a 16GB M5 machine, the experience is not good at all (also I'm a noob, maybe having things better configured and running the right models would solve this).

So, there’s a big difference between spending money on a 16GB M5, upgrading to a 32GB M5, or going all in and getting a 48GB M5 Pro, which is overkill for me on many tasks but if 32GB of RAM is not going to make the cut for local LLMs then I don’t know what to do.
 
Just bought an M5 Pro 48GB machine. Might just have a go at this. Does anyone have any experience of using it locally?

Claude tells me Qwen 2.5 would be good for Xcode completion but nowhere near as good for complex requests.
I am VERY interested in your experience, as I've been weighing whether to get an M5 Pro machine with this configuration… but honestly it would be overkill for me outside of local LLMs, so I'm not sure it's the right choice for me (as someone who values their money).

Please let us know how it goes, and whether local LLMs still hallucinate a lot even with that much RAM.
 
I am curious. When configuring a new Mac, what are the most important hardware parameters for running LLMs locally? RAM? Memory bandwidth (Max chips 50% better than Pro chips)? The latest chip? Internal SSD capacity?

Clearly base level chips with less available RAM and 1/3 the memory bandwidth of the Max chips should be avoided.
 
I am curious. When configuring a new Mac, what are the most important hardware parameters for running LLMs locally. RAM? Memory bandwidth (Max chips 50% better than Pro chips)? The latest chip? Internal SSD capacity?

Clearly base level chips with less available RAM and 1/3 the memory bandwidth of the Max chips should be avoided.
I think that RAM and memory bandwidth are roughly equally important initially. Once you get to 64GB+, bandwidth matters more, depending on the model. For larger models you might want 256GB (or 512GB, if they add it back in) and the higher bandwidth too.

But memory and bandwidth go neck and neck for large models. SSD, not so much. Number of CPU cores, not as much either. M5 GPUs matter more than M3 GPUs thanks to the improvements. 🙂
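One way to see why bandwidth ends up dominating: during generation, each new token has to stream roughly all of the model's active weights through memory, so memory bandwidth divided by the active weight size gives a crude ceiling on tokens per second. Back-of-the-envelope only; the bandwidth figures below are illustrative, and MoE models, quantization overhead, and the KV cache all move the real number around.

# Crude ceiling: tokens/sec ≈ memory bandwidth / bytes of active weights read per token.
# Dense models touch every weight each token; MoE models only touch the active experts,
# which is why A3B-style models feel so much faster on the same hardware.
def rough_tps(bandwidth_gb_s: float, active_params_b: float, bits_per_weight: int = 4) -> float:
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

print(rough_tps(273, 27))   # ~20 tok/s: 27B dense, 4-bit, on a ~273 GB/s Pro-class chip
print(rough_tps(546, 27))   # ~40 tok/s: same model on a ~546 GB/s Max-class chip
print(rough_tps(273, 3))    # ~180 tok/s ceiling: only 3B active params (MoE) on 273 GB/s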
 
May I ask why more than 32GB of RAM is required, and whether a 32GB of RAM Mac will make the cut or not?

Honestly, before the memory crisis and the war effects impact Apple, I'm trying to buy what I need now and what I think I'll need for the future… and after testing a few LLMs on a 16GB M5 machine, the experience is not good at all (also I'm a noob, maybe having things better configured and running the right models would solve this).

So, there’s a big difference between spending money on a 16GB M5, upgrading to a 32GB M5, or going all in and getting a 48GB M5 Pro, which is overkill for me on many tasks but if 32GB of RAM is not going to make the cut for local LLMs then I don’t know what to do.
Ollama's announcement was using Alibaba’s Qwen3.5-35B-A3B model quantized (compressed) from 16-bits to 4-bits. That's a 20GB model, so 32GB of RAM will allow you to run a model with decent intelligence (depending on the task), and still have enough memory available for MacOS and other open apps.
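If you want to sanity-check other models the same way before buying, the arithmetic is just parameters times bits per weight, plus some overhead for embeddings and quantization scales. Rough estimate only; real file sizes vary by quant format.

# Rough size of a quantized model: parameters × bits per weight / 8, plus ~10% overhead.
def approx_size_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.1) -> float:
    return params_billion * bits_per_weight / 8 * overhead

print(approx_size_gb(35, 4))    # ~19 GB, roughly the 20GB 4-bit build discussed here
print(approx_size_gb(35, 16))   # ~77 GB, the same model unquantized at 16-bit
print(approx_size_gb(7, 4))     # ~4 GB, a 7B model at 4-bit fits comfortably in 16GB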

Before buying a new machine, my advice is to test your use cases on Claude Haiku, ChatGPT GPT5.2 Instant, or Gemini 3 Fast. If your use cases don't work well on those models, it's going to be very hard to get them to work well on a local model with only 32 GB RAM.

If privacy isn't an issue, you're probably better off saving the money to spend on cloud tokens from either of the big three.
 
More than 32 GB of RAM? I thought we'd have a few more years before 32 was not enough. 😳 My Studio only has 36, so is that barely enough?
 
"One generation" means nothing. If OpenAI open sourced ChatGPT today, or if Anthropic open sourced Claude, you would need to spend at least $20,000 and possibly over $100,000 to get the same experience you currently see in a browser. Hardware really matters, and it is getting more and more expensive.

Mildly competitive local models are over 300 billion parameters, which means you need at least a $10K Mac Studio. That has the VRAM but it is still dog slow compared to cloud performance.

I agree that Mac hardware isn't a good choice.

You can get an array of 8 NVIDIA A100s for ~$20k second hand. Probably $25-30k all in for the server, and it will run all the latest open source models no problem. $25-30k is not nothing, but I know of clients that use more than that in tokens in a year. I think it is reasonable for them to have decided it's well worth it to bring some of that in-house.
 
Just tested the new Ollama MLX runner from the CLI in Terminal (0.19.0 preview) on my Mac mini M2 Pro 32GB, running qwen3.5:35b-a3b-coding-nvfp4.

It works, and it's noticeably faster than the standard non-MLX models! Standard Qwen3.5 35B is not usable on my Mac, but this one is. It's incredible!

What works well:
- The model loads and runs without issues on 32GB
- With /set nothink it's blazing fast
- Token generation speed is much higher than equivalent non-MLX models
- RAM pressure stays in the green/yellow zone during normal use

Limitations:
- 32GB is the hard ceiling: the model itself takes ~20GB, leaving ~12GB for the KV cache (rough math in the sketch below). Short sessions are fine; long context pushes into swap
- Can't comfortably use it as a backend for agentic frameworks (like OpenClaw) where the context grows large — hits the 32GB wall quickly

Bottom line: great for interactive chat sessions via ollama run, but at this size it's fun for a session rather than genuinely useful. For production agentic use or long context, you'd want 48GB+. Looking forward to seeing this on an M5 Max.
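If anyone wants to estimate where that wall sits before buying, the usual rough formula for KV cache size is 2 (K and V) × layers × KV heads × head dim × context length × bytes per element. The architecture numbers below are placeholders, not this model's real config (I haven't checked its exact layer and head counts), so treat it as an order-of-magnitude sketch.

# Order-of-magnitude KV cache estimate. Layer/head/dim values are placeholders;
# swap in the actual numbers from the model card of whatever you're running.
def kv_cache_gb(layers: int = 48, kv_heads: int = 8, head_dim: int = 128,
                ctx_len: int = 32_768, bytes_per_elem: int = 2) -> float:
    return 2 * layers * kv_heads * head_dim * ctx_len * bytes_per_elem / 1e9

print(kv_cache_gb(ctx_len=8_192))     # ~1.6 GB, short chat sessions are cheap
print(kv_cache_gb(ctx_len=32_768))    # ~6.4 GB, workable next to a ~20GB model but getting tight
print(kv_cache_gb(ctx_len=131_072))   # ~26 GB, agentic-size context blows past the headroom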
 
Ollama's announcement was using Alibaba’s Qwen3.5-35B-A3B model quantized (compressed) from 16-bits to 4-bits. That's a 20GB model, so 32GB of RAM will allow you to run a model with decent intelligence (depending on the task), and still have enough memory available for MacOS and other open apps.

Before buying a new machine, my advice is to test your use cases on Claude Haiku, ChatGPT GPT5.2 Instant, or Gemini 3 Fast. If your use cases don't work well on those models, it's going to be very hard to get them to work well on a local model with only 32 GB RAM.

If privacy isn't an issue, you're probably better off saving the money to spend on cloud tokens from either of the big three.
Thanks for the explanation.

Before all of this LLM madness I would have bought an M5 with 24GB of RAM, which I know is plenty for me (I'd even have bought a 12GB Neo if Apple had used an A19 SoC).

But now I'm trying not to buy a machine that's obsolete for local LLMs (either third party or a future model from Apple) before I even get it. So I guess an M5 with 32GB of RAM is the sweet spot.
 