
Ollama, the popular app for running AI models locally on a computer, has released an update that takes advantage of Apple's own machine learning framework, MLX. The result is a hefty speed boost on Macs with Apple silicon.


According to Ollama, the new version processes prompts around 1.6 times faster (prefill speed) and nearly doubles the speed at which it generates responses (decode speed). Macs with M5-series chips are said to see the largest improvements, thanks to Apple's new GPU Neural Accelerators.
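For anyone curious how those two figures translate in practice: Ollama reports per-request timing stats alongside each response, so you can measure prefill and decode throughput on your own machine. A minimal sketch using the ollama Python package (the model name is just an example; substitute whatever you have pulled locally):

```python
# Sketch: compute prefill and decode tokens/sec from the timing
# stats Ollama attaches to a non-streamed response.
import ollama

resp = ollama.generate(
    model="qwen3:8b",  # illustrative; use any locally pulled model
    prompt="Summarize the plot of Moby-Dick in three sentences.",
)

# Durations are reported in nanoseconds.
prefill_tps = resp["prompt_eval_count"] / (resp["prompt_eval_duration"] / 1e9)
decode_tps = resp["eval_count"] / (resp["eval_duration"] / 1e9)

print(f"prefill: {prefill_tps:.1f} tokens/s")
print(f"decode:  {decode_tps:.1f} tokens/s")
```

Running the same prompt before and after the update is a quick way to see the MLX gains for yourself.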

The update also includes smarter memory management, which should make AI-powered coding tools and chat assistants feel noticeably more responsive during extended use.

Ollama says the new performance boost should especially benefit macOS users who run personal assistants like OpenClaw or coding agents like Claude Code, OpenCode, or Codex.

The preview release is available to download as Ollama 0.19 – just make sure you have a Mac with more than 32GB of unified memory to run it. Support is currently limited to Alibaba's Qwen3.5, but Ollama says support for more AI models is planned.

Article Link: Ollama Now Runs Faster on Macs Thanks to Apple's MLX Framework
 
I'm a complete newbie here but would like to try local AI for documentation research and analysis, sorting out files, and translations.
Would Ollama be adequate? And is more than 32GB mandatory?
 
I think this could be a major business for Apple - it's way cheaper for a small business to buy a powerful Mac and run Qwen3.5 than pay for an enterprise license for a frontier model - and you don't need to worry about privacy issues.
I recently evaluated some of these models for local use, and at least in my context they just weren't nearly good enough to straight up replace what you can buy access to. It's simply not the same as ChatGPT or Claude, just running for free.
 
I recently evaluated some of these models for local use, and at least in my context they just weren't nearly good enough to straight up replace what you can buy access to. It's simply not the same as ChatGPT or Claude, just running for free.
I wonder about that. I don't need deep reasoning, just processing of meeting minutes and strategic recommendations. Sonnet 4.6 is probably overkill, but I am not sure. Just wondering how far off I would be with a local model that fits into a 64GB Mac Studio.
 
I recently evaluated some of these models for local use, and at least in my context they just weren't nearly good enough to straight up replace what you can buy access to. It's simply not the same as ChatGPT or Claude, just running for free.
I tend to agree, but the rapid advancements do seem to be trickling into the LLMs much faster now. I currently run a spam filter that uses an older model and is doing very well. I am going to do a bit more research on newer models for it. But it is already working at 95+% with no intervention and very little manual reclassification of emails.
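For anyone wanting to try something similar, the core of such a filter can be a very short script. A hedged sketch using the ollama Python package; the model name and prompt wording are illustrative, not the poster's actual setup:

```python
# Toy local-model spam classifier: ask a small model for a one-word
# verdict and fail safe (HAM) on anything unexpected, so a flaky
# answer never buries a legitimate email.
import ollama

def classify_email(subject: str, body: str) -> str:
    prompt = (
        "Classify the following email as exactly one word, SPAM or HAM.\n\n"
        f"Subject: {subject}\n\n{body[:2000]}"
    )
    resp = ollama.generate(model="llama3.2:3b", prompt=prompt)  # model name is an assumption
    verdict = resp["response"].strip().upper()
    return "SPAM" if verdict.startswith("SPAM") else "HAM"

print(classify_email("You won!", "Claim your prize now..."))
```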
 
I recently evaluated some of these models for local use, and at least in my context they just weren't nearly good enough to straight up replace what you can buy access to. It's simply not the same as ChatGPT or Claude, just running for free.
I luckily have a manual labor job, so AI won't replace me. As you are saying, the cloud stuff (ChatGPT etc.) is just going to be much better while the whole AI thing moves through its toddler phase. In another couple of years the bubble should pop, and things will settle down to two or three decent vendors.
Apple will end up doing well, seeing as every year they ship new chips with a 10-20% gain year over year, compared to Nvidia and AMD, whose update cycles run every two to three years. Nvidia is also probably pushing the market past acceptable cost. Long term, the cost to run all this hardware is going to get insane, especially now that we are getting townships outright voting against data centers because of the cost increases to local grids.
Apple is one of the few companies that builds its own power infrastructure when it adds data centers / server farms.
 
I wonder about that. I don't need deep reasoning, just processing of meeting minutes and strategic recommendations. Sonnet 4.6 is probably overkill, but I am not sure. Just wondering how far off I would be with a local model that fits into a 64GB Mac Studio.
Time to talk to Claude or Grok about it. I am doing a lot on a 64GB M4 Max MacBook Pro.
 
I'm a complete newbie here but would like to try local AI for documentation research and analysis, sorting out files, and translations.
Would Ollama be adequate? And is more than 32GB mandatory?
I've started playing with LM Studio (free) on my 24GB RAM M4 Pro MBP. LM Studio has a wide range of models available for download, in MLX format as well, from tiny to gargantuan.

Currently messing with the default Gemma download. It's neat that you can ask it about its training cutoff to get an idea of how current it is on events.

For example, I like to ask who the current US president is. The Gemma model tells me Biden because its training data ends in 2023.
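If you want to script that kind of probe, LM Studio can expose an OpenAI-compatible local server (enable it in the app; it listens on localhost:1234 by default). A sketch, where the model id is illustrative:

```python
# Query a locally loaded LM Studio model through its
# OpenAI-compatible endpoint. LM Studio ignores the API key,
# but the client library requires one.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

resp = client.chat.completions.create(
    model="gemma-3-12b",  # illustrative; use the id LM Studio shows
    messages=[{"role": "user", "content": "Who is the current US president?"}],
)
print(resp.choices[0].message.content)
```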

Part of me wants to dig deeper, but I'd have to upgrade my laptop and RAM isn't cheap. I'd be looking at $4K for a model with maxed-out RAM for AI.
 
I've started playing with LM Studio (free) on my 24GB RAM M4 Pro MBP. LM Studio has a wide range of models available for download, in MLX format as well, from tiny to gargantuan.
Yeah, I tried the one that was 22GB, but it caused my MacBook to lock up. Not surprised, since it even said it was probably incompatible. There is a 12GB model that is pretty good, though.
 
I think this could be a major business for Apple - it's way cheaper for a small business to buy a powerful Mac and run Qwen3.5 than pay for an enterprise license for a frontier model - and you don't need to worry about privacy issues.
A powerful enough Mac costs thousands, and open source models like Qwen are, at least for the time being, nowhere near as potent as ChatGPT and Claude. Additionally, I am really skeptical about the "privacy" of these Chinese models, unless they can really run without internet connectivity.
 
rapid advancements do seem to be trickling into the LLMs much faster now.
Yes, but at the same time we're seeing that the hyperbole about AI's abilities is just that: hyperbole. Even the absolute top-of-the-line stuff can't do what people thought would be easy with current-generation tech, and we're seeing limitations as far as training material goes.

In some contexts everything looks absolutely over-the-top amazing with that hockey-stick curve of progress, but we're also seeing hard limits being hit way before we're even close to where things need to be for practical everyday use.

As an example, we have your positive experience of using it for spam filtering, which we need to put into context: I got way better numbers more than two decades ago with a framework I built around a very small core based on a Bayesian filter, using less computing power than your earbuds (or even just a USB-C plug) probably have today.
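For reference, the core of such a Bayesian filter fits in a few lines. A toy sketch with made-up training data, just to show the shape of the thing:

```python
# Minimal naive-Bayes-style spam scorer: per-word spam/ham counts
# combined into log-odds with add-one smoothing. A real filter adds
# tokenization, a proper corpus, and a framework around this core.
import math
from collections import Counter

spam_docs = ["win money now", "cheap pills win big"]
ham_docs = ["meeting minutes attached", "lunch at noon tomorrow"]

spam_counts = Counter(w for d in spam_docs for w in d.split())
ham_counts = Counter(w for d in ham_docs for w in d.split())

def spam_score(text: str) -> float:
    score = 0.0
    for w in text.lower().split():
        p_spam = (spam_counts[w] + 1) / (sum(spam_counts.values()) + 2)
        p_ham = (ham_counts[w] + 1) / (sum(ham_counts.values()) + 2)
        score += math.log(p_spam / p_ham)
    return score  # > 0 leans spam, < 0 leans ham

print(spam_score("win cheap money"))      # positive: spam-like
print(spam_score("minutes of meeting"))   # negative: ham-like
```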

It's similar to how things were, and are, with the crypto bros. If you want to see the amazing progress, opportunities, and revolution, it truly is there to be seen. But once you step out of the bubble it won't follow you into everyday life. You can escape it. You can live without it being an essential part of your life. You can get better results without it. Like with your spam filter example: it's more amazing within the bubble than as a generic, practical, everyday solution.

There's no strict black-and-white line separating the inside and outside of the bubble, but things need to mature to a very high degree before they become a strictly better solution outside the bubble as well. We can see that in how most people in the world would never benefit from taking the plunge and making bitcoin or AI spam filters their primary tools in their respective areas.

If you're into new tech, the progress is amazing, but in everyday life, where things just need to work, it stops being about theoretical and occasional peaks and starts being about how many times a day you can accept it failing you by not being good enough.

How long will your custom AI spam filter feel amazing when you're relying on email to always work, but that filter gets one out of every twenty emails wrong?
 
A powerful enough Mac costs thousands and open source models like Qwen are - at least at the time being - nowhere near as potent as ChatGPT and Claude. Additionally I am really skeptical about the "privacy" thing of these Chinese models, unless they can really run without internet connectivity.
The open source models are really good. They're about one generation behind the foundational models, but the foundational models were already really good last year and the improvements are more iterative than revolutionary at this point.

The main difference is cost. If you have enough volume to put the machines to work 24/7 (or close to it), you can drive your token costs down to a hundredth or even less of what the foundational models charge. Even if the model isn't quite as good, that's a cost trade-off plenty are willing to make.
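A back-of-envelope version of that claim, where every number is an assumption rather than a quote:

```python
# Rough cost comparison: a one-time hardware purchase amortized
# over tokens generated at high utilization, vs. per-token API
# pricing. All figures below are illustrative assumptions.
hardware_cost = 5000.0      # USD, one-time (assumed)
lifetime_years = 3
decode_tps = 300            # sustained tokens/s with batched requests (assumed)
utilization = 0.8           # fraction of the day the machine is working

tokens = decode_tps * utilization * 3600 * 24 * 365 * lifetime_years
local_cost_per_m = hardware_cost / (tokens / 1e6)

api_cost_per_m = 15.0       # USD per million output tokens (assumed)

print(f"local:  ${local_cost_per_m:.2f} per million tokens")
print(f"hosted: ${api_cost_per_m:.2f} per million tokens")
print(f"ratio:  ~{api_cost_per_m / local_cost_per_m:.0f}x")
```

Electricity and the gap in model quality are left out, and the ratio swings wildly with the assumptions, but it shows why high-volume users look at local hardware.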
 
As someone who downloads and experiments with everything possible…

There is a lot of delusion in this thread. Local language models below 100 billion parameters are quite useless, and even 100 billion parameters is considered the weak side. Fun to play with for a while, but boredom and frustration set in quickly.

So what happens is they want the next model… and then the next one… and then the next one… falsely believing their 16GB or 32GB machine will one day run the holy grail of a small yet powerful local language model.

But it doesn't happen. The models keep growing, and aside from sheer memory capacity, the most important thing that makes them usable is memory bandwidth.

The top five language models in the world are all over a trillion parameters, and what makes them useful is that they respond quickly, running on GPUs with over a terabyte per second of memory bandwidth.
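That bandwidth point can be made concrete with rough arithmetic: at decode time, the active weights are read roughly once per generated token, so throughput is approximately memory bandwidth divided by the active-weight footprint. Illustrative numbers:

```python
# Rough upper bound on decode speed for a memory-bandwidth-bound
# model: bytes moved per token ~= active parameters * bytes/param.
def decode_tps(bandwidth_gbs: float, active_params_b: float, bytes_per_param: float) -> float:
    return bandwidth_gbs * 1e9 / (active_params_b * 1e9 * bytes_per_param)

# 70B dense model at 4-bit (~0.5 bytes/param) on a ~400 GB/s Mac:
print(f"{decode_tps(400, 70, 0.5):.0f} tokens/s")   # ~11

# Same model on a datacenter GPU with ~2 TB/s of bandwidth:
print(f"{decode_tps(2000, 70, 0.5):.0f} tokens/s")  # ~57
```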
 
Just bought an M5 Pro 48GB machine. Might just have a go at this. Does anyone have any experience using it locally?

Claude tells me Qwen 2.5 would be good for Xcode completion but nowhere near as good for complex requests.
Just give it a go.

Install Ollama, learn how to load the models you can't simply select in the GUI, and see what happens.
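Concretely, the whole try-it loop is only a few lines with the ollama Python package (model names here are examples, not recommendations):

```python
# Pull a model by name, see what's installed, then chat with it.
import ollama

ollama.pull("qwen2.5-coder:7b")        # downloads on first use

for m in ollama.list()["models"]:
    print(m["model"])                   # installed models ("name" on older clients)

resp = ollama.chat(
    model="qwen2.5-coder:7b",
    messages=[{"role": "user", "content": "Write a Swift function that reverses a string."}],
)
print(resp["message"]["content"])
```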

Like I said earlier, I've just spent a lot of time evaluating these things, and what ChatGPT tells you might not always match up with reality when you try it yourself. But nothing will break your machine.

Worst case scenario is that it lags and it takes forever for Ollama to produce a result. Then you know, and you try a smaller model. And once you've dropped down to a size of model your machine can handle, it's smooth sailing as far as speed goes.

Then you need to evaluate if those models actually can consistently produce good enough results for what you need. And that's a very personal type of a thing that you just have to explore yourself.

In my case I needed JavaScript code for a specific environment, and they weren't good enough to produce full solutions from prompts asking for a broader kind of solution requiring several multi-step pieces to fit together. But being a programmer myself, they'd still be good enough to get me the individual parts of the solution faster. So I'll be happy working locally for some of the more sensitive parts, while I'll still use third-party/online models for more generic frameworks etc.
 
The open source models are really good. They're about one generation behind the foundational models

“One generation” means nothing. If OpenAI open sourced ChatGPT today, or if Anthropic open sourced Claude, you would need to spend at least $20,000 and possibly over $100,000 on hardware to get the same experience you currently see in a browser. Hardware really matters, and it is getting more and more expensive.

Mildly competitive local models are over 300 billion parameters, which means you need at least a $10K Mac Studio. That has the VRAM but it is still dog slow compared to cloud performance.
 