
Populus

macrumors 604
Original poster
…without breaking the bank.

With the release of Gemma 4 and everyone going crazy with the 31B and 26B models, my hype for local LLM experimentation has rocketed again (after a frustrating experience with LM Studio on a 16GB base M5 weeks ago).

Seeing that local models keep getting better and better with smaller footprints, my plan is to get an M5 Mac to run them now and, eventually, run better models in the future, and use them to perform tasks locally with documents on my Mac.

That’s why I need a bit of help from you to choose the best Apple Silicon M5 SoC for local LLMs without breaking the bank.

The device doesn’t matter here, that’s why I chose the Apple Silicon subforum. It could be an M5 MacBook Pro, or the upcoming M5 Mac mini, so please don’t worry about that.

Option 1: M5 with 32GB of RAM. The safest one (for my pocket, at least).

Option 2: M5 Pro with 24GB of RAM. The "in-between" option (I think it's the worst: it's more expensive yet has less RAM than the first option).

Option 3: M5 Pro with 48GB of RAM. The top end, at least for my budget; I'm afraid it will be too expensive and fall outside it.

I’m not opening a poll in the thread because I want reasoned answers. Otherwise most people would just click on the most expensive M5 Pro 48GB because, well, that’s the best SoC of all three, right?

I really want to know how worthwhile it is to pay the extra money to jump from one tier to the next on our dear Apple price ladder…

By the way, I know the memory bandwidth of the M5 Pro is double the bandwidth of the M5, and that alone might be worth it, but I’m afraid most capable models such as the released Gemma 4 31B or 26B won’t fit into 24GB of RAM, so that would push me towards the extremely expensive M5 Pro 48GB RAM option… that may very well fall outside my budget (or create a big hole in it).
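For a rough sense of whether those models fit, weight memory scales with parameter count times bytes per parameter. A back-of-the-envelope sketch (the bytes-per-parameter figures are my approximations for common GGUF quantizations, not official numbers):

```python
# Rough resident size of LLM weights alone (excludes KV cache and OS overhead).
# Bytes-per-parameter values are approximate; GGUF quants add per-block overhead.
QUANT_BYTES = {"Q4_K": 0.56, "Q8_0": 1.06, "BF16": 2.0}

def weight_gb(params_billion: float, quant: str) -> float:
    """Approximate weight memory in GiB for a model of the given size."""
    return params_billion * 1e9 * QUANT_BYTES[quant] / 1024**3

for params in (26, 31):
    sizes = ", ".join(f"{q}: {weight_gb(params, q):.1f} GiB" for q in QUANT_BYTES)
    print(f"{params}B -> {sizes}")
```

On those assumptions, a 31B model at 4-bit is roughly 16 GiB, which leaves little slack on a 24GB machine once the OS and context are counted, while 32GB is more comfortable.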

Thank you.
 
I would go with either the base m5 w/ 32GB or the 48GB pro. For any practical programming the ram is going to limit you more than the speed. I've been playing with gemma4 on an m1 max w/ 64GB and I hit memory limits constantly. The `E4B` and `E2B` models run well enough so long as I keep the context window reasonably small.
 
It’s not for programming but for general purpose, and information retrieval from my documents and being able to explain it to me.

I guess it’s not going to be worth it spending a huge amount of money on the M5 Pro if I’m not sure it’s gonna work for me…
 
You are looking at models of around 30 billion parameters, so I would assume you need the most unified memory you can get. I am not that knowledgeable about LLMs, but running them locally requires at least as much RAM as the size of the model file you are loading. The question is how much headroom your computer has left while you are engaging with the program: the OS is just one example of the other things using your RAM. Perhaps look into previous M-series products, or else consider smaller models.
 
... your desire to run them locally requires at least the size of the file you are putting into the ram.
So much more. The M1 line doesn't support the smaller quantization formats, but I hit this the other day trying to load a 2.5MB project into Gemma 4 (E2B, iirc). Even just 100KB was hitting ~80GB of allocations.
[Attachment: Screenshot 2026-04-11 at 19.32.58.png]
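Numbers like that are plausible once you account for the KV cache, which grows linearly with context length. A sketch with made-up but typical transformer dimensions (Gemma's real layer/head counts differ, and techniques like sliding-window attention shrink this considerably):

```python
# Per-token KV cache cost: keys + values, for every layer, at fp16.
def kv_cache_gib(ctx_tokens: int, n_layers: int = 30, n_kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem  # K and V
    return ctx_tokens * per_token / 1024**3

# A 2.5MB text file is very roughly ~650k tokens (at ~4 bytes/token):
print(f"{kv_cache_gib(650_000):.0f} GiB")  # tens of GiB for the cache alone
```

Which is why a small file stuffed into the context window can end up dwarfing the model weights themselves.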
 
I… just don't understand what you're saying. I've been told that 26B models, for instance, can fit easily in a Mac with 32GB of unified memory. And you're saying a 2.5MB file takes 80GB of unified memory?
 

Is this dragging along previous 'conversations' with Gemma into this? Historical memory. There isn't a huge incentive for these vendors to finely tune the memory management of these free-to-use versions versus the paid ones on features like contextual memory. If it's a memory hog and you don't have a large enough memory footprint to handle the load, that just traffics more folks off to the 'cloud' to 'rent' bigger memory capacities.

Gemma 4 E2B system requirements:

"... BF16
gemma 4 (E2B) 9.6 GB
..."

A 2x or 3x difference for a specific hardware platform would be kind of extreme. At practically 10x, it's more likely that something else besides the model and the project is being loaded.

The other issue is how the system is set up. Is it doing a single query for a single user at a time, or trying to feed multiple concurrent queries? Multiple queries get multiple caches, which blows out memory much faster. Some of this boils down to whether you want the query answer, or want "fast" query answers. Faster-than-human response times are not aligned with saving money on equipment costs.

Again, some software is tweaked to feed that craving for a magically fast response, which is addictive for users. If you're looking for the maximum 'crack high', it tends to cost more.

Finally, yet another issue is what other 'tools' in the package are being invoked (image tokenizer, etc.). A 2.5MB project suggests text, but what is being processed matters to the memory footprint.
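The concurrent-query point is easy to quantify: each parallel session gets its own cache, so total memory is the weights plus the per-session cache times the session count. A minimal sketch (the 16 GiB of weights and 1 GiB per session are illustrative figures, not measurements):

```python
def serving_gib(weights_gib: float, kv_per_session_gib: float,
                n_sessions: int) -> float:
    """Total memory for one loaded model serving n concurrent sessions."""
    return weights_gib + n_sessions * kv_per_session_gib

# Weights are paid once; each extra session adds its own KV cache.
for n in (1, 4, 8):
    print(f"{n} session(s): {serving_gib(16.0, 1.0, n):.0f} GiB")
```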
 
I was trying to load in the 2.5MB as extra context for prompts to work with. Very specific use case, but it shows that ram usage can/will go above what is needed just to load in the model 🙂
 
So according to you, getting an M5 with 32GB of RAM is not very useful for running local LLMs and using them to work with documents on my Mac? Anyway, I like to have extra RAM available, so maybe it's not a bad idea to get it.
 
128GB. I'm on the M5 Max 128GB and I'd never go lower. I wish they had 256. You could save by going with the Pro chip, but having been using it a lot lately, I'm constantly hammering my 128GB.
 
It's already dog slow on Mx Max chips; it will feel worse on a Pro chip. You might want to consider getting a used M2/M3 Max MBP with 64GB+ memory instead, if that's possible for you (it should be around the same price as the M5 Pros you are looking at, maybe even cheaper).
 
Dang… even with Google's new TurboQuant KV compression? 128GB of RAM sounds like way overkill for my other uses, and LLMs are not going to be a priority. I just want to leave the door open in case I want to use future tools to help me with my local files (organizing them, summarizing them, making mind maps with them, etc.).

Honestly I was already set on getting an M5 chip because of the longer lifespan of the Mac. I don’t really want to get an older model.

Some MR pals like @CalMin already use their 32GB machines for actual local LLM, so I no longer know what’s going on here.

What I’m guessing is that the problem is the context, right? And long conversations. Those fill up the remaining available RAM.

I really wonder whether, with the fast SSDs we have now, like the Gen 4 (M5) or Gen 5 (M5 Pro) NVMes, it wouldn't be possible to use them for certain needs, such as storing some of the KV cache.

Anyway, I feel like more optimizations are possible, and maybe we’ll have them in the future, making 32GB of RAM enough for many local tasks (such as working with text documents and PDFs and organizing information). That’s where I think Apple’s approach to LLMs should go towards.
 
I was playing around with the Gemma-4-E4B-it-Q8_0.gguf version using llama.cpp. On my M2 Air with 16GB RAM I got 20 t/s; on my M3 Max with 36GB I got 80 t/s. In my tests, the quality of this LLM is better than much larger models from only a year ago, such as DeepSeek-R1-Distill-Qwen-32B.i1-Q4_K_S, which was my best option so far, with only 10 t/s on the M3.
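Those two results roughly track memory bandwidth: during decoding, each generated token has to stream essentially all the active weights from memory, so tokens/s is bounded by bandwidth divided by active model size. A crude estimator (the bandwidth figures are approximate published specs, and MoE-style models like E4B touch fewer bytes per token, so real numbers can land higher):

```python
def est_tps(bandwidth_gbs: float, active_weights_gb: float) -> float:
    """Rough bandwidth-bound estimate of decode tokens/s."""
    return bandwidth_gbs / active_weights_gb

# ~8 GB of Q8 weights on an M2 Air (~100 GB/s) vs an M3 Max (~400 GB/s):
print(f"M2 Air: ~{est_tps(100, 8):.0f} t/s")
print(f"M3 Max: ~{est_tps(400, 8):.0f} t/s")
```

The ~4x bandwidth gap between the two machines lines up with the 4x gap in observed t/s (20 vs. 80).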

If you are really serious about local LLMs, you are surely aware that only the upcoming M5 Ultra, maxed out, will be good enough; but for just playing around with the technology, I think the M5 Pro with 48GB RAM should be very good. Only you know what your needs really are.
 
I run Ministral-3 with Ollama + Chatbox or Gemma3 12B in LM Studio on my M3 Pro with 18GB RAM. They run just fine, but if you're doing much else at the same time, that's getting bumped to swap, so for my usage, 32GB would be good, 48GB would be great.
 
I was going to recommend that you get a good deal on an older Studio with lots of RAM from the refurb section of the Apple store, but for the first time that I can remember, ALL of the refurb Studios are sold out. This is nuts.
 
Lots of good feedback, OP. Obviously, telling you to get a MBP Max with 128GB is a non-starter budget-wise. 🙂 Given your constraints, I'd prioritize (in general) max memory, max bandwidth, and max GPU cores, in that order, when building your configuration. Since you mention wanting an M5 (and you don't care about form factor), you may want to wait for the updated Mac Mini, since you'll be able to get more memory for your money than by buying a MacBook Pro, assuming you don't need a laptop. Given your limited use case, I'd say a Mac Mini with an M5 Pro (the 15-core CPU/16-core GPU variant, to save $200 vs. the 18/20) and 32GB, or ideally at least 48GB, of memory. With a 15-core M5 Pro you'll have double the memory bandwidth of the base M5, 50% more CPU cores, and 60% more GPU cores, which should make a balanced platform for your uses.
 
Yeah, I’m definitely waiting for the Mac mini, but I didn’t want to put the focus on the machine but rather the chip.

I know the M5 Pro is better, but sadly, the only RAM options it offers are 24GB and 48GB, and the jump between the two configurations is $400, turning a $1,600 machine into a $2,000 one. Honestly, between an M5 with 32GB of RAM and an M5 Pro with 24GB, I think the first one, with more available RAM, will be more useful for me.


Also, as a reminder (again): this isn't a machine for agentic use, vibe coding, image generation, or anything like that. My local LLM usage is going to be experimental, aimed at organizing and handling/summarizing documents on my Mac. However, I don't want it to hallucinate; I hate it when they make stuff up.
 
Apple has a return policy. Buy the cheapest option first, test it out, and see how it goes. If it doesn't work out, return it and upgrade. As you're finding out, everyone's use cases and experiences are so different that you're probably never going to get the exact answer you're looking for. The best answer is to just test this for yourself and make use of the hassle-free return policy.
 
Some final info for OP to ponder or at least have in one place.

14” MBP Base M5 32GB 1TB - $2,099
14” MBP M5 Pro (15/16) 24GB 1TB - $2,199
14” MBP M5 Pro (15/16) 48GB 1TB - $2,599

Mac Mini Base M4 32GB 1TB - $1,399
Mac Mini M4 Pro (12/16) 24GB 1TB - $1,599
Mac Mini M4 Pro (12/16) 48GB 1TB - $1,999

This assumes Mini pricing more or less holds (or at least that the deltas between configurations remain the same even if base prices rise). NOTE: the M5 Pro also has a 2x faster SSD, which we forgot to mention, and includes the N1 chip (WiFi 7, BT 6, and TB 5 ports, vs. the Base with WiFi 6E, BT 5.3, and TB 4 ports). This is for the MBPs, but Apple may continue to differentiate the Mini M5 Base vs. Mini M5 Pro along these lines as well.

For current Mini, the $600 delta between Base+32GB vs. Pro+48GB does feel steep. You do get more cores, faster bandwidth, 50% more RAM, 2X SSD speeds (along with the faster connectivity/TB 5 ports), but that may not be enough to convince you (or your budget). Since the M5 Mac Mini won’t be out for perhaps a few months, maybe a side hustle to save up the $600? 😉 Given your use case, you may be able to get by with the Base M5+32GB and save some money. Give the Base M5 a test run during the return period to make sure you’re happy with the performance as @1BadManVan suggests. If it doesn't meet your needs, return it and bite the bullet on M5 Pro+48GB. The only downside I can see with the Base M5 (for you) is that even with 32GB you basically have zero headroom for anything that comes down the line the next 12-18 months (although I agree, if your budget maxes out at M5 Pro+24GB and you can't get the 48GB option, probably best to go with Base M5+32GB for the memory).

I ran a few benchmarks on mine and family members’ machines. These are real world results I got in home, not from reviews or test environments (although the results line up with what’s online). You get a substantial bump in graphical performance with the M5 Pro, but based on the GB AI benchmark, the results aren't as impressive (still, 29.8% faster GPU quantized). Yeah, these are synthetic results, so ultimately make a decision on how it runs your workload in real life. Good luck!

GB6 (Single/Multi): Base M5 (4196/17872) / M5 Pro 15-core (4252/25998)
GB6 (OpenCL): Base M5 (48827) / M5 Pro 15-core (76103)
GB6 (Metal): Base M5 (76609) / M5 Pro 15-core (121205)

GBAI (CPU-Quantized): Base M5 (6929) / M5 Pro 15-core (7006)
GBAI (GPU-Quantized): Base M5 (24464) / M5 Pro 15-core (31756)

Black Magic (Write/Read): Base M5 (6551MB/s / 6262MB/s) / M5 Pro 15-core (12167MB/s / 12493MB/s)
 
I ran the same test MaxTech ran on a 14” M5 Pro with 24GB: Ollama with qwen3.5 (35b parameters) and a 32k context length, via “ollama run qwen3.5:35b whats the fastest laptop in the world --verbose”. In MaxTech's video, the 14” took 5m34s with an eval rate of 7.67 tokens/s. My 16” M5 Pro with 64GB, in clamshell, took 57 seconds with an eval rate of 41.45 tokens/s. It would seem more memory makes a big difference (though MaxTech also used the 15-core CPU/16-core GPU, so that affected the results too). I wonder what an M5 Max with 128GB gets on the same test. As @thebart says, you probably need to identify the specific model or models of interest and then talk to folks running those models. Based on Geekbench AI and one Ollama test, I'd say an M5 Pro with 64GB is faster than a binned M5 Pro with 24GB... but that's pretty much stating the obvious. 🙂
 