This is all new to me, as I’m just experimenting.
I can run gemma2:27b, which is 16GB, on my MBP with 24GB. It can sometimes take a little time to come up with a response, but it’s generally quite usable.
However, llava:34b, which is 20GB, is totally unusable.
My take from this is that a 32GB machine *should* run the 20GB model about as well as my 24GB machine runs the 16GB one.
I’m of the mind that it’s mostly about the RAM. It’s possible (probable?) that the Mac can’t give all of its RAM to the model - my understanding is that macOS only lets the GPU use a fraction of the unified memory, something like two-thirds on lower-RAM machines - hence the 20GB model is too much despite my having 24GB of RAM. Theoretically, the 20GB model in a 32GB Max should be okay. I don’t believe the bandwidth is the deciding factor - the RAM is. Likewise, a Mac with less RAM will probably do okay with the smaller models (there are plenty to play with). I’m definitely no expert here, though.
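If anyone wants to check the actual number on their own machine, here’s a little Swift sketch that asks Metal for the GPU’s memory budget (recommendedMaxWorkingSetSize). Treat the two-thirds figure as my understanding rather than gospel - this is just a sanity check, not anything official:

```swift
import Foundation
import Metal

// Rough sketch: ask Metal how much unified memory the GPU is actually
// allowed to use. My understanding (not gospel) is that on Apple Silicon
// this budget is only a fraction of total RAM, which would explain why a
// ~16GB model fits on a 24GB machine but a ~20GB one struggles.
if let device = MTLCreateSystemDefaultDevice() {
    let budgetGB = Double(device.recommendedMaxWorkingSetSize) / 1_073_741_824
    print(String(format: "GPU memory budget: %.1f GB", budgetGB))
} else {
    print("No Metal device found")
}
```

Save it as, say, vram.swift and run it with swift vram.swift. If the rough two-thirds figure holds, a 24GB machine works out to around 16GB of GPU budget - which would line up with what I’m seeing - and a 32GB machine should land a bit above 20GB.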
That all said, if I were to look seriously (or even semi-seriously) at LLMs or AI image generation, I’d also be looking at other things in addition to the RAM.
When the LLM is thinking (or when AI image generation is working), the M4 Pro chip is really put to work. Temperature goes up, the fans kick in, and my MacBook gets quite warm. It can also hammer the battery if you’re not plugged in (I was playing about with it before work this morning, and the battery went from 100% to the 20% warning in about two hours).
So, if I were to look at a machine to do this regularly, I’d probably look at something like the Studio with plenty of RAM - but that’s going to be really expensive, so I’d need to *want* to do that. Maybe I’ll save up to get one in 2027.
For now, I’m content that my MBP can do what it can do (I love experimenting and pushing its limits) - but I won’t be doing much of it on battery power.