
VitoBotta (original poster):
A few days ago, I switched to the MLX models supported by LM Studio and noticed a nice speed improvement over GGUF models. Now, with the addition of speculative decoding in the LM Studio beta, things have gotten even better: I'm seeing a 20 to 50% speed gain!

If you're not familiar, speculative decoding is a technique that speeds up inference for a large model by pairing it with a smaller, faster "draft" model. The draft model cheaply proposes several tokens ahead, and the main model then verifies them, accepting only the tokens it would have generated itself. Because the main model checks every drafted token, the output quality is the same as running the main model alone. Pretty cool, right?
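To make the draft/verify idea concrete, here is a minimal greedy-verification sketch in plain Python. The two model functions are toy stand-ins I made up for illustration, not LM Studio's or MLX's API; real implementations also verify all k drafted tokens in a single batched forward pass of the main model, which is where the speedup comes from. This sketch only shows the accept/reject logic.

```python
# Toy next-token functions: hypothetical stand-ins for a fast draft
# model and a slow main model. Here they happen to agree, so every
# drafted token gets accepted.
def draft_next(tokens):
    return (tokens[-1] + 1) % 50

def main_next(tokens):
    return (tokens[-1] + 1) % 50

def speculative_decode(prompt, n_tokens, k=4):
    tokens = list(prompt)
    target = len(prompt) + n_tokens
    while len(tokens) < target:
        # 1. The draft model cheaply proposes k tokens ahead.
        proposal, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2. The main model verifies the proposals, accepting the
        #    longest prefix it agrees with; on the first mismatch it
        #    substitutes its own token. Output therefore matches what
        #    the main model would have produced alone.
        for t in proposal:
            if main_next(tokens) == t:
                tokens.append(t)                   # accepted
            else:
                tokens.append(main_next(tokens))   # rejected: take main's token
                break
    return tokens[len(prompt):target]

print(speculative_decode([0], 8))  # -> [1, 2, 3, 4, 5, 6, 7, 8]
```

The more tokens the draft model gets right, the fewer sequential main-model steps are needed, which is why the speedup varies (20 to 50% in my case) with how well the draft model predicts the main one.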
 
Thanks for the heads-up. Nice to see continued optimization for Apple Silicon. Hopefully Apple puts dedicated resources into optimizing open-source LLM inference on Macs like it does for Blender.
 
One thing I'm really looking forward to is whether we'll eventually be able to run LLMs on the Neural Engine instead of the GPU. I wonder if that would lead to better performance.
 
Yeah, hey mods, this has nothing to do with Apple Intelligence. Apple Intelligence is Apple's own ML feature set; this is about running local LLMs such as DeepSeek on the Mac's GPU. Move it back to Apple Silicon Macs.
 
I think there should actually be a subforum for running AI/LLMs *on* the Mac.

I've asked for subforums in the distant past and they've been created, so it's probably worth a shot.
 