Something to discuss: the local LLM scene is growing, and it's pushing Mac buyers to upgrade RAM, which Apple makes good margins on. Macs use unified memory, which gives them an inherent capacity advantage over discrete GPUs for LLM inference. However, Apple Silicon GPUs don't seem to have the compute to compete with RTX GPUs, or even AMD ones, in prompt processing and evaluation speed, as long as the model fits into VRAM. That's because RTX GPUs have Tensor Cores, which accelerate 8-bit/4-bit inference, and Blackwell takes 4-bit acceleration to the next level.
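To see why capacity matters, here's a rough back-of-envelope sketch of whether a quantized model fits in a given memory pool. The ~20% overhead factor (KV cache, runtime buffers) and the specific machine configs are illustrative assumptions, not benchmarks:

```python
# Rough check: does an N-billion-parameter model at a given quantization
# fit in a memory pool? Numbers are illustrative assumptions.

def model_size_gb(params_billions: float, bits_per_weight: float,
                  overhead: float = 1.2) -> float:
    """Approximate footprint in GB; `overhead` covers KV cache and buffers."""
    return params_billions * (bits_per_weight / 8) * overhead

pools = {
    "24 GB discrete VRAM": 24,     # e.g. an RTX 4090
    "192 GB unified memory": 192,  # e.g. a maxed-out Mac Studio
}

for name, mem_gb in pools.items():
    for params in (8, 70, 180):
        size = model_size_gb(params, bits_per_weight=4)  # 4-bit quantization
        verdict = "fits" if size <= mem_gb else "does NOT fit"
        print(f"{name}: {params}B @ 4-bit ~ {size:.0f} GB -> {verdict}")
```

Under these assumptions, a 70B model at 4-bit already overflows a 24 GB card but fits comfortably in a large unified-memory Mac, which is exactly the trade-off: capacity on one side, raw Tensor Core throughput on the other.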
I hope Apple continues to invest in LLM support for their GPUs and adds dedicated 8-bit/4-bit acceleration.