LARGE Language Models are, by definition, LARGE!
Yes, training them is a massive task, but they remain massive even after training is done. GPT-3 takes 800GB to store!
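(A quick back-of-envelope check, just to show where a number like that comes from: 175 billion parameters times a few bytes each. The exact on-disk figure depends on the precision used and on what else gets checkpointed, so treat this as a rough sketch, not the official accounting.)

```python
# Rough storage math for GPT-3: parameter count times bytes per parameter.
# The real checkpoint size can be larger (e.g. if optimizer state is kept).
params = 175e9  # GPT-3's published parameter count

for name, bytes_per_param in [("fp32", 4), ("fp16", 2)]:
    print(f"{name}: {params * bytes_per_param / 1e9:.0f} GB")
# fp32: 700 GB
# fp16: 350 GB
```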
Now, once a model works, you can try to shrink it in various ways, like pruning and quantization. This was done for vision models a few years ago, and that's how we got image recognition on our phones. But consider that the "mainline" model Apple uses for imagery (a single model with different "heads" that does everything from the camera pipeline when you take a photo, to recognizing people in photos, to powering image search) is about 30MB in size. That's a HUGE difference...
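If you've never seen quantization in action, here's a minimal sketch using PyTorch's dynamic quantization on a made-up toy model. The `TinyNet` below is purely illustrative (nothing Apple or OpenAI actually ships): the weights of its Linear layers get stored as int8 instead of fp32, which is roughly a 4x size cut on those layers.

```python
# A minimal sketch of post-training dynamic quantization with PyTorch.
# TinyNet is a hypothetical stand-in model, just a stack of Linear layers.
import os

import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, 10),
        )

    def forward(self, x):
        return self.layers(x)

def size_on_disk_mb(model, path="model_tmp.pt"):
    # Serialize the weights and measure the resulting file size.
    torch.save(model.state_dict(), path)
    size = os.path.getsize(path)
    os.remove(path)
    return size / 1e6

fp32_model = TinyNet()
# Dynamic quantization: Linear weights stored as int8,
# activations quantized on the fly at inference time.
int8_model = torch.quantization.quantize_dynamic(
    fp32_model, {nn.Linear}, dtype=torch.qint8
)

print(f"fp32: {size_on_disk_mb(fp32_model):.1f} MB")
print(f"int8: {size_on_disk_mb(int8_model):.1f} MB")
```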
How much would Apple be willing to sacrifice for a really good language model? Maybe, I don't know, 10GB of storage and 1GB of RAM? GPT-3 is nowhere close to that, and while it could perhaps be shrunk that far, right now server-side execution is the only option.
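Again, as a hand-wavy sketch (the 10GB budget is my own guess, not anything Apple has announced): here's how many parameters fit into that budget at different precisions, versus GPT-3's 175 billion.

```python
# Rough check of a hypothetical 10 GB on-device budget: how many parameters
# fit at each precision, and how much GPT-3 would need at that precision.
budget_gb = 10        # assumed on-device budget, my own guess
gpt3_params = 175e9   # GPT-3's published parameter count

for name, bytes_per_param in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    max_params = budget_gb * 1e9 / bytes_per_param
    gpt3_gb = gpt3_params * bytes_per_param / 1e9
    print(f"{name}: ~{max_params / 1e9:.0f}B parameters fit; GPT-3 needs {gpt3_gb:.0f} GB")
# fp16: ~5B parameters fit; GPT-3 needs 350 GB
# int8: ~10B parameters fit; GPT-3 needs 175 GB
# int4: ~20B parameters fit; GPT-3 needs 88 GB
```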