For something like GPT, you need an LLM (Large Language Model). Think of it as a compressed statistical snapshot of a huge chunk of the internet at a given point in time. Once the model is trained, copies of it can be deployed anywhere, and smaller models can even run locally on your device. When you ask it something, the model answers from what it learned during training, not by looking anything up live.
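To make "answers from what it learned during training" concrete, here's a toy sketch. This is not a real LLM — real models learn billions of parameters — but the core idea is the same: given the text so far, predict the likely next word from patterns seen in the training data.

```python
from collections import Counter, defaultdict

# Tiny stand-in "training data".
corpus = "the cat sat on the mat the cat ate the fish".split()

# "Training": count how often each word follows each other word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` during training."""
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "cat" — it followed "the" twice in the corpus
```

Notice the model is frozen after training: ask it about anything that wasn't in the corpus and it has nothing to say. That's exactly the staleness problem discussed below.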
So what's the point of the AI server?
1. Training those models takes a huge amount of GPU power running for weeks or months. This is what all these companies are paying the big bucks for.
2. Those snapshots get stale, so they need to train a new model every so often. A single training run can cost tens to hundreds of millions of dollars, so you don't want to do it too often.
3. As they improve their training code and data, they need to retrain parts or all of those models.
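A quick back-of-envelope shows where the money goes. The numbers here are assumptions, just plausible orders of magnitude for a large training run, not any company's actual figures:

```python
# Illustrative only — every number below is an assumption.
gpus = 25_000           # GPUs running in parallel
days = 90               # length of the training run
price_per_gpu_hour = 2  # dollars, a rough cloud-style rate

gpu_hours = gpus * days * 24
cost = gpu_hours * price_per_gpu_hour
print(f"{gpu_hours:,} GPU-hours ≈ ${cost:,}")  # 54,000,000 GPU-hours ≈ $108,000,000
```

And that's one run — redo it every time the snapshot gets stale or the training code improves, and the bill climbs fast.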
So does that mean if the snapshot is from Jan 1, I won't get any recent data?
Yes.
To fix that, many chatbots will do a live web search and feed the results into the model (often called retrieval-augmented generation). That's slower, and the results aren't always as good. This part obviously can't be handled purely on-device — it needs the network.
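The shape of that "search first, then answer" flow can be sketched in a few lines. `web_search` and `ask_model` are hypothetical stand-ins here — a real system would call an actual search API and an actual LLM:

```python
def web_search(query):
    # Placeholder: a real system would hit a search engine here.
    return ["Result A about the query.", "Result B about the query."]

def ask_model(prompt):
    # Placeholder: a real system would run the LLM on the prompt here.
    return f"Answer based on {prompt.count('Result')} fresh results."

def answer_with_fresh_data(question):
    snippets = web_search(question)  # the slow extra network hop
    context = "\n".join(snippets)
    # Paste the fresh results into the prompt so the stale model
    # can answer from them instead of from its frozen training data.
    prompt = f"Using only these sources:\n{context}\n\nQuestion: {question}"
    return ask_model(prompt)

print(answer_with_fresh_data("What happened today?"))
```

The model itself never gets updated in this flow — the fresh facts ride along in the prompt, which is why quality depends so much on what the search step returns.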
This is why Apple is trying to strike deals with Google (Gemini) and OpenAI. It's also possible that Apple is looking to handle the fresh data on their own, which would require more servers.