In terms of OpenAI's models: Whisper large (~1.55B params) is ~6GB, Whisper medium (~0.77B) is ~3GB, Whisper small (~0.24B) is 967MB, and Whisper tiny (~39M) is 151MB.
One can guess Apple could easily use a model around the 1-3GB mark, swapping it in and out as required.
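For what it's worth, those file sizes line up with plain fp32 checkpoints at roughly 4 bytes per parameter. A back-of-envelope check (Python; the parameter counts are approximate):

    # ~4 bytes per param for fp32; halve for fp16, quarter for int8.
    def approx_size_gb(params, bytes_per_param=4):
        return params * bytes_per_param / 1e9

    print(approx_size_gb(1.55e9))     # large: ~6.2 GB fp32
    print(approx_size_gb(1.55e9, 2))  # large: ~3.1 GB fp16 -- inside that 1-3GB window
    print(approx_size_gb(39e6))       # tiny:  ~0.16 GB fp32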
I think most of this is done as a chain anyway: voice recognition gets it 80% of the way there, and some LLM fixes up the missing and misheard words (80% of the time, anyway). So a small or tiny model may well be more than enough, along with a small (or heavily quantised) LLM to back it up. A sketch of that chain follows.
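You can wire that chain up yourself with the open-source whisper package. The second stage here is a placeholder: fix_transcript() and the prompt are hypothetical stand-ins for whatever small local LLM you'd run.

    import whisper

    # Stage 1: cheap speech-to-text with the tiny model (~39M params).
    model = whisper.load_model("tiny")
    rough_text = model.transcribe("memo.wav")["text"]

    # Stage 2: hand the rough transcript to a small/quantised LLM to
    # repair mishearings. fix_transcript() is a hypothetical helper
    # wrapping whatever local model you run.
    prompt = ("Fix likely mishearings and missing words in this "
              "transcript, changing as little as possible:\n" + rough_text)
    clean_text = fix_transcript(prompt)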
Still, to revamp Siri, Apple either need to (1) boost the iPhone's memory to run more, larger models,
or (2) release a home hub of sufficient power (or allow people to link their personal NVIDIA Sparks, or similar),
or (3) leverage OpenAI, Anthropic, Gemini, Grok, Perplexity, Meta, whomever, and leak all that tasty training data.
I personally dislike the thought of screen-context awareness, as we know they will pick option 3, and with Siri always mistakenly waking, that's your screen sent off to the commercial provider.
However, I also have a low tolerance for commercial AI in general now, as it shifts towards advertising and sentiment manipulation. It makes too many duck-ups, and there is the whole privacy issue to boot. Of course, I recognise some people find it incredibly useful; probably the bulk. As long as I can cut Siri off completely come that day, all's good. Options are always welcome, both for those who trust AI and for those who don't wish to use it.