That was my first thought. Next year can be very different from next spring.

I kinda hope it’s delayed again, because I think that would be funnier than new Siri would be useful. And I could use a good laugh in this bleak world.
Oh, I hope not. Although, given the way Apple has screwed this up, I wouldn't be surprised if it were December 31, 2026 at 11:59 PM.
 
Has anyone in the world been able to do a streaming voice model that small?
In terms of OpenAI's models, Whisper:Large (2B[illion] params) is 6GB, Whisper:Medium (0.8B) is 3GB, Whisper:Small (0.2B) is 967MB, and Whisper:Tiny (37.8M) is 151MB.

One can guess Apple could easily use a model around the 1-3GB mark, swapping it in and out as required.

I think most of this is done as a chain anyway: voice recognition gets it 80% right and some LLM fixes up the missing and misheard words (80% of the time, anyway). So a small or tiny model may well be more than enough, along with a small (or heavily quantised) LLM to back it up.
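A minimal sketch of that kind of chain, assuming OpenAI's open-source whisper package for the speech-to-text pass; correct_transcript() is a hypothetical stand-in for whatever small on-device LLM would do the cleanup, not a real API:

```python
# ASR -> LLM cleanup chain (sketch). Requires: pip install openai-whisper
import whisper

def transcribe(audio_path: str) -> str:
    """First pass: a tiny Whisper model gets the transcript roughly right."""
    model = whisper.load_model("tiny")      # small footprint, fine for a sketch
    result = model.transcribe(audio_path)
    return result["text"]

def correct_transcript(rough_text: str) -> str:
    """Second pass: hypothetical placeholder for a small (or heavily
    quantised) on-device LLM that fixes misheard or missing words."""
    prompt = f"Fix likely transcription errors, keep the meaning unchanged:\n{rough_text}"
    # ... hand `prompt` to whatever local LLM runtime is available ...
    return rough_text  # placeholder so the sketch still runs

if __name__ == "__main__":
    rough = transcribe("request.wav")
    print(correct_transcript(rough))
```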

Still, to revamp Siri, Apple needs to either (1) boost the iPhone memory to run more, larger models,
or (2) release a home hub of sufficient power (or allow people to link their personal NVIDIA Sparks, or similar),
or (3) leverage OpenAI, Anthropic, Gemini, Grok, Perplexity, Meta, whomever, and leak all that tasty training data.

I personally dislike the thought of screen-context awareness, because we know they will pick option 3, and with Siri constantly waking by mistake, that means sending your screen to the commercial provider.

However, I also have a low tolerance for commercial AI in general now, as it shifts to advertising and sentiment manipulation. It makes too many duck ups, and there is the whole privacy issue to boot. Of course, I recognise some people find it incredibly useful; probably the bulk do. As long as I can cut off Siri completely come that day, all's good. Options are always welcome, both for those who trust AI and for those who don't wish to use it.
 
Oh man, what if the home central-command device Apple is making comes with its own private AI server, personalized with your data and all processed locally to keep up their privacy promises? They could repurpose their "Back to My Mac" functionality to VPN your devices' AI requests to your home server.
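A rough sketch of what the client side of that could look like, purely as an illustration: it assumes the home hub exposes an OpenAI-compatible chat endpoint the way local runtimes such as Ollama already do, reachable over a personal VPN. The hostname, port, and model name are made up, not anything Apple has announced:

```python
# Hypothetical: route a device's "AI request" to a home server over a VPN.
import json
import urllib.request

HOME_SERVER = "http://home-hub.local:11434/v1/chat/completions"  # reachable via VPN

def ask_home_server(question: str) -> str:
    payload = {
        "model": "local-assistant",   # whatever model the hub runs locally
        "messages": [{"role": "user", "content": question}],
    }
    req = urllib.request.Request(
        HOME_SERVER,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

print(ask_home_server("Summarise my unread messages"))
```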
 

Whisper is just a speech-to-text model, not an integrated streaming multimodal LLM like GPT-4o. Integration with the LLM is critical for latency and for providing robust conversation and context understanding.

Qwen3-Omni could be along the lines of what is needed. When asked, the model itself claims to need about 6GB of memory:
"This model is 1.8 billion parameters and requires about 6 GB of GPU memory to run."
If true, that would be perfect for the iPhone 17 Pro/Air, which have 12GB of RAM.
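As a rough sanity check on that claim, a back-of-the-envelope estimate of the weights alone (ignoring KV cache and activations, so the real footprint would be higher):

```python
# Back-of-the-envelope memory estimate for a 1.8B-parameter model (weights only).
params = 1.8e9

for label, bytes_per_param in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1), ("int4", 0.5)]:
    gb = params * bytes_per_param / 1024**3
    print(f"{label:>9}: ~{gb:.1f} GB of weights")

# fp16 works out to roughly 3.4 GB of weights, so ~6 GB total with runtime
# overhead is plausible, leaving headroom on a 12 GB phone.
```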

Check attachment for voice quality.

I would not be surprised to learn that Apple will be using a Qwen model for Siri in the Chinese market, and that this particular model was trained as a demo for Apple.
 

Attachments

  • qwen3-omni-memory-requirements.m4a.zip (113.8 KB)