OpenAI has updated the voice feature in its ChatGPT app so that voice conversations now happen directly inside an ongoing chat instead of forcing users into a separate voice-only session.


The change means responses now appear in real time as text – along with any visuals like images and maps – as you speak, making it smoother to switch between voice and text while preserving chat history and context.

Previously, when you used "Advanced Voice Mode," voice chats opened in their own window, which would exit your current conversation and knock you out of your workflow. The update means voice and text interactions are now integrated in one seamless conversation thread.

Users who prefer the old separate voice mode, characterized by the floating orb, can easily revert to it via Settings ➝ Voice Mode ➝ Separate mode. The option is available on ChatGPT for web and in mobile apps updated to the latest version.


The update is part of a batch of recent improvements to ChatGPT, including group chats, the rollout of OpenAI's new GPT-5.1 model, and a shopping research feature for holiday gift finding.

Article Link: ChatGPT Voice Mode Now Works Inside Your Existing Conversation
 
Previously you could switch to voice mode, but it was not obvious to the user that this was a separate session, and, more critically, that it used a braindead version of the model.

ChatGPT voice was objectively terrible, and its competitors don't work this way. The fact that so many people relied on it (voice is pretty popular with ChatGPT, less so with others) is a little scary given how bad it was.

Good that they finally fixed it. A more powerful and refined version is almost certainly going to be a core piece of Jony & Sam's hardware product in one to two years.

After watching the launch video, their prosody technology still needs significant work, which is a bit disappointing. There is some incredible work being done in this space that is remarkably nuanced and emotive, and it's curious that OpenAI either hasn't adopted that technology or chose not to use it.


edit: The new Voice uses 4o regardless of what the chat is set to. Something else to keep in mind: you won't get the same quality as using text and/or forcing thinking.

edit 2: confirmed by the model itself

edit 3: Sol is a much better voice model than the default, though still clipped. It's ridiculous to tie the level of effort and quality to a particular voice. OpenAI really should try a competitor once in a while to see what's possible. Even this new capability is easily a year or two behind some other things I've used.

There are smart people working there, so presumably they're doing this because of the sheer scale and demands on their capacity. If that's not the case, the company has some deep issues, because voice will be a primary interaction mode soon and they are getting wiped by the competition.

full details: https://help.openai.com/en/articles/8400625-voice-mode-faq
 
Unless you are Elon, why does this additional feature annoy you? I think it's a good capability to have. On the contrary, not being able to switch from typing to voice mode was annoying.
Just like @DrJR writes and I totally agree: "It is a machine. ... It is a handy tool I never want to be my friend."

I find it very irritating that this very helpful tool gets a "personality". It just feels so fake and unreal, and it actually prevents me from using it. I want to get things done more easily, nothing more and nothing less, and without the feeling of being emotionally manipulated.
 
edit: I tested this and the new Voice uses 5.1 non-thinking regardless of what the chat is set to. It absolutely engages the router model (orchestration layer) and is heavily steered toward fast completion, likely to ease server load and/or provide responsiveness. Something else to keep in mind: you won't get the same quality as using text and forcing thinking.
It’s not 5.1 non-thinking. It does not use GPT-5’s router.

It’s just regular 4o. The model is slightly different, yes, but the underlying model is still 4o. No routing.
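
For what it’s worth, here’s a minimal sketch of that "voice pins its own model" behavior against OpenAI’s published audio API (my illustration, not ChatGPT’s app internals; the exact model and voice names are assumptions based on OpenAI’s public docs):

```python
# Minimal sketch (illustrative only, not ChatGPT's app code): the voice path
# names its own 4o-family audio model, independent of whichever text model
# the surrounding chat is set to.
import base64
from openai import OpenAI  # pip install openai; reads OPENAI_API_KEY from env

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",        # voice gets its own model slot...
    modalities=["text", "audio"],        # ...while the turn stays multimodal
    audio={"voice": "alloy", "format": "wav"},  # "alloy" assumed; app voices like Sol may differ
    messages=[{"role": "user", "content": "What is the weather in XXX city?"}],
)

msg = completion.choices[0].message
print(msg.audio.transcript)              # text transcript of the spoken reply
with open("reply.wav", "wb") as f:       # the same reply as decoded audio
    f.write(base64.b64decode(msg.audio.data))
```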

The main advantages are:
  1. You can send text/images to the model directly, and not only input via voice (i.e. its multimodality is now fully accessible)
  2. It’s slightly better at using some tools e.g. ask it “what is the weather in XXX city” or “show me the best bakeries in YYY” and it’ll return more fully featured widgets you can interact with since it’s no longer just the orb.

Even this new capability is easily a year or two behind some other things I've used.

4o advanced voice was launched only slightly more than a year ago…
 
It’s not 5.1 non-thinking. It does not use GPT-5’s router.

It’s just regular 4o. The model is slightly different, yes, but the underlying model is still 4o. No routing.

The main advantages are:
  1. You can send text/images to the model directly, and not only input via voice (i.e. its multimodality is now fully accessible)
  2. It’s slightly better at using some tools e.g. ask it “what is the weather in XXX city” or “show me the best bakeries in YYY” and it’ll return more fully featured widgets you can interact with since it’s no longer just the orb.



4o advanced voice was launched only slightly more than a year ago…
Other frontier companies expose their full latest AI model to voice, so I do think they're a good year or more behind in that regard. Scale bites them in the ass.

I didn't know it was still 4o though, that's hilarious and sad. I'll edit my post, thanks for pointing it out.

Advanced voice was a downgrade from their prior voice model as far as interactions went; there is a lot of documented evidence of this, and people were rightfully pissed when they forced that change to "advanced" months ago. The speed increase was good, but the quality of the output was markedly worse.

Opinions will vary on this but that's my anecdotal experience and that of many others too. Take it with a grain of salt, YMMV.
 
I'm happy to talk to a computer. But this feels very much like talking to a bakery commercial.
Yeah, I wish these demos of AI for everyday use didn't nearly always default to "Where are the best trendy restaurants and coffee shops near me?" Can't they think of something a little less upscale, like "Where can I take my car to get an oil change without getting ripped off?"
 
Grok has already had this feature for months, actually.

I’ve noticed MacRumors rarely covers xAI or Grok, and when they do, the headlines often feel negative or dismissive, even though it’s consistently ranking at or near the top of the major benchmarks right now (LMSYS, Artificial Analysis, etc.).

At the same time, there’s still a lot of ChatGPT coverage even though Gemini (and others) have clearly pulled ahead in most objective tests lately, and OpenAI is reportedly burning massive amounts of cash. It just feels like the AI coverage here has a pretty strong editorial slant rather than being benchmark- or data-driven. Kind of disappointing for a site that’s usually great about staying objective on the tech itself.
 
Useful change. I don't usually use voice mode when I'm in ChatGPT, but it's still good to have the option.
 
At the same time, there’s still a lot of ChatGPT coverage even though Gemini (and others) have clearly pulled ahead in most objective tests lately, and OpenAI is reportedly burning massive amounts of cash. It just feels like the AI coverage here has a pretty strong editorial slant rather than being benchmark- or data-driven.

Under Settings > Apple Intelligence & Siri > Extensions, I currently see ChatGPT as the only option, so for now the preferential coverage it enjoys seems warranted, given its relevance to Apple users.

That said, as a paid subscriber to Gemini and Claude, I’d certainly welcome additional choices. However, if reports are accurate, it should only be a matter of months before the next Siri is powered by a tailored version of Gemini – which, I assume, would unlock it as an additional extension.
 
I'm going to ask every time this comes up: who the hell wants to talk to a computer?
I do. I want to talk to a computer. I'm not sure if you've ever used one of these LLMs, but they are wildly useful. If you haven't, you should give them a try. ChatGPT and Claude have quickly become part of my daily life, and I've integrated AI into many aspects of my business. It's been a game changer.
 
Grok has already had this feature for months, actually.
Grok also seems to be very easily manipulated by its owner, who has unfettered access to it. I would rather use a company that hopefully has some checks and balances behind the scenes than one that obviously does not, whose owner lets it be known that if Grok responds unfavorably to his beliefs, it's a bug and will be fixed.
 
I do. I want to talk to a computer. I'm not sure if you've ever used one of these LLMs, but they are wildly useful. If you haven't, you should give them a try. ChatGPT and Claude have quickly become part of my daily life, and I've integrated AI into many aspects of my business. It's been a game changer.

I have. None of their utility comes from talking to the machine, in my experience. What exactly do you accomplish using the voice mode with these chat bots?
 
Grok also seems to be very easily manipulated by its owner, who has unfettered access to it. I would rather use a company that hopefully has some checks and balances behind the scenes than one that obviously does not, whose owner lets it be known that if Grok responds unfavorably to his beliefs, it's a bug and will be fixed.
X is the company that championed Community Notes and made it the fact-checking gold standard, which is now being copied and introduced at Meta and Google.

Regardless, I’ve seen Grok be critical of Elon. And it is amazing at coding/research.
 