It seemed obvious to me this isn't about creating a user experience but is for something OpenAI could use instead. So I fired up a search for "how to create a voice clone", and the results discussing Retrieval-based Voice Conversion (RVC) models indicate that 10-20 minutes of high-quality source audio is needed to get the most accurate clone. Sounds like OpenAI wants a huge data sample of your voices. For what? Who knows.
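Not a hard rule, but for a sense of scale: here is a minimal sketch that totals how many minutes of WAV audio a folder of recordings contains, measured against the 10-20 minute range those RVC guides quote. The folder name is hypothetical.

```python
# Rough tally of usable training audio, in minutes.
# The 10-20 minute range is just the figure quoted in RVC fine-tuning
# guides, not a hard requirement.
from pathlib import Path

import soundfile as sf  # pip install soundfile

def total_audio_minutes(folder: str) -> float:
    """Sum the durations of all WAV files in `folder`, in minutes."""
    seconds = 0.0
    for wav in Path(folder).glob("*.wav"):
        seconds += sf.info(str(wav)).duration  # reads the header only
    return seconds / 60.0

minutes = total_audio_minutes("recordings/")  # hypothetical folder
print(f"{minutes:.1f} min collected; guides suggest 10-20 min of clean audio")
```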
To train their model. At the end of the day, data is king, and if Apple has been atrocious with Siri, and soon enough will be with Apple Intelligence, it's because they don't have the data.
You can only train on voice audio that way if you also have a text transcript of what the voice is saying.
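To make that requirement concrete: training pairs for speech models are (audio, transcript) tuples, and the text side can come from any off-the-shelf speech recognizer. A minimal sketch using the open-source whisper package; the file name is hypothetical.

```python
# Produce the (audio, transcript) pair that voice-model training needs,
# using OpenAI's open-source Whisper model for the text side.
import whisper  # pip install openai-whisper

model = whisper.load_model("base")               # small, CPU-friendly checkpoint
result = model.transcribe("call_recording.wav")  # hypothetical recording

training_pair = {
    "audio": "call_recording.wav",
    "text": result["text"].strip(),              # the generated transcript
}
print(training_pair)
```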
The phone isn’t a high-quality audio source though.So I fired up a search for “how to create a voice clone” and the results discussing Retrieval-based Voice Conversion or the RVC model show that between 10-20 minutes of high quality audio source is needed to get the most accurate clone.
The phone isn't a high-quality audio source, though.
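For a sense of what the phone line strips away: classic telephony is narrowband, roughly 8 kHz sampling with a 300-3400 Hz passband, far below the studio-grade audio those cloning guides assume. A sketch that simulates that degradation with scipy; file names are hypothetical.

```python
# Simulate telephone-quality audio: downsample to 8 kHz, then apply
# the ~300-3400 Hz passband of a classic phone line.
import soundfile as sf  # pip install soundfile
from scipy.signal import butter, resample_poly, sosfilt

audio, rate = sf.read("studio_source.wav")  # e.g. a 48 kHz studio recording
if audio.ndim > 1:
    audio = audio.mean(axis=1)              # mix down to mono

phone_rate = 8000
audio_8k = resample_poly(audio, up=phone_rate, down=rate)  # telephone rate

# 4th-order Butterworth band-pass for the telephone passband.
sos = butter(4, [300, 3400], btype="bandpass", fs=phone_rate, output="sos")
phone_audio = sosfilt(sos, audio_8k)

sf.write("phone_quality.wav", phone_audio, phone_rate)
```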
Wait, people still make phone calls?

It may shock you, but some of us do.

Admittedly, this is an easier way to access ChatGPT by voice than using Apple Intelligence, because you can initiate a call hands-free with Siri (and you can do it on an iPhone that doesn't support Apple Intelligence). One would have thought that Apple would provide an easier way.