
MacRumors

macrumors bot
Original poster


Apple is researching how to improve Siri to better understand people who talk with a stutter, according to new details shared by The Wall Street Journal in a piece on how companies train voice assistants to handle atypical speech.

[Image: iOS 14 Siri interface]

Apple has built a bank of 28,000 audio clips from podcasts featuring people who stutter, which could be used to train Siri. The data that Apple has collected will improve voice recognition systems for atypical speech patterns, according to an Apple spokesperson.

Along with improving how Siri understands people with atypical speech patterns, Apple has also implemented a Hold to Talk feature for Siri that allows users to control how long they want Siri to listen for. This prevents Siri from interrupting users with a stutter before they're finished speaking.
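
To see why that matters, consider how a voice assistant typically decides you have finished speaking: a silence timeout. Below is a minimal, purely hypothetical sketch (the names are illustrative and are not Apple API) simulating how a short timeout cuts off a speaker with long pauses, while an explicit hold-and-release control never does:

```swift
// Hypothetical, simplified model of utterance endpointing.

/// Silence-based endpointing: stop listening at the first pause
/// longer than `timeoutMs`. `pausesMs` are the silent gaps between
/// spoken words; returns how many words get captured.
func wordsCaptured(pausesMs: [Int], timeoutMs: Int) -> Int {
    for (index, pause) in pausesMs.enumerated() where pause > timeoutMs {
        return index + 1  // cut off mid-utterance at this pause
    }
    return pausesMs.count + 1  // heard the whole utterance
}

// A speaker with a stutter may pause for seconds between words.
let pauses = [300, 4_000, 250, 6_500]  // ms gaps between 5 words

// A typical short timeout stops listening after the second word.
print(wordsCaptured(pausesMs: pauses, timeoutMs: 1_500))  // 2

// Hold to Talk: listening ends only when the button is released,
// so every pause is tolerated and all 5 words are captured.
print(wordsCaptured(pausesMs: pauses, timeoutMs: Int.max))  // 5
```

The real feature of course operates on live audio, but the trade-off between a fixed timeout and an explicit stop signal is the same.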

Siri can also be used without voice altogether, through the Type to Siri feature first introduced in iOS 11.

Apple plans to outline its work to improve Siri in a research paper set to be published this week, which will provide more details on the company's efforts.

Google and Amazon are also working to train Google Assistant and Alexa to better understand all users, including those who have trouble using their voices. Google is collecting atypical speech data, and Amazon in December launched the Alexa Fund to let people who have speech impairments train an algorithm to recognize their unique vocal patterns.

Article Link: Apple Training Siri to Better Understand People With Atypical Speech
 
“Training” is a funny way to say programming.

The term "Artificial Intelligence" is not really an accurate term, but it used to broadly describe computer systems that learn and respond based on that learning.

Machine Learning involves feeding in "test" data with known and unknown outcomes. This takes several forms, but the general idea is that a database is built mapping inputs to outputs, or inputs to "intents". Say something to the computer and get a specific response. Ask Siri to set an alarm, and an alarm is set. This cannot be programmed, per se, because there are a hundred different ways to ask Siri to set an alarm, and each user expects the same outcome. The training process normalizes those inputs to a single output or intention. How that intention executes is programmed, though.
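
To make that intent-mapping idea concrete, here's a toy sketch (all names hypothetical; this has nothing to do with how Siri is actually built) where several phrasings normalize to one intent, and the intent's handler is ordinary programmed code:

```swift
// Hypothetical toy example: many phrasings, one intent.
// A real assistant learns this mapping from data; here it is
// hard-coded just to show the "normalize, then execute" split.

enum Intent {
    case setAlarm
    case unknown
}

/// Stand-in for a trained classifier: map free-form text to an intent.
func classify(_ utterance: String) -> Intent {
    let text = utterance.lowercased()
    let alarmPhrases = ["set an alarm", "set the alarm", "wake me up",
                        "alarm for"]
    if alarmPhrases.contains(where: { text.contains($0) }) {
        return .setAlarm
    }
    return .unknown
}

/// The execution side is plain programmed logic.
func handle(_ intent: Intent) {
    switch intent {
    case .setAlarm: print("Alarm set.")
    case .unknown:  print("Sorry, I didn't get that.")
    }
}

// Different inputs, same normalized outcome.
handle(classify("Set an alarm for 7"))      // Alarm set.
handle(classify("Please wake me up at 7"))  // Alarm set.
handle(classify("Order a pizza"))           // Sorry, I didn't get that.
```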
 
As a French speaker who does not come from France, I find Siri totally useless. It rarely understands me, and most of the time it's not even close. I switched it to English, but it's not very good either (in this case because English is not my native language).

I hope it will improve some day.
 
Research? Really? Is this not part of ongoing product improvement for the 1B+ user base?
Not that I think it's a bad thing, just calling it research ...
 
As a French speaker who does not come from France, I find Siri totally useless. It rarely understands me, and most of the time it's not even close. I switched it to English, but it's not very good either (in this case because English is not my native language).

I hope it will improve some day.

As an English speaker who comes from England, I find Siri totally useless.
 
Euphemisms are running high these days. I stutter, and I don’t consider it “atypical”.

Why this fear of calling things by their real names? I won’t say I love to stutter, but I’m not ashamed of it either, and in fact I discovered lots of beautiful things in this world thanks to my stutter. If now they say I “speak atypically” rather than stutter... oh, what a world.

But it seems euphemisms will rule the world for decades to come.
 
As a stutterer, I can only imagine this will be an incredibly difficult problem to solve, and I believe that Microsoft is working on the same issue in the context of their Teams platform.

My stuttering is not as severe as it once was, but I have been with people who have a very difficult time talking and who, to their credit, will literally take 30 seconds (if not minutes) to get a single word out. Think about that... that is a long pause. Some people tic. Some people pop. Sometimes, the throat closes. It's a terrible experience.

I don't think that Apple (or anyone) can really accommodate all the different challenges that people experience, but I applaud them for giving it a try.

If you're curious, I gave a talk to 2,000 people a couple of years ago. Like I said, not as severe as it once was, but it's still there. Ugh, believe me. It's still there. 18NTC - Larry Glickman - Ignite Speaker on Vimeo
 