I can't really say much about Siri considering Sprint truly sucks in my area and I typically can't get "her" to help me much. That said, when I'm in big cities, she understands everything I ask, dictates like a champ, and gets me the info I need. I think the service has gotten a bad rap, actually, probably from a lot of people in spotty areas.

The only time I have an issue with dictation is when I'm using my cheap Walmart Bluetooth headset. That said, the onus is on me at that point for not investing in quality accessories.

Every other time when I'm not using BT she responds accurately and promptly.
 
Ugh...seems like a bit of a late scramble, not good. Did Samsung pull the rug out from under them and they weren't ready for it? This isn't the type of thing you can throw together at the last minute; I hope it isn't going to be the Maps debacle all over again.

They started hiring for this team in 2011. It's not a kneejerk reaction to Samsung. "Journalists" are capitalizing on the Samsung news to generate new "news" about Apple's struggles.
 
Google's voice recognition already has offline support

It's limited which makes it even more useless as you try to figure out what works and what doesn't. I also feel like voice recognition doesn't work as well offline.

Apple's tradeoff in this case makes more sense IMO. Either you have wireless access and Siri works or you don't and Siri doesn't work. Simple.
 
Nuance would have been a better acquisition than Beats. Voice recognition is not just about having smart people think things up. There will be patents and proprietary algorithms involved that are owned by Nuance, so simply poaching ex-Nuance employees will be useless. Additionally, Nuance probably has the largest database of spoken language that they can use to train their AI networks to recognise speech (no doubt each time you send off a verbal request to Siri, it gets added to Nuance's database to improve their recognition networks).

Simply reinventing speech recognition in-house will probably work out as poorly as Apple Maps. Apple will try, they'll struggle, and then they'll buy second-rate help from other companies. If Samsung get Nuance, Apple will have really fumbled the ball. Don't believe me? Then why is Apple using Nuance now given it has been working on speech recognition (and synthesis) for years?

They will do what Google did, by creating their own. They're off to a good start. They now own a company that's done great things with speech recognition (Novauris) and the guy that created Dragon dictation, Dr. James Baker. Google did something similar when they hired Nuance's co-founder Mike Cohen.
 
It's limited which makes it even more useless as you try to figure out what works and what doesn't. I also feel like voice recognition doesn't work as well offline.

Apple's tradeoff in this case makes more sense IMO. Either you have wireless access and Siri works or you don't and Siri doesn't work. Simple.

Yes, it is better not having voice recognition (the thing Google has offline) than having it.
 
Why not just buy Nuance rather than build a team that would take years to get to the same level of quality we are at now?


Because Nuance would be a billion-dollar acquisition.

----------

Let's check Apple's recent in-house service success rate for a second here.

MobileMe Sync = Fail!
Apple Maps = Fail!
iCloud = average
iMessage = average
iAds = Fail!
Photo book printing = SUCCESS!

Voice recognition is going to turn out great!

Without you mentioning who is "stellar" in these categories, this post basically amounts to a one-sided opinion.

----------

TBH I can see this going the way of Apple Maps.

You mean Apple Maps today or Apple Maps at launch? Because Apple Maps today is far better (at least in the US) than it was at launch. And with recent acquisitions it should get even better.
 
Mac OS always used to have pretty decent speech recognition; it's a shame it never got developed any further beyond adding a couple of newer voices a while ago. So it's good to see Apple returning to it (even if it's because of the imminent threat of losing Siri).

I hope this also means more of the voice recognition taking place on the device itself, with only queries that need outside data being sent elsewhere. That could also make for easier app integration.

So yeah, it could actually be great news :)
 
Why is the tech behind Siri and voice-to-text different?

Seems to me Siri is actually performing WORSE than when they first brought it out. Siri "understands me" about 10% of the time on the first try, yet using the mic and speech-to-text is right about 80% of the time. Seems to me they ought to use the code they already have in the speech-to-text and ditch Siri...

I guess I'm echoing haravikk's sentiment.
 
Mac OS always used to have pretty decent speech recognition; it's a shame it never got developed any further beyond adding a couple of newer voices a while ago. So it's good to see Apple returning to it (even if it's because of the imminent threat of losing Siri).

I hope this also means more of the voice recognition taking place on the device itself, with only queries that need outside data being sent elsewhere. That could also make for easier app integration.

So yeah, it could actually be great news :)

It is.

Apple's acquisition of Novauris should have telegraphed to most people that

A. Nuance is going to be replaced
B. Apple has big plans for the Semantic Web

The less licensing Apple has to pay, the more likely we are to see voice technology integrated into OS X, iOS, and wearable platforms.
 
Could be coming.

Many of those in Apple's Boston voice R&D group came from a company called VoiceSignal Technologies, which had created a standalone recognizer for commands like that.

I hope so. It needs a lot of work right now in iOS 8 because it's buggy as heck, but I'm really hoping for some offline commands. I will say that Android's voice recognition is as fast as anything; I hope that comes to Siri. Too much thinking and sending back and forth before she responds. Still very thankful that I have it though!
 
Better late than never, I guess?

What's more surprising is that Siri has had no major improvements since the iPhone 4s was released. I mean, Google Now completely destroys Siri, and I dislike Android. But it's a fact.
 
For those who didn't read the Wired article, MacRumors neglected to mention a major development. The speech recognition team that Apple is building are well-known people from the neural network research community, and neural network-based speech recognition recently gave Google a 25% boost in accuracy. That's a huge improvement. Microsoft also demonstrated big improvements using this technology.

The article points out that Apple is the only major speech recognition player that hasn't released a neural network-based product yet, but suggests that Siri will soon adopt this technology and get a much needed boost in accuracy. I'm personally hoping that this will make it into iOS 8 come the fall, but Apple has been known to take their time.
 
You mean Apple Maps today or Apple Maps at launch? Because Apple Maps today is far better (at least in the US) than it was at launch. And with recent acquisitions it should get even better.

You mean, considerably worse than its competitor, just less considerably worse than it was, but still with a completely unassailable gap and over two years of unhappy customers?

Or where it's a complete catastrophe in the rest of the world, where the majority of Apple's revenue comes from?

This is a terrible idea. It combines the worst elements of Apple trying to build a huge database on the cheap, as it did with Maps, with entering an area where the company they're trying to break away from owns huge numbers of important patents that are going to significantly hamper development.

Seriously, this is a stupid, stupid idea.
 
For those who didn't read the Wired article, MacRumors neglected to mention a major development. The speech recognition team that Apple is building are well-known people from the neural network research community, and neural network-based speech recognition recently gave Google a 25% boost in accuracy. That's a huge improvement. Microsoft also demonstrated big improvements using this technology.

The article points out that Apple is the only major speech recognition player that hasn't released a neural network-based product yet, but suggests that Siri will soon adopt this technology and get a much needed boost in accuracy. I'm personally hoping that this will make it into iOS 8 come the fall, but Apple has been known to take their time.

Huh? How can speech recognition be achieved without neural nets? Frankly I cannot believe that there is a commercial voice recognition product that is not based on neural nets (either locally on a desktop or on a server). I knew some of the people who started the ball rolling with neural nets and speech recognition. The poor quality & extremely high variability of the signal pretty much mathematically precludes any other approach.

Are you talking about neural nets that adapt to the individual user?

You have to consider how much space [local storage of voice recognition decoder] that would take up locally. It's obviously much more important for them to include more required apps many people don't want, like Podcasts. :D

The only solution for a mobile phone would be to use a chip with dedicated memory rather than software that uses RAM. This could be done provided the speech recognition neural net is fixed (by training with a huge dataset that covers human variation in speech), but I doubt this would work for a network that would learn the user's speech patterns.
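
To make the fixed-versus-adaptive distinction concrete, here's a toy sketch (made-up sizes and data; it has nothing to do with Apple's or Nuance's actual implementations). A frozen net only ever reads its weights, so they could sit in a chip's dedicated read-only memory, while a user-adapting net has to write updated weights somewhere.

```python
# Toy illustration only: made-up sizes and data, not any real recognizer.
import numpy as np

rng = np.random.default_rng(0)

# Pretend these weights were trained offline on a huge multi-speaker corpus
# and then frozen, so they could be baked into read-only memory on a chip.
FROZEN_W = rng.standard_normal((13, 8))   # 13 MFCC features -> 8 hidden units
FROZEN_V = rng.standard_normal((8, 4))    # 8 hidden units -> 4 phone classes

def recognize_frame(mfcc_frame):
    """Classify one audio frame with the frozen network (reads only)."""
    hidden = np.tanh(mfcc_frame @ FROZEN_W)
    scores = hidden @ FROZEN_V
    return int(np.argmax(scores))

# A speaker-adaptive net, by contrast, must rewrite its weights as it hears
# the user, which needs writable memory and extra compute on the device.
def adapt_step(weights, mfcc_frame, target, lr=0.01):
    """One toy gradient-style update; impossible if the weights live in ROM."""
    hidden = np.tanh(mfcc_frame @ weights)
    error = hidden - target                  # stand-in for a real training signal
    return weights - lr * np.outer(mfcc_frame, error)

frame = rng.standard_normal(13)              # fake 13-dimensional MFCC frame
print(recognize_frame(frame))                # fixed net: pure read-only lookup
```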
 
Huh? How can speech recognition be achieved without neural nets? Frankly I cannot believe that there is a commercial voice recognition product that is not based on neural nets (either locally on a desktop or on a server). I knew some of the people who started the ball rolling with neural nets and speech recognition. The poor quality & extremely high variability of the signal pretty much mathematically precludes any other approach.

Are you talking about neural nets that adapt to the individual user?

I claim no expertise in the field, which is why I specifically cited the Wired article, so you should direct your questions to Robert McMillan (the article's author) or the comments section in the Wired piece. I could have misquoted the article, so please refer to it. Whatever technology is employed, some of the recent improvements sound impressive, which was the reason I mentioned it.
 
How I understand it:

It's my understanding that Siri runs completely on Apple's servers, Apple (through an acquisition) is responsible for the logic, and they use Nuance's voice recognition technology for the speech-to-text conversion.

The reason it's in the cloud is that the "advanced" speech-to-text is CPU-intensive and not suitable for running on handsets.

So the workflow, as I understand it, is:

voice spoken to handset --> recorded to a sound file --> sent to the cloud --> Nuance software decodes it into text --> Apple's "Siri" technology interprets the text commands --> machine actions (i.e. call someone, send a text) sent back to the handset
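
As a rough sketch of that chain (purely illustrative; the function names and returned values are placeholders, not Apple's actual internals):

```python
# Purely illustrative sketch of the chain above; names and data are made up.

def record_audio():                       # on the handset
    return b"...compressed sound file..."

def nuance_speech_to_text(audio):         # in the cloud: Nuance decodes the audio
    return "call fresh picks in dublin"   # the (possibly misheard) text

def siri_interpret(text):                 # in the cloud: Apple's logic layer
    return {"action": "call", "target": "fresh picks", "place": "dublin"}

def handle_request():
    audio = record_audio()                # 1. record locally
    text = nuance_speech_to_text(audio)   # 2. ship audio to the cloud, get text back
    command = siri_interpret(text)        # 3. turn text into a machine action
    return command                        # 4. action is sent back to the handset

print(handle_request())
```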

The weakest part of this chain, in my opinion, is the Nuance part. When Siri "hears" me right, it does the right actions 99% of the time. The problem I have is that it sometimes doesn't "hear" me right.

I also find that the speech-to-text is best with "dictionary words"; it gets bungled up on proper names. I tried to search for a restaurant named "Fresh Pixx" and of course it hears "fresh picks", which it cannot find (and another hundred examples).

It appears that Siri uses the standard Nuance dictionary for the entire speech file and then tries to interpret the text that it spits out. But again, the "standard" dictionary doesn't include many "proper" names, especially those that are "close" to standard words/names.

Siri should keep a "sound pattern" record of all my contacts and of course have a "sound pattern" of all yellow pages results indexed via GPS coordinates, as well as city/state names, street names, etc., also indexed by GPS.

These sound dictionaries should have multiple "foreign" pronunciations (French, Spanish, and other common ones, as things/people in the U.S. frequently have Spanish/French/etc. names). E.g., a restaurant named "Foie Gras" should be in the dictionary as both "Foy Grass" and "Fwa Gra".


What SHOULD be happening is that it uses other clues in the text string to pick these special speech-to-text dictionaries.

E.g., I say "call Fresh Pixx in Dublin". It should get that I'm calling someone/something and that "in Dublin" is a geographical area (probably in California, as that's where my GPS says I am).

1) It should look up "Dublin" in the "city" dictionary around my geographic area.

2) Because I specified a geographic location, it's likely a business, so "Fresh Pixx" should only be compared to the dictionary of businesses in/around Dublin, CA.

If I just say "call Fresh Pixx", then everything after "call" is a "special look-up". Because I DIDN'T specify a location, it should first check my contacts; if not found there, it should check businesses in/around my current area (say a 10-mile radius or so).

In my experience, even really poor speech-to-text can be surprisingly accurate if you have good restricted dictionaries.
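
Something like this rough sketch is what I have in mind (the business names, pronunciation spellings, match threshold, and lookup logic are all invented for the example; this is not how Siri actually works):

```python
# Rough sketch of the restricted-dictionary idea; all data and thresholds are
# invented for illustration.
import difflib
import re

# Pretend these are "sound pattern" dictionaries indexed by location, with
# multiple pronunciation spellings per entry (Foie Gras -> Foy Grass / Fwa Gra).
BUSINESSES_NEAR = {
    "dublin, ca": {
        "Fresh Pixx": ["fresh picks", "fresh pix"],
        "Foie Gras":  ["foy grass", "fwa gra"],
    },
}
CONTACTS = {"Mom": ["mom"], "Dr. Baker": ["doctor baker", "dr baker"]}

def best_match(heard, dictionary):
    """Compare the heard phrase against every pronunciation variant in one dictionary."""
    if not dictionary:
        return None
    scored = [
        (max(difflib.SequenceMatcher(None, heard, v).ratio() for v in variants), name)
        for name, variants in dictionary.items()
    ]
    score, name = max(scored)
    return name if score > 0.6 else None

def handle(utterance, here="dublin, ca"):
    # "call <name> in <place>": a place was given, so only search that area's businesses.
    m = re.match(r"call (.+) in (.+)", utterance)
    if m:
        name, place = m.group(1), m.group(2).lower() + ", ca"   # toy geo lookup
        return best_match(name, BUSINESSES_NEAR.get(place, {}))
    # "call <name>": no place given, so check contacts first, then nearby businesses.
    m = re.match(r"call (.+)", utterance)
    if m:
        name = m.group(1)
        return best_match(name, CONTACTS) or best_match(name, BUSINESSES_NEAR[here])
    return None

print(handle("call fresh picks in dublin"))   # -> Fresh Pixx
print(handle("call fwa gra"))                 # -> Foie Gras
```

Even with a dumb string-similarity match, the restricted dictionaries do most of the work, which is the point.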
 
You mean, considerably worse than its competitor, just less considerably worse than it was, but still with a completely unassailable gap and over two years of unhappy customers?

Or where it's a complete catastrophe in the rest of the world, where the majority of Apple's revenue comes from?

This is a terrible idea. It combines the worst elements of Apple trying to build a huge database on the cheap, as it did with Maps, with entering an area where the company they're trying to break away from owns huge numbers of important patents that are going to significantly hamper development.

Seriously, this is a stupid, stupid idea.

Everyone here keeps telling people that the Maps are horrible without providing a shred of "recent" information backing those claims up. Apple Maps debuted a long time ago. Since then there has been plenty of improvement.

Let us not forget that before Apple delivered their own mapping solution, we iOS users were at a deficit wrt the native mapping app on iOS versus Google's mapping app on Android. Google didn't seem to find the time to deliver vector maps, night mode, and more.

The reality is Apple Maps is improving and closing the gap.

Nothing is standing still, and developers are taking notice.

With acquisitions like HopStop, Embark, and BroadMap, it appears that mapping technologies are a topic of interest for Apple. Guess that kind of dovetails with the backend improvements that are being done.

If you're still pushing the "Apple Maps is a debacle" meme, I seriously suggest taking a look at "current" events and leaving the past where it belongs. In the past.
 
I've said it before and I'll say it again. Nuance is a major player in the document lifecycle business, owning several companies such as eCopy, Equitrac, and Copitrak. This is a major business that Samsung has been testing products in for the last several years. If Samsung is buying Nuance, it may have a lot more to do with Nuance's expertise in those fields and less to do with speech recognition, not to say they wouldn't take advantage of that. I can see why Apple doesn't have as much interest in Nuance when all they really would want is the speech recognition piece, especially when they have already invested in their own group for it.
 
Everyone here keeps telling people that the Maps are horrible without providing a shred of "recent" information backing those claims up. Apple Maps debuted a long time ago. Since then there has been plenty of improvement.

Let us not forget that before Apple delivered their own mapping solution, we iOS users were at a deficit wrt the native mapping app on iOS versus Google's mapping app on Android. Google didn't seem to find the time to deliver vector maps, night mode, and more.

The reality is Apple Maps is improving and closing the gap.

Nothing is standing still, and developers are taking notice.

With acquisitions like HopStop, Embark, and BroadMap, it appears that mapping technologies are a topic of interest for Apple. Guess that kind of dovetails with the backend improvements that are being done.

If you're still pushing the "Apple Maps is a debacle" meme, I seriously suggest taking a look at "current" events and leaving the past where it belongs. In the past.

The Apple Maps app is still not in the same league as Google's. I avoid it at all costs, but still get stuck using it as so many apps push you to use Apple's Maps. When I do have to use it, Apple Maps still regularly makes mistakes in major US metro areas: sometimes imaginary streets that don't exist, incorrect addresses, etc.

I would highly suspect that Google Maps' drop in usage is simply attributable to iOS defaulting to Apple's app rather than Google's. With superior route options, better traffic info, more up-to-date maps, Street View, and such, I just can't let you get away with trying to equate the two.

I will allow that Apple Maps might be somewhat better/improved. It doesn't appear to have gotten worse at the least.
 
Huh? How can speech recognition be achieved without neural nets? Frankly I cannot believe that there is a commercial voice recognition product that is not based on neural nets (either locally on a desktop or on a server). I knew some of the people who started the ball rolling with neural nets and speech recognition. The poor quality & extremely high variability of the signal pretty much mathematically precludes any other approach.

Are you talking about neural nets that adapt to the individual user?



The only solution for a mobile phone would be to use a chip with dedicated memory rather than software that uses RAM. This could be done provided the speech recognition neural net is fixed (by training with a huge dataset that covers human variation in speech), but I doubt this would work for a network that would learn the user's speech patterns.

I too am suspicious of any claim that neural network models weren't being used previously.

The Wired article suggests that a new technique using neural nets was developed by a Canadian researcher who has gone on to help MS and Google dramatically improve their existing tech. Meanwhile, Nuance/Apple apparently have kept with their in-house approach and up to this point not adopted the newer methodology. The article does seem to imply Siri's existing make-up is not neural network-based, but I think it's just the author not being as clear as they should be.

Since Google can do some translating offline, I wonder if hardware neural nets are what's being hinted at. Given the history of desktop dictation requiring as much processing power as possible, and Apple's reliance on cloud-based computing for Siri, I wouldn't think offline would work very well absent dedicated silicon.
 