Properly exciting times.
I remember when I was but a sprog, wide-eyed in wonder, sitting on my Dad's lap as we watched Next Gen. I don't think anybody back then would have imagined technology to be as advanced as it is now.
Seems like something Apple should have led the way on?
I'm amazed at how forward-thinking the Star Trek series are. It's literally like looking into the future.
I'm watching the Voyager and Enterprise series again now on Netflix. Never get bored of them.
"We are moving away from a world where people must understand computers to a world in which computers must understand us."

That's getting to the core of what we've imagined computers could be, I think. There is a very long road ahead still, but progress is being made every day.
"Seems like something Apple should have led the way on?"

For years the way has been led by airlines and other call-center applications. That's where the money is. I think MS, Google, and others picked this up out of a general interest in deep learning, and even there the voice recognition is mostly icing on the cake -- what they're interested in is understanding the underlying sentence graph.
"You should meet my ex-wife."

Your comment has been added to the MR-forum wall of fame.
I have what I would consider a very slight speech impediment. If you and I were to sit down and have coffee, within a few moments, you could understand 80% of what I said... Probably closer to 99%.
I've yet to have a single word I have uttered properly received by any voice recognition equipment. Not a single word.
So I take all this with a dump truck load of salt.
However it will be a long time before computers can understand the real meaning of what's being said, he cautioned.
Apparently they have good microphones...?
Besides, wasn't Dragon Naturally at 1-2% ten years ago?
The quote I see under Accuracy says --
"You should meet my ex-wife."

Bruh...
It will be a long time before FELLOW HUMANS understand the real meaning of what’s being said.
"I'm still trying to figure out why Siri has to have everything sent to Apple for decoding."

I'm with you on this. I was hoping Siri for Mac would at least be able to eliminate the server requirement.
"It's amazing to me how long it has taken us to get computer voice recognition and synthesis to a workable level. We've been working on this for decades, and are just now barely getting passing marks. Image recognition, on the other hand, seems to be moving along much more quickly."

It took human beings hundreds of thousands of years to come up with written language, let alone to be able to transcribe speech into writing. Computers are outpacing us by a remarkable amount.
That's fantastic! Now if they can do the same for Microsoft support staff that are based in.....other countries....wink, wink....
Researchers at Microsoft claim to have created a new speech recognition technology that transcribes conversational speech as well as a human does (via The Verge).
The system's word error rate is reportedly 5.9 percent, which is about equal to professional transcribers asked to work on the same recordings, according to Microsoft.
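For context, a 5.9 percent word error rate means roughly six wrong words per hundred. WER is conventionally computed as the word-level edit distance (insertions, deletions, substitutions) between the system's transcript and a reference transcript, divided by the reference length. A minimal sketch of that metric (not Microsoft's code, just the standard definition):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming (Levenshtein) edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One dropped word out of six reference words ≈ 0.167
print(wer("the cat sat on the mat", "the cat sat on mat"))
```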
"We've reached human parity," said chief speech scientist Xuedong Huang in a statement, calling the milestone "an historic achievement".
To reach the milestone, the team used Microsoft's Computational Network Toolkit, a homegrown system for deep learning that the research team has made available on GitHub via an open source license. The system uses neural network technology that groups similar words together, which allows the models to generalize efficiently from word to word.
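That "grouping similar words together" is the word-embedding idea: each word is mapped to a vector, and words used in similar contexts end up with nearby vectors, so what the model learns about one word transfers to its neighbors. A toy illustration with made-up vectors (these values and words are invented for the example, not CNTK's actual representation):

```python
import math

# Hypothetical 3-dimensional word vectors. In a real model these would be
# learned from data and have hundreds of dimensions.
vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity: 1.0 for identical directions, ~0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# "cat" sits closer to "dog" than to "car", so evidence about one
# generalizes to the other.
print(cosine(vectors["cat"], vectors["dog"]) > cosine(vectors["cat"], vectors["car"]))
```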
The neural networks draw on large amounts of data called training sets to teach the transcribing computers to recognize syntactical patterns in the sounds. Microsoft plans to use the technology in Cortana, its personal voice assistant in Windows and Xbox One, as well as in speech-to-text transcription software.
But the technology still has a long way to go before it can claim to master meaning (semantics) and contextual awareness -- key characteristics of everyday language use that need to be grasped for Siri-like personal assistants to process requests and act upon them in a helpful way.
"We are moving away from a world where people must understand computers to a world in which computers must understand us," said Harry Shum, who heads the Microsoft AI Research group. However it will be a long time before computers can understand the real meaning of what's being said, he cautioned. "True artificial intelligence is still on the distant horizon."
Article Link: Microsoft Hails 'Historic Achievement' in Speech Recognition Technology
Great, so now Windows 10 is even better at harvesting your personal conversations and relaying them back home to Microsoft.
When should we expect the update to be pushed, given that these Win 10 machines don't give the option to opt out of updates?
...and you can stop reading here. As both an Apple and MS customer, I never believe a word MS says about future products until it hits the market. And then it's usually half as good, with a third of the promised features.
I really don't think speech is an efficient way of interacting with our machines except in very narrow applications. I don't want to sit in an office of people dictating, I don't want to be on a subway full of people browsing the web by voice.
"It took human beings hundreds of thousands of years to come up with written language, let alone to be able to transcribe speech into writing. Computers are outpacing us by a remarkable amount."

I don't think the computers are doing this by themselves...
"I don't think the computers are doing this by themselves..."

Image recognition is pretty amazing. To see a computer correctly identify a dog in a photo is really cool. But look closer and you'll see that the computer recognizes other objects as dogs that clearly aren't dogs. I see photos of cats, horses, pigs, etc., where the computer calls them dogs. Like speech recognition, image recognition is still a work in progress.
My point though was about the relative rate of development between speech recognition and other technologies. Image recognition is an obvious comparison point, but look at control systems for drones, autonomous vehicles and deep space probes. Some kid can build a Lego machine in his garage to solve the Rubik's cube in seconds. We can find ourselves anywhere on the planet to within a couple feet using satellites.
But that most basic form of communication is still hard to crack.
"Image recognition is pretty amazing. To see a computer correctly identify a dog in a photo is really cool. But look closer and you'll see that the computer recognizes other objects as dogs that clearly aren't dogs. Like speech recognition, image recognition is still a work in progress."

Deep learning algorithms are now able not only to recognize dogs and flowers, but to identify which breed of dog and species of flower.