Microsoft Hails 'Historic Achievement' in Speech Recognition Technology

MacRumors · Oct 19, 2016

Researchers at Microsoft claim to have created a new speech recognition technology that transcribes conversational speech as well as a human does (via The Verge).

The system's word error rate is reportedly 5.9 percent, which is about equal to that of professional transcribers asked to work on the same recordings, according to Microsoft.
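
For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration of that standard definition, not Microsoft's evaluation code; the function name `word_error_rate` and the sample sentences are made up for the example, and real evaluations usually normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Hypothetical helper for illustration; whitespace tokenization only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One wrong word out of six reference words.
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"{wer:.1%}")  # prints 16.7%
```

Under this definition, a 5.9 percent rate corresponds to roughly six word errors per hundred reference words, and because insertions count as errors the rate can exceed 100 percent.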

Microsoft researchers from the Speech & Dialog research group (Image: Allison Linn)

"We've reached human parity," said chief speech scientist Xuedong Huang in a statement, calling the milestone "an historic achievement".

To reach the milestone, the team used Microsoft's Computational Network Toolkit, a homegrown system for deep learning that the research team has made available on GitHub under an open source license. The system uses neural network technology that groups similar words together, which allows the models to generalize efficiently from word to word.
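
The idea of "grouping similar words together" is commonly realized by representing each word as a vector of numbers, so that related words end up close to one another. The sketch below assumes that interpretation; the three-dimensional vectors and the names `toy_embeddings` and `cosine_similarity` are invented for illustration and are not taken from Microsoft's system, where such vectors would be learned from training data.

```python
import math

# Made-up vectors for illustration only; a real model learns these
# from data and uses far more dimensions.
toy_embeddings = {
    "fast": [0.9, 0.1, 0.0],
    "quick": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, embeddings):
    """Return the other word whose vector is closest to `word`'s vector."""
    target = embeddings[word]
    candidates = (w for w in embeddings if w != word)
    return max(candidates, key=lambda w: cosine_similarity(target, embeddings[w]))


if __name__ == "__main__":
    print(most_similar("fast", toy_embeddings))  # prints quick
```

Because "fast" and "quick" sit close together in this toy space, anything a model learns about one can carry over to the other, which is the kind of word-to-word generalization the article describes.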

The neural networks draw on large amounts of data called training sets to teach the transcribing computers to recognize syntactical patterns in the sounds. Microsoft plans to use the technology in Cortana, its personal voice assistant in Windows and Xbox One, as well as in speech-to-text transcription software.

But the technology still has a long way to go before it can claim to master meaning (semantics) and contextual awareness - key characteristics of everyday language use that need to be grasped for Siri-like personal assistants to process requests and act upon them in a helpful way.

"We are moving away from a world where people must understand computers to a world in which computers must understand us," said Harry Shum, who heads the Microsoft AI Research group. However, it will be a long time before computers can understand the real meaning of what's being said, he cautioned. "True artificial intelligence is still on the distant horizon."

fitshaced · Oct 19, 2016

'And we were all like omg, and the machine was like 'I know right?' So then we lolled.'

drumcat · Oct 19, 2016

Apparently they have good microphones...?

Besides, wasn't Dragon Naturally at 1-2% ten years ago?

keysofanxiety · Oct 19, 2016

Properly exciting times.

I remember when I was but a sprog, wide-eyed in wonder, sitting on my Dad's lap as we watched Next Gen. I don't think anybody back then would have imagined technology to be as advanced as it is now.

MH01 · Oct 19, 2016

This is great news.

I'm so impressed by how far voice assistants have come in the last few years.

Everlast66 · Oct 19, 2016

Great, so now Windows 10 is even better at harvesting your personal conversations and relaying them back home to Microsoft.

When should we expect the update to be pushed, with these Win 10 machines not giving the option to opt out of updates?

keysofanxiety · Oct 19, 2016

Everlast66 said:
When should we expect the update to be pushed, with these Win 10 machines not giving the option to opt out of updates?

Have you enabled the Defer feature updates checkbox (Start > Settings > Update & Security > Advanced options)? It won't prevent security updates but does the trick for most things.

Pakaku · Oct 19, 2016

No matter how far speech recognition goes, I'm still going to feel awkward as hell talking to an inanimate object.

keysofanxiety · Oct 19, 2016

Pakaku said:
I'm still going to feel awkward as hell talking to an inanimate object.

You should meet my ex-wife.

2457282 · Oct 19, 2016

drumcat said:
Apparently they have good microphones...?

Besides, wasn't Dragon Naturally at 1-2% ten years ago?

According to Dragon's website, they are 99% accurate. Looks like MS failed again.

miknos · Oct 19, 2016

Cuban Missles said:
According to Dragon's website, they are 99% accurate. Looks like MS failed again.

Actually, it is "up to" 99%. It can be as low as "zero".

fullauto · Oct 19, 2016

fitshaced said:
'And we were all like omg, and the machine was like 'I know right?' So then we lolled.'

Classic!

JeffyTheQuik · Oct 19, 2016

GOOD!

Now, install them at every single drive-through in the USA.

Everlast66 said:
Great, so now Windows 10 is even better at harvesting your personal conversations and relaying them back home to Microsoft.

When should we expect the update to be pushed, with these Win 10 machines not giving the option to opt out of updates?

I'm still trying to figure out why Siri has to have everything sent to Apple for decoding. MS Voice Command could do that 10 years ago with pretty good accuracy on a 256MB Windows Phone.
The methodology was:
Speak...
If it was on the phone, like contacts, it would just get the info for you, or call that person (if commanded to).
If it needed more help, it would go out to the web and get it for you.

Nothing was sent to Microsoft, as it would work even if there wasn't Internet connectivity.

JosephAW · Oct 19, 2016

Yeah but MS will still phone home to prism with your info.

Defthand · Oct 19, 2016

Meanwhile, Apple's voicemail transcriptions are embarrassing. I told them during the beta testing that they should withhold the feature until it can be improved. If people actually depended on it, it would be worse than the infamous Maps fiasco.

TXCherokee · Oct 19, 2016

MacRumors said:
Researchers at Microsoft claim to have created

....

....and you can stop reading here. As both an Apple and MS customer, I never believe a word MS says on future products until it hits the market. And then it is usually 1/2 as good with 1/3 of the features as the promises.

Kaibelf · Oct 19, 2016

This is great! Now Skynet will be more approachable.

Benjamin Frost · Oct 19, 2016

I don't believe a word of this.

Human transcribers must vary widely in ability; secondly, there are many provisos given, which effectively means that it is still much lower accuracy than a human.

2010mini · Oct 19, 2016

keysofanxiety said:
You should meet my ex-wife.

You owe me a new keyboard sir. This comment made me do a spit take all over it.

2457282 · Oct 19, 2016

miknos said:
Actually, it is "up to" 99%. It can be as low as "zero".

The quote I see under Accuracy says --

Accuracy
Control your computer by voice with speed and accuracy
Dragon speech recognition software is better than ever. Talk and your words appear on the screen. Say commands and your computer obeys. Dragon is 3x faster than typing and it's 99% accurate. Master Dragon right out of the box and start experiencing big productivity gains immediately.

CreatorCode · Oct 19, 2016

Cuban Missles said:
The quote I see under Accuracy says --

[...]

DNS is very accurate if you speak clearly and directly, period. Contrary to what its name implies, comma, you cannot just speak naturally, period. You have to dictate specifically to it, period.

New paragraph.

The Microsoft experiment, comma, allegedly, comma, transcribes ordinary recorded speech and dialog without any additional effort on the part of the speaker, period.

Benjamin Frost · Oct 19, 2016

CreatorCode said:
DNS is very accurate if you speak clearly and directly, period. Contrary to what its name implies, comma, you cannot just speak naturally, period. You have to dictate specifically to it, period.

New paragraph.

The Microsoft experiment, comma, allegedly, comma, transcribes ordinary recorded speech and dialog without any additional effort on the part of the speaker, period.

It sounds impressive, but I very much doubt that it would be accurate at transcribing punctuation. Semi-colons, colons? These are tricky things for humans to get right, let alone a computer. Or: ... and dashes/hyphens and /. Also brackets. Exclamation marks? That last sentence would be extremely hard for a computer; if I didn't raise my voice at the end of the sentence to denote a question, how would it know to insert a question mark?

dk001 · Oct 19, 2016

Benjamin Frost said:
I don't believe a word of this.

Human transcribers must vary widely in ability; secondly, there are many provisos given, which effectively means that it is still much lower accuracy than a human.

I work with a regional (USA) team - coast to coast, border to border, including territories. There are very frequent times, due in part to grammar, accent, and colloquialisms, that understanding is at a minimum.
Would love to know from where that percentage arose.

Then add in other languages and dialects.

djlythium · Oct 19, 2016

keysofanxiety said:
Properly exciting times.

I remember when I was but a sprog, wide-eyed in wonder, sitting on my Dad's lap as we watched Next Gen. I don't think anybody back then would have imagined technology to be as advanced as it is now.

One guy might've...

Rest well, Mr. Jobs.

merkinmuffley · Oct 19, 2016

If this report is correct, this is a big step forward in voice recognition. I'm curious how much processing is behind it.
