
coolfactor

macrumors 604
Jul 29, 2002
7,043
9,705
Vancouver, BC
Properly exciting times.

I remember when I was but a sprog, wide-eyed in wonder, sitting on my Dad's lap as we watched Next Gen. I don't think anybody back then would have imagined technology to be as advanced as it is now.

I'm amazed at how forward-thinking the Star Trek series are. It's literally like looking into the future.

I'm watching the Voyager and Enterprise series again now on Netflix. Never get bored of them. :)
Seems like something Apple should have led the way on?

Hard to figure out Apple these days. They had very accurate speech recognition and speech synthesis (comparatively) back in the early 80s when the Mac first came out. Remember the Talking Moose?

 

keysofanxiety

macrumors G3
Nov 23, 2011
9,539
25,302
I'm amazed at how forward-thinking the Star Trek series are. It's literally like looking into the future.

I'm watching the Voyager and Enterprise series again now on Netflix. Never get bored of them. :)

I love Voyager. Got so much better after 8472 and Seven of Nine jumped on board. Some utterly incredible episodes. Year of Hell... nuuurgh, so badass. But moreso exploring humanity, questions of morality; I absolutely love it.

Boy I've gotta nip this comment in the bud. I could go on about Star Trek all day. :oops::oops:
 

Analog Kid

macrumors G3
Mar 4, 2003
8,860
11,382
It's amazing to me how long it has taken us to get computer voice recognition and synthesis to a workable level. We've been working on this for decades, and are just now barely getting passing marks. Image recognition, on the other hand, seems to be moving along much more quickly.

I would have thought that voice would be much simpler to work with: it's a low-rate, band-limited signal. Maybe that's the problem-- not enough redundancy in the data? It also has a time component, which I suspect can be tricky for artificial neural nets.

I suspect that a lot of our ability to understand speech comes from context-- we probably do quite poorly on random words, but in conversation we use context for error correction (both for vocabulary and for accent).
Seems like something Apple should have led the way on?
For years the way has been led by airlines and other call center applications. That's where the money is. I think MS, Google and others picked this up out of a general interest in Deep Learning and even there the voice recognition is mostly icing on the cake-- what they're interested in is understanding the underlying sentence graph.

I really don't think speech is an efficient way of interacting with our machines except in very narrow applications. I don't want to sit in an office of people dictating, I don't want to be on a subway full of people browsing the web by voice. Every now and then I'll be with a group talking and someone feels the need to use Siri to fact check-- which basically shuts down the conversation. I'll only use voice when I'm alone, in the car for example-- and there it's fantastic.

So I think Apple probably left this to others largely because it doesn't have a huge impact on its customers. To the extent they do much with it outside of Siri, it's always been an assistive technology, which Apple does seem to take more seriously than most.
 

pedzsan

macrumors 6502
May 22, 2016
276
111
Leander, TX
I have what I would consider a very slight speech impediment. If you and I were to sit down and have coffee, within a few moments, you could understand 80% of what I said... Probably closer to 99%.

I've yet to have a single word I have uttered properly received by any voice recognition equipment. Not a single word.

So I take all this with a dump truck load of salt.
 

dk001

macrumors demi-god
Oct 3, 2014
10,575
14,912
Sage, Lightning, and Mountains
I have what I would consider a very slight speech impediment. If you and I were to sit down and have coffee, within a few moments, you could understand 80% of what I said... Probably closer to 99%.

I've yet to have a single word I have uttered properly received by any voice recognition equipment. Not a single word.

So I take all this with a dump truck load of salt.

[Image: Tonka Classic Steel Mighty Dump Truck]

or

[Image]
 

kingpushup

macrumors regular
Jun 24, 2013
222
234
Apparently they have good microphones...?

Besides, wasn't Dragon Naturally at 1-2% ten years ago?

Measured approach to measuring ;) This field is notorious for limiting what they are measuring.

Note how the article says they fail on 'meaning.' This has remained consistent.
 

mdriftmeyer

macrumors 68040
Feb 2, 2004
3,809
1,985
Pacific Northwest
The quote I see under Accuracy says --


The problem with productivity claims is that they are weighed against someone who is inept with a keyboard. Having typed a consistent 65+ wpm for over 25 years and known UNIX operating systems for nearly 30, I have never found these speech technology products to do anything but hinder my own workflow.
 

44267547

Cancelled
Jul 12, 2016
37,642
42,491
Apparently they have good microphones...?

Besides, wasn't Dragon Naturally at 1-2% ten years ago?

I'm actually shocked Dragon NaturallySpeaking is still heavily used. It was really the only dictation software for years. I know doctors and transcriptionists still use it heavily today for patient reviews. Dragon's software is clunky and its dictation is average. It suffers from minor grammatical errors, causing the user to spend more time editing. It does transcribe at a decent speed, though.
 

Analog Kid

macrumors G3
Mar 4, 2003
8,860
11,382
I'm still trying to figure out why Siri has to have everything sent to Apple for decoding.
I'm with you on this. I was hoping Siri for Mac would at least be able to eliminate the server requirement.

I think the reason for the server is because Siri is more of an expert system. It doesn't reason, it just responds to queries that follow a pattern that looks a bit like MadLibs. I think Apple is looking for recurring patterns that they don't have answers to so that they know what kinds of queries to expand to.

Basically I think it's more market research than technical requirement at this point.
 

doelcm82

macrumors 68040
Feb 11, 2012
3,748
2,768
Florida, USA
It's amazing to me how long it has taken us to get computer voice recognition and synthesis to a workable level. We've been working on this for decades, and are just now barely getting passing marks. Image recognition, on the other hand, seems to be moving along much more quickly.
It took human beings hundreds of thousands of years to come up with written language, let alone to be able to transcribe speech into writing. Computers are outpacing us by a remarkable amount.
 

foobarbaz

macrumors 6502a
Nov 29, 2007
873
1,953
Apparently they have good microphones...?

Besides, wasn't Dragon Naturally at 1-2% ten years ago?

That's dictation, though. You automatically speak differently, i.e. more clearly, when dictating. Professional humans probably make 0 mistakes in that environment.

Microsoft's claim is about conversational speech, where even humans make many mistakes.
 

kwizatz

macrumors newbie
May 17, 2013
9
0
Apparently they have good microphones...?

Besides, wasn't Dragon Naturally at 1-2% ten years ago?

I haven't used Dragon in years so this may not be true anymore, but when I did use it the first thing you had to do was "train" it on your voice by reading a bunch of stuff. So it may have been very accurate for me, but not very good at all for you. Sounds like this MS breakthrough is for general speech transcription.
 

Boosf

macrumors member
Jun 10, 2011
87
122
Seattle, WA



Researchers at Microsoft claim to have created a new speech recognition technology that transcribes conversational speech as well as a human does (via The Verge).

The system's word error rate is reportedly 5.9 percent, which is about equal to professional transcribers asked to work on the same recordings, according to Microsoft.
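For anyone wondering how a figure like 5.9 percent is calculated: word error rate is conventionally the number of word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. Here is a minimal Python sketch, assuming simple whitespace tokenization. The function name `word_error_rate` is made up for illustration, and this is not Microsoft's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: assumes whitespace tokenization and exact,
    case-sensitive word matching.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Read that way, a 5.9 percent rate means roughly six word errors for every hundred words in the reference transcript.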


Microsoft researchers from the Speech & Dialog research group (Image: Allison Linn)

"We've reached human parity," said chief speech scientist Xuedong Huang in a statement, calling the milestone "an historic achievement".

To reach the milestone, the team used Microsoft's Computational Network Toolkit, a homegrown system for deep learning that the research team has made available on GitHub via an open source license. The system uses neural network technology that groups similar words together, which allows the models to generalize efficiently from word to word.

The neural networks draw on large amounts of data called training sets to teach the transcribing computers to recognize syntactical patterns in the sounds. Microsoft plans to use the technology in Cortana, its personal voice assistant in Windows and Xbox One, as well as in speech-to-text transcription software.

But the technology still has a long way to go before it can claim to master meaning (semantics) and contextual awareness - key characteristics of everyday language use that need to be grasped for Siri-like personal assistants to process requests and act upon them in a helpful way.

"We are moving away from a world where people must understand computers to a world in which computers must understand us," said Harry Shum, who heads the Microsoft AI Research group. However it will be a long time before computers can understand the real meaning of what's being said, he cautioned. "True artificial intelligence is still on the distant horizon."

Article Link: Microsoft Hails 'Historic Achievement' in Speech Recognition Technology
That's fantastic! Now if they can do the same for Microsoft support staff that are based in... other countries... wink, wink...
 

Hasukazu

macrumors newbie
Sep 28, 2016
11
4
Germany
Great, so now Windows 10 is even better at harvesting your personal conversations and relaying them back home to Microsoft.

When should we expect the update to be pushed, with these Win 10 machines not giving the option to opt out of updates?

Yes, that's the main reason why they're doing this.
That's what you get for still using Windows :p
Windows users really are hardcore masochists. First Windows 8 with its insanity-inducing Metro tiles interface, then Windows 10, which is hardly better in that regard, and where the OS itself is a huge piece of spyware in many regards...

....

....and you can stop reading here. As both an Apple and MS customer, I never believe a word MS says on future products until it hits the market. And then it is usually 1/2 as good with 1/3 of the features as the promises.

True, MS marketing fairy tale propaganda is a great deal worse still than what Apple does.

I really don't think speech is an efficient way of interacting with our machines except in very narrow applications. I don't want to sit in an office of people dictating, I don't want to be on a subway full of people browsing the web by voice.

Indeed. Speech assistants are only for people who are too lazy (or too dumb) to use other ways of interacting which are more appropriate and fitting. I've never used Siri even a single time, always have it deactivated, and seriously doubt I ever will. All it could be good for is for transcribing speech, and you really don't need an assistant for that.

I also fully agree that people using speech assistants in public places, public transportation, offices etc. are just a major nuisance to others. I would feel really annoyed by that.
 

Analog Kid

macrumors G3
Mar 4, 2003
8,860
11,382
It took human beings hundreds of thousands of years to come up with written language, let alone to be able to transcribe speech into writing. Computers are outpacing us by a remarkable amount.
I don't think the computers are doing this by themselves...

My point though was about the relative rate of development between speech recognition and other technologies. Image recognition is an obvious comparison point, but look at control systems for drones, autonomous vehicles and deep space probes. Some kid can build a Lego machine in his garage to solve the Rubik's cube in seconds. We can find ourselves anywhere on the planet to within a couple feet using satellites.

But that most basic form of communication is still hard to crack.
 

doelcm82

macrumors 68040
Feb 11, 2012
3,748
2,768
Florida, USA
I don't think the computers are doing this by themselves...

My point though was about the relative rate of development between speech recognition and other technologies. Image recognition is an obvious comparison point, but look at control systems for drones, autonomous vehicles and deep space probes. Some kid can build a Lego machine in his garage to solve the Rubik's cube in seconds. We can find ourselves anywhere on the planet to within a couple feet using satellites.

But that most basic form of communication is still hard to crack.
Image recognition is pretty amazing. To see a computer correctly identify a dog in a photo is really cool. But look closer and you'll see that the computer recognizes other objects as dogs that clearly aren't dogs. I see photos of cats, horses, pigs, etc., where the computer calls it a dog. Like speech recognition, image recognition is still a work in progress.
 

Analog Kid

macrumors G3
Mar 4, 2003
8,860
11,382
Image recognition is pretty amazing. To see a computer correctly identify a dog in a photo is really cool. But look closer and you'll see that the computer recognizes other objects as dogs that clearly aren't dogs. I see photos of cats, horses, pigs, etc., where the computer calls it a dog. Like speech recognition, image recognition is still a work in progress.
Deep learning algorithms can now not only recognize dogs and flowers but also identify which breed of dog and species of flower, outperforming humans at the task:
https://www.microsoft.com/en-us/res...-algorithm-sets-imagenet-challenge-milestone/
 