Voice Control isn't 'The Future'

robeddie · Feb 25, 2016

Inspired by the discussion about the news of Siri coming to the Mac, I decided to opine about what I think are the obvious problems with voice control. Many people talk about it being the 'future' of device input, a world without keyboards, etc. I just don't get it. Here is my short list of fatal flaws in the theory that voice control will ever be the primary input method of choice.

1) You're in a church, restaurant, or other relatively quiet place where speaking out loud to your phone would be unacceptable, or rude.
2) You're in a loud environment (bar, concert hall, etc.) where your speech input wouldn't be recognized.
3) You're at home, talking to your phone, but all of a sudden your 12-year-old comes in and says something and screws up the voice input.
4) PRIVACY- You're around other people (other friends, coffee shop, etc) and actually want to communicate something, or look something up without everyone around you knowing your business.
5) The still limited ability of voice recognition to decipher more complicated instructions (which may be a long way off - if ever).

There's probably a bunch more, but just these 5 make voice-input a very limited use-case. It's fine for simple commands when you're in your car, the privacy of your home, that kind of thing.
But for people to suggest that voice control is going to mostly replace keyboard input? To me, that's just absurd, if only based on reason #4 above! Throughout the day I will text various things to people using my phone or computer keyboard. I work in a cubicle. Doing the same things with voice recognition? Forget it!!!!

Thoughts?

dennysanders · Feb 25, 2016

who is saying that voice control is the future?

robeddie · Feb 25, 2016

dennysanders said:
who is saying that voice control is the future?

No big public figure, but plenty of posters in that other thread about Siri coming to the Mac seem to have that impression.

Jessica Lares · Feb 25, 2016

To be fair, a lot of those situations are also ones where you shouldn't be using your phone either. And you're suggesting that the keyboard will totally go away when it won't. Not even the CEO of any company would like to be writing emails with their voice.

I use Siri a lot for stupid little things, and I like the idea of being able to do that without having to yell Hey Siri or carry my phone around to do the same thing. Stuff like finding the number for a local business, the duration of a movie currently in theaters, and the weather for the next 10 hours. I could do that in Spotlight and Safari too, but if I have the option of using Siri, I'd rather use it than the alternatives, because for ME it works faster: I don't have to think about typing it out.

Tech198 · Feb 27, 2016

Using voice is, and forever will be, an issue...

I can never see Siri replacing any button like the Home button on iOS, even if they make Touch ID part of the screen instead (pressure to unlock), etc., or in other cases. 3D Touch is one example, but it could go much further.

Voice recognition will never be 100% accurate: different dialects, speech, pronunciation, etc. And if you look at "Dragon NaturallySpeaking", you need to *teach* it...

Noticeable issues.

Benjamin Frost · Mar 14, 2016

robeddie said:
Inspired by the discussion about the news of Siri coming to the Mac, I decided to opine about what I think are the obvious problems with voice control. Many people talk about it being the 'future' of device input, a world without keyboards, etc. I just don't get it. Here is my short list of fatal flaws in the theory that voice control will ever be the primary input method of choice.

1) You're in a church, restaurant, or other relatively quiet place where speaking out loud to your phone would be unacceptable, or rude.
2) You're in a loud environment (bar, concert hall, etc.) where your speech input wouldn't be recognized.
3) You're at home, talking to your phone, but all of a sudden your 12-year-old comes in and says something and screws up the voice input.
4) PRIVACY- You're around other people (other friends, coffee shop, etc) and actually want to communicate something, or look something up without everyone around you knowing your business.
5) The still limited ability of voice recognition to decipher more complicated instructions (which may be a long way off - if ever).

There's probably a bunch more, but just these 5 make voice-input a very limited use-case. It's fine for simple commands when you're in your car, the privacy of your home, that kind of thing.
But for people to suggest that voice control is going to mostly replace keyboard input? To me, that's just absurd, if only based on reason #4 above! Throughout the day I will text various things to people using my phone or computer keyboard. I work in a cubicle. Doing the same things with voice recognition? Forget it!!!!

Thoughts?

I quite agree.

Siri and voice control are of no interest to 99% of the population. I think Apple keeps making a big thing of it simply because it feels it must, not because it's a good idea. Voice is cool, so it slavishly follows the fashion. Too bad.

My opinion is the same as yours: for disabled people, it has some benefit. For everyone else, it is a complete waste of time.

ApfelKuchen · Mar 14, 2016

Quite simply, the exceptions do not make the rule. Will there be circumstances where voice input is not appropriate? Most certainly. You also can't do everything with a keyboard, or everything with a mouse or trackpad.

However, when we start considering the connected home, self-driving cars, and similar devices, voice input will be the rule, rather than the exception - most of us are in the relative privacy of our homes, vehicles, and offices when we do this. The more we use it, the more effective we will become using it, and the more ways we'll encounter it in our everyday lives.

As others have noted, if we're someplace it'd be rude to speak on the phone, then it'd be rude to dictate to a computer. If what you have to say is private, whether you're dictating to a computer or speaking over a phone... same difference. But under those circumstances, you may not want someone looking over your shoulder while you type, either. That's etiquette, not technology.

"Good morning, Dave"
"Good morning, HAL."
"Where would you like to go, Dave?"
"Take me to the grocery store"
"Is that the Kroger store you visited Wednesday, March 9?"
"Yes, HAL"

"What floor please?"
"22" "19" "15"
"This is the 18th floor. Next stop, 19th floor - Acme Motors Accounts Payable and Personnel, then 22nd floor, Law offices of Dewey, Cheetham, and Howe. We will stop at 15, the Acme Motors employee cafeteria, on the way down. Any other requests?"
"Stop at all floors"
"I'm sorry, young man, only one stop per passenger."

Speaking as a former audio engineer... many of the "impossible" conditions described come down to having the right microphone and having it in close proximity. If you can wear a Bluetooth headset for conversations on your iPhone, you have a voice control input. If your car can handle intelligible hands-free conversations, it can do voice control. If your office environment doesn't allow you to speak loudly enough to be picked up by an iMac-mounted mic, wear a headset and speak quietly. If you can speak on the phone at your desk, you can speak to your computer. If voice input is shown to aid productivity, smart employers will provide headsets or improve the acoustic conditions in the office to facilitate that improvement (speech input vs. 10 wpm hunt-and-peck...? No contest!)

I have a friend who constantly texts from her iPhone, and never keyboards - it's all voice input. Sure, there are occasional "typos" (voice-os?), but on the whole, quite accurate. If I'm out for my daily walk, I respond to texts with dictation, either via the mic on my Watch, or from my iPhone. At my desk, it's keyboard all the way - but I've been touch-typing for about 45 years.

I've been teaching my 81-year-old aunt to use her iPhone. She never learned to type, so why not learn to use voice input from the start? No issue "training" either her iPhone or mine - for the most part, if we break into conversation while dictating text, whichever iPhone we're using accurately transcribes what both of us have said.

People are ready for the simplicity of speech input. It may have just been movie dialog, but generations know that someday, we'll talk to our robots and computers. "Someday" is today, for those willing to try. As we build confidence

It's happening now, and we're still on the early edge of the adoption curve. I suspect within another five years nobody will be saying, "It'll never work."

Tech198 · Mar 14, 2016

All points are good, however it's only the future *if* you can turn it off as well...

Not selectively, but in any device... but at this time you can only turn voice off where it exists.

For example, in the Amazon Echo, the only way to turn off the mic is to unplug it.

I can see Siri being the future if it understands you 100% of the time in all cases, which means you would need noise cancellation in every device to do that.

smacrumon · Mar 24, 2016

Supplementary system, that's all I gotta say, Siri.

Sill · Mar 24, 2016

ApfelKuchen said:
It's happening now, and we're still on the early edge of the adoption curve. I suspect within another five years nobody will be saying, "It'll never work."

I don't think people are really emphatic about it never working. In my opinion it's more of a question, "Why do it at all?" Your examples were pretty interesting, and if any of them were to sway me it would be the one about responding to texts via dictation, but otherwise it was still "why?"

The elevator one gave me a laugh though. Why would I respond to a question about which floor to go to when there were perfectly good buttons in front of me to push to indicate my floor preference? (Never mind that it would really stink to have to yell "close! CLOSE DAMMIT!" instead of pushing a close button when you saw someone coming that you didn't want to ride with 😀)

I've read numerous discussions online about how this is all technology for the sake of having technology there. Why develop all this voice recognition stuff to do the same thing that a very simple button could do?

At the very foundation of the discussion is privacy. If the device is listening all the time, who else is listening? That's my major objection to Amazon Echo. And when you start researching the people who are behind Jeff Bezos the purpose of that thing really gets pretty sketchy.

ApfelKuchen · Mar 25, 2016

Sill said:
I don't think people are really emphatic about it never working. In my opinion it's more of a question, "Why do it at all?" Your examples were pretty interesting, and if any of them were to sway me it would be the one about responding to texts via dictation, but otherwise it was still "why?"

The elevator one gave me a laugh though. Why would I respond to a question about which floor to go to when there were perfectly good buttons in front of me to push to indicate my floor preference? (Never mind that it would really stink to have to yell "close! CLOSE DAMMIT!" instead of pushing a close button when you saw someone coming that you didn't want to ride with 😀)

I've read numerous discussions online about how this is all technology for the sake of having technology there. Why develop all this voice recognition stuff to do the same thing that a very simple button could do?

At the very foundation of the discussion is privacy. If the device is listening all the time, who else is listening? That's my major objection to Amazon Echo. And when you start researching the people who are behind Jeff Bezos the purpose of that thing really gets pretty sketchy.

To, "Why do it at all," I'd ask, "Why stop here?"

What is it about voice interface (other than its inaccuracy) that stirs such passion? Is it that it's a faux-human? Was Asimov right, all those years ago - that humanity would revolt against the existence of humanoid robots, no matter how benevolent? Xenophobia to the 'n'?

I don't think it's technology for the sake of technology. It's technology intended to make technology invisible; to eliminate the need to master physical controls and return interaction to human terms - a spoken request, a spoken acknowledgement. Do we communicate with the machine on its terms, or does the machine communicate with us on our terms? Sure, it's an illusion, but do we have to have "It's a machine" rubbed in our noses? Why can't we imagine we're bosses, instead of machine operators?

The chauffeur, the personal secretary, the elevator operator... utter luxuries that most of us could never afford, positions eliminated by technology, or both. Yet technology hasn't set us free. We are our own chauffeurs, secretaries, and elevator operators. I am the entire household staff of Downton Abbey, in servitude to myself!

"Pushbuttons? We don't need no stinking pushbuttons!" What's so great, in an empty elevator, about juggling arms full of grocery bags while we try to reach the "14" button? In a crowded elevator, isn't it rude to reach across someone's chest to punch a button? Voice control gracefully addresses both issues.

You assume elevators must have pushbuttons, therefore voice control is redundant. All I can say is, stuff changes. Physical switch panels cost money to produce and maintain. They're often vandalized. In modern elevators, they connect to the same computerized control systems a voice control system would. Meantime, mics, video cameras, and loudspeakers are easier to hide/protect from vandals... in short, there are economic benefits to ditching switch panels. Building owners and manufacturers will lobby to amend whatever regulations may stand in the way of "progress."

What's so special about physical pushbuttons? Take it from someone who had to maintain and repair pushbutton-based control systems... they'd better not be the apex of technology. And if your eyes have to scan parallel rows of buttons to find the desired floor, and your mind has to then guide your arm to the desired button? You've already worked harder than necessary. Think "14." Speak "14." Until we have telepathy, it won't get any simpler.

Certainly, 'Not everything that can be done, should be done.' John Woram, longtime tech journalist, called that Foobini's Law. I heard it from him at an Audio Engineering Society convention in around 1974. I've found Foobini nearly as useful as Murphy's Law when applied to design and engineering. Voice control, however, passes the test, as far as I'm concerned.

As to privacy? Certainly, a world full of video and audio sensors can become the ultimate Orwellian nightmare. I have no easy answers. Technology can be abused, if we allow it to be abused. Our rights can be trampled, if we allow them to be trampled. The right answer to, "Who is listening," has to be, "Nobody." The servant is there to do our bidding, the servant must be sworn to secrecy.

The principles are easy, but we need a hacker society to make sure the principles are upheld. Asimov's Three Laws of Robotics have a beautiful elegance and eloquence. His robots' "Positronic" brains would not function in the absence of those laws, and could not function in violation of those laws. https://en.wikipedia.org/wiki/Three_Laws_of_Robotics. If only such a thing was possible.

However, whether Siri-equipped Apple TVs or Amazon Echo... While these will make audio/visual sensor networks more pervasive, those same, dystopian scenarios can be applied to the phones in our pockets, our home security systems, the mics and cameras in our computers... We trust our home networks' firewall to resist outside incursion. If we're worried that the bad guys will be listening in, then we have to be worried that they're accessing our HDDs, too. The best defense we have, for now, may be communications and server bandwidth - streaming the output of dozens of mics and cameras, from every home on the connected Earth, is simply out of the question. Most processing is done locally - calls across the network are relatively few and far between.

In the end, it is a matter of trust. It's conceivable that Apple's position on privacy and encryption has greater commercial implications for Apple's future than "ecosystem."

Sill · Mar 26, 2016

ApfelKuchen said:
To, "Why do it at all," I'd ask, "Why stop here?"

What is it about voice interface (other than its inaccuracy) that stirs such passion? Is it that it's a faux-human? Was Asimov right, all those years ago - that humanity would revolt against the existence of humanoid robots, no matter how benevolent? Xenophobia to the 'n'?

I'd say you're reading far too much into our objections here. If we truly had self-contained, beneficent AI that worked solely for the owner of said technology, I would be in favor of it. Unfortunately, everything is now directed at "the cloud". Everything said goes off site, and all processing happens off site. Everything said is indefinitely saved. And when you move to the cloud, someone or something is always watching. Generally it's algorithms, but depending on how those algorithms grade your input, you may end up with the attention of one or more humans. Still, this wouldn't bother me so bad if it was simply companies doing this for their own demographic information. The great danger here is government, namely ours. If we trust any company with our information, the government lays claim to that information in the name of... its perceived need to access that information. For that, I can't stomach any voice activated AI because by its very nature it must always be on and listening, therefore the capability exists for anyone - especially the government - to be listening.

ApfelKuchen said:
I don't think it's technology for the sake of technology. It's technology intended to make technology invisible; to eliminate the need to master physical controls and return interaction to human terms - a spoken request, a spoken acknowledgement. Do we communicate with the machine on its terms, or does the machine communicate with us on our terms? Sure, it's an illusion, but do we have to have "It's a machine" rubbed in our noses? Why can't we imagine we're bosses, instead of machine operators?

In order for that technology to be truly invisible the level of monitoring would have to be truly ubiquitous. There would have to be countless cameras and microphones used. In order for that monitoring to work, many levels of biometric authentication would need to happen. Think the Spielberg version of "Minority Report", with the constant retinal scans going on in every place. And once that authentication starts, the collar gets tighter.

ApfelKuchen said:
The chauffeur, the personal secretary, the elevator operator... utter luxuries that most of us could never afford, positions eliminated by technology, or both. Yet technology hasn't set us free. We are our own chauffeurs, secretaries, and elevator operators. I am the entire household staff of Downton Abbey, in servitude to myself!

There's a great irony there; technology develops to spare people the banality of our former agrarian existence, and afford the population the benefits of the division of labor and the economy of scale. Yet people end up locking themselves into a work week that ends up being just as banal - if not more - just so they can make the money to pay for the technology that spares them the agrarian existence. A double portion of irony then that people have to pay to join gyms to work out, since our lives are far more sedentary than just 50 years ago, and also pay to seek counseling or self help programs to help them deal with how little control they have over their lives.

ApfelKuchen said:
"Pushbuttons? We don't need no stinking pushbuttons!" What's so great, in an empty elevator, about juggling arms full of grocery bags while we try to reach the "14" button? In a crowded elevator, isn't it rude to reach across someone's chest to punch a button? Voice control gracefully addresses both issues.

I did it every week for the twenty years I owned a condominium. If the elevator is empty, an elbow could catch a button. If the elevator was occupied then it would be more rude for someone to not ask which floor you wanted and press the button for you. Forget the voice control - I think a better use of technology would be to find a way to help pass the time in the elevator other than force you to occupy a tiny space with people you may not like.

ApfelKuchen said:
You assume elevators must have pushbuttons, therefore voice control is redundant. All I can say is, stuff changes. Physical switch panels cost money to produce and maintain. They're often vandalized. In modern elevators, they connect to the same computerized control systems a voice control system would. Meantime, mics, video cameras, and loudspeakers are easier to hide/protect from vandals... in short, there are economic benefits to ditching switch panels. Building owners and manufacturers will lobby to amend whatever regulations may stand in the way of "progress."

Since we are talking about condo/apartment life, wouldn't it be the pinnacle of progress for each resident to have their own fob that gives them access to the elevator control? No voice control necessary, just a tap on their watch or phone widget, whatever? That would render unwanted persons unable to use the elevator, or even access it.

ApfelKuchen said:
What's so special about physical pushbuttons? Take it from someone who had to maintain and repair pushbutton-based control systems... they'd better not be the apex of technology. And if your eyes have to scan parallel rows of buttons to find the desired floor, and your mind has to then guide your arm to the desired button? You've already worked harder than necessary. Think "14." Speak "14." Until we have telepathy, it won't get any simpler.

If a person has a problem scanning those buttons and picking out the one they want, I'd say they have bigger problems. Fifty floors listed in two or three parallel columns is not a big difficulty for a typical person to scan and select a single number, even in a non-familiar setting. As a matter of fact, in that non-familiar setting a couple columns of digits can actually be a small comfort for that person as such an arrangement is something that any person can identify and use nearly immediately.
As for the person who is familiar with that particular location and elevator, the reason the buttons are special is simple: muscle memory. Neuromuscular efficiency. The person can usually select their floor and push the button while their attention is diverted, like maybe they're on a phone call or talking to someone on the way into the elevator. What is to prevent that same conversation from interfering with a voice-driven elevator control?

ApfelKuchen said:
As to privacy? Certainly, a world full of video and audio sensors can become the ultimate Orwellian nightmare. I have no easy answers. Technology can be abused, if we allow it to be abused. Our rights can be trampled, if we allow them to be trampled. The right answer to, "Who is listening," has to be, "Nobody." The servant is there to do our bidding, the servant must be sworn to secrecy.

Which goes back to what I said above regarding the need for all of this recognition technology to be local to the user and controlled singly by that person (and their assignees). But how to be sure?

ApfelKuchen said:
The principles are easy, but we need a hacker society to make sure the principles are upheld. Asimov's Three Laws of Robotics have a beautiful elegance and eloquence. His robots' "Positronic" brains would not function in the absence of those laws, and could not function in violation of those laws. https://en.wikipedia.org/wiki/Three_Laws_of_Robotics. If only such a thing was possible.

It's not possible.
If conscious or conscious-seeming animatronic robots were to become common some government agency would claim jurisdiction; in fact I think they would create an entirely new one just to "deal with the problem". Any technology built would be capable of having a back door or some kind of exploit built into it by whoever manufactured it, or by whoever licensed it (see FCC), whoever regulated shipment and transfer of the products (ICC, FTC, interference by the NSA). Even if no such exploit existed, the information would be claimed by the government in the name of national security. In the extremely unlikely event that my requirements listed in my first paragraph were taken seriously - that this was locally stored and locally processed info, what is to prevent a new class of social-engineer hacker from rising up that would make it their mission to attack those robots using psychology? How do we know that your personal AI wouldn't be able to be deceived into giving up your data? Far-fetched, I'll be the first to admit, but it's just a shade more far-fetched than the conscious-seeming animatronics that I think aren't too far in the future now.

You seem to be a very well-read and experienced person. If I may make a recommendation that I think directly addresses what we're talking about and offers some very interesting solutions....
For several years now I have recommended to people that they read two books from Daniel Suarez: Daemon, and Freedom. While they are works of fiction, they are also a gripping tale of how a society could develop in parallel to what we already have, one that would offer liberty and incredible resources to those who participate. There is an AI framework shown in that two-book story that may be a solution.
