TV Industry Preparing for Voice Recognition Interfaces in 2012

subsonix · Dec 9, 2011

A little demo of a voice controlled TV.

http://video.aol.com/video/a-new-invention-on-and03930-rockand039-020311-tv-replay/3666431115

spazzcat · Dec 9, 2011

srxtr said:
Why do people keep thinking this?

Obviously they're gonna add a trigger where the TV volume will go mute when Siri is on.

I really don't want a hardware trigger, but I see no easy way around having one.

With my Xbox, I just say "Xbox" and it goes into listen mode. But if you were watching live TV, this would be an issue if an Xbox ad was showing...

----------

Piggie said:
This is going to be amazing.

Now I have to find the remote, press a button to change the channel or make the volume louder.

In the future, I might be able to find the remote, press a button to call up Siri and then shout, "Siri, change to channel 1" or "Siri increase volume"

Wow, the future is going to be so awesome.

As long as it has a good mic, no shouting should be needed...

foodog · Dec 9, 2011

Renzatic said:
Ha! I guess the old saying "what's old is new again" is more than just a pithy cliche. Didn't some random company release a voice controlled television set back in the late 90's? It didn't catch on then, obviously. I wonder what makes people think it'll catch on now?

Apple released the Newton nearly two decades ago and it didn't catch on... but that is what the iPhone and iPad grew out of. Sometimes ideas are ahead of their time.

Chrisg2014 · Dec 9, 2011

I don't know

I don't even know if I would buy an Apple TV... I mean, I don't want Apple to own my home. I mainly only buy Apple products; I don't go for Windows, except for the Xbox 360. It would be nice if I could play my music and videos on my TV without having to worry about compatibility, but then there is the "Apple TV," and I like the remote. I don't want to have to talk. Yeah, I might lose the remote and have to go and find it, but I might also lose my voice, and then I am really screwed. Back to my main topic: I don't want Apple to own my home. They're not very open source; they don't like foreign things coming into the picture, they only want you to buy from them. Like with the iPhone, you can only use iOS, and some people might want to use Android (I don't know why), but you don't get a choice with a $300 phone, or $800 if you buy off contract.

E.Lizardo · Dec 9, 2011

Exotic-Car Man said:
The only problem with voice recognition on TVs is that it could interfere when commanding volume. If the TV is too loud, then the mic may pick that up. "Turn volume down to ten." "Did you say, 'buy a Snuggie?'"

Easily overcome with technology similar to noise-cancelling headphones.

HobeSoundDarryl · Dec 9, 2011

MisterMe said:
Oh, brother. Not only are televisions not alone as the sole electronic device in the room, they are not alone with a single viewer. Dad does not want Kevin to yell "SpongeBob!" while he and his buddies are watching the wide receiver head toward the end zone. If Apple returns to the TV set business, then we can expect Siri to play a role. However, it will not be the end all and be all that so many fans seem to think it will be.

First, I'm in the camp that thinks a Siri (microphone) implementation is going to be in a very simple remote rather than in the TV or next Apple TV. Thus, "Kevin" won't be able to flip it to SpongeBob any more than he can in today's no-Siri world of TV remotes. The gatekeeper is the person holding the remote.

Second, whether I'm right or wrong about "first," there are a ton of ways to deal with THAT problem. For example, after tuning to the game, say "Siri Lock until 4pm" to put Siri to sleep for any actions other than "Siri unlock" (and maybe "Siri mute"), and then require some form of override password to return Siri to normal functionality.
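The lock idea above is essentially a small state machine. Here is a minimal Python sketch of it, assuming a hypothetical `VoiceLock` class, made-up command strings, and a simple numeric override code. It only illustrates the behavior described in the post; it is not how Siri or any TV actually works.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical command names; nothing here reflects a real Siri API.
ALLOWED_WHILE_LOCKED = {"mute"}


@dataclass
class VoiceLock:
    override_code: str
    locked_until: Optional[datetime] = None

    def lock(self, until: datetime) -> None:
        """'Siri Lock until 4pm': ignore most commands until `until`."""
        self.locked_until = until

    def is_locked(self, now: datetime) -> bool:
        return self.locked_until is not None and now < self.locked_until

    def accept(self, command: str, now: datetime, code: Optional[str] = None) -> bool:
        """Return True if the voice command should be acted on."""
        if not self.is_locked(now):
            return True
        if command == "unlock":
            # Unlocking early requires the override code.
            if code == self.override_code:
                self.locked_until = None
                return True
            return False
        return command in ALLOWED_WHILE_LOCKED


lock = VoiceLock(override_code="1234")
lock.lock(datetime(2011, 12, 9, 16, 0))
kickoff = datetime(2011, 12, 9, 13, 0)
print(lock.accept("SpongeBob", kickoff))            # False: Kevin is ignored
print(lock.accept("mute", kickoff))                 # True
print(lock.accept("unlock", kickoff, code="0000"))  # False: wrong code
```

Once the lock time passes, or the right code is given, every command is accepted again.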

I do agree that Siri is not THE feature. If it were, it would appear that Apple is last to that particular party. Personally, in spite of a rumor about an Apple Television about every 18 seconds, I'm still skeptical about Apple getting into that particular business. Margins are too thin. Apple choosing a "perfect" screen size for all won't work as well as it has for other Apple products (but aren't we looking forward to the evangelists offering elaborate justifications for why sizes other than the one Apple chooses for us make no sense for anyone... "99% of people don't need sizes other than THE size our Lord Apple has chosen for us"; maybe someone will even make an oft-referenced chart implying that other sizes make no sense for what our eyes can see, etc.). Plasma vs. LCD vs. LED vs. something else? Ports? Locked down like an Apple TV, or much more open (to all other video tech hookups) like all HDTVs available now? Etc.

And, most importantly, unlike the "revolutions" in computers, music players, phones, and tablets, if an approx. $100 Apple TV is also available to bring the software advantages of an Apple television to anyone else's TV, I just don't see the hardware being enough to woo more than the most dedicated fans. More simply, Apple's successes are all built upon a locked-down merger of hardware & software. In this particular case, Apple endorses separating the software from the hardware in the form of an Apple TV box. If OS X could run on any computer (in an Apple-endorsed way), would we all be as quick to buy Apple computers? If iOS could run on any smartphone, would we all be as quick to buy only Apple phones?

If an Apple TV brings the same software and unique hardware advantages to any television brand, it means Apple is endorsing a way for us to choose any screen size and any type of television (plasma vs. LCD vs. LED, 3D or not 3D, 1080p or 720p, etc.), including the HDTV we already have, and still get the exact same (software) experience. Odds are high that Apple won't make its own television panel, so the exact same picture would probably be available in an LG- or Samsung-branded television for about 30%-50% less. Spend the $100 or so for an Apple TV to get the Apple software goods and save 20%-40+% for the EXACT same television screen. Or get the size & type of TV you really want and hook up the little box for the EXACT same Apple software experience.

If Siri is the big "I cracked it" benefit, that is mostly software. It will work in a next-gen Apple TV (it would work in the current-gen Apple TV with a software update and a new microphone remote). As many people have pointed out in many other Apple television threads, any big breakthrough feature that is software-based can be done in an approx. $100 set-top box. Apple doesn't have to build a whole television to roll out breakthrough software benefits.

I'm in the camp that a true "I cracked it" breakthrough involves some kind of massive change in how the content is packaged for us consumers. Maybe it's some form of bypass-the-middleman (cable, satellite) players, though I caution that those very same middlemen usually own the pipes through which an Apple replacement solution would have to flow (so that ain't happening).

It's not the oft-stated "à la carte" as it's imagined, meaning "I pay $100 for 200 channels but only watch 10, so I want to subscribe to my 10 for 50 cents each." The business model that pays all the people who create the programs on our favorite 10 or so channels doesn't work if the average revenue per subscriber drops from $100 to around $5. Instead, the à la carte fantasy would probably result in individual channels, the more popular ones especially, costing $5 to $10+ each to maintain the average revenue per subscriber (which, in part, would keep the creative infrastructure, the many people who actually create the shows, in place). Cut that revenue by upwards of 95% en masse and you'd better be ready for the replacement model to be what we get on YouTube and similar sites, where the no-revenue/low-revenue model for creating shows actually works well enough to motivate show creation.
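The arithmetic behind this argument can be checked in a few lines of Python. The sketch uses only the figures from the paragraph ($100 bundle, 10 channels at 50 cents) and, like the argument itself, treats subscriber fees as the only revenue (advertising is ignored). The function names are made up for illustration.

```python
def monthly_revenue(prices):
    """Average revenue per subscriber: the sum of what one household pays."""
    return sum(prices)


def price_per_channel_to_hold(target_revenue, channels):
    """Per-channel price needed to keep revenue per subscriber unchanged."""
    return target_revenue / channels


bundle = 100.0                           # "$100 for 200 channels"
fantasy = monthly_revenue([0.50] * 10)   # "my 10 for 50 cents each"
drop = 1 - fantasy / bundle

print(fantasy)                                # 5.0
print(f"{drop:.0%}")                          # 95%
print(price_per_channel_to_hold(bundle, 10))  # 10.0
```

A subscriber keeping only 10 channels would have to pay about $10 per channel just to leave revenue per subscriber where it was, which is the top of the $5 to $10+ range in the post.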

The "I cracked it" breakthrough probably requires some kind of new model of distributing content that links Apple iCloud directly to iUsers without data flowing through Comcast/Time Warner/Cablevision/et al. pipelines (and this will need to solve the local original content and live sports issues as well). It can't be a 2G/3G/4G wireless solution because of both cost and finite wireless bandwidth. It would have to be either an entirely new transmission technology or news of Apple buying out a provider, probably DISH Network (though that's only a U.S.-based solution). Look for rumors about Apple being able to link us directly to iCloud and then we'll have something. Until then, there is no cable-killing solution for the masses if those masses must retain a broadband subscription from the same cable company (the price of bandwidth would rise to make up for the defection of cable customers).

subsonix · Dec 9, 2011

E.Lizardo said:
Easily overcome with technology similar to noise-cancelling headphones.

Not really. Noise-canceling headphones work because one end is clearly separated from the outside noise. The outside noise is sampled and played back in the headphones with the phase reversed. If the noise sampling picked up not only the noise but also the sound from the speaker elements, it would not work.

divinox · Dec 9, 2011

PlipPlop said:
Kinect calibrates itself for background and television noise; I imagine Apple's copy will do the same thing.

What blatant copying on Apple's behalf that would be! Can't they innovate on their own?

----------

captmatt said:
Microsoft is NOT "already there" with a Siri like voice interface. Yes, they expanded the XBox Kinect interface. No, it's nothing like Siri.

For example, in Netflix, with Kinect voice interface, I found that it is typical Microsoft. It starts working and I think, "This is kind of cool!" Then, it doesn't go far enough.

My next thought is "I'm doing it wrong". Then I realize: My tool is broke!

In Netflix voice will only get you so far. I could move to new screens of videos, but to select an individual video, I had to move my hands like a cursor. But wait! To move my hands like a cursor, I had to stand up! That violates the Prime Directive of the couch potato! The other option was to grab the remote which I could've done in the first place.

Looking forward to Apple getting it right! Microsoft should've waited and copied something good rather than giving us poor execution of a rumored capability.

I had the exact same experience trying to get Siri to open an app; "I'm holding it wrong," I silently thought to myself.

----------

subsonix said:
Not really. Noise-canceling headphones work because one end is clearly separated from the outside noise. The outside noise is sampled and played back in the headphones with the phase reversed. If the noise sampling picked up not only the noise but also the sound from the speaker elements, it would not work.

The theory is that the device knows its own output, hence it can effectively filter it out.

----------

foodog said:
Apple released the Newton nearly two decades ago and it didn't catch on... but that is what the iPhone and iPad grew out of. Sometimes ideas are ahead of their time.

Yup. (Ignoring that you're talking about implementations rather than ideas per se. Ideas per se are pretty much always "ahead of their time.")

HobeSoundDarryl · Dec 9, 2011

subsonix said:
Not really. Noise-canceling headphones work because one end is clearly separated from the outside noise. The outside noise is sampled and played back in the headphones with the phase reversed. If the noise sampling picked up not only the noise but also the sound from the speaker elements, it would not work.

But aren't noise-canceling algorithms forced to be somewhat generic because the noise they are trying to cancel is unpredictable? In other words, my noise-canceling headphones can't know the noise I'm about to experience, so they have to function with relatively generic models of trying to recognize noise and then cancel it out.

In this case, the sound to play through those speakers will be processed by the television or box. The algorithms would know precisely what audio is to be ignored so that Siri could listen for sounds NOT in the audio stream. In short, for Siri's "ears," that processing would mute the sound of the programming so that it is listening for only different sounds. These different sounds would be the sounds of the viewer(s).

When I'm at a crowded party, my biological computer (brain) and less-efficient listening devices (ears) are able to focus in on what 1 or a few people are saying while mostly screening out the "noise" of everyone else. The idea is the same except the technology would be able to be much more precise about the noise to ignore since it is processing the exact details of that audio itself.

Whether right or wrong though, I still think a Siri implementation is a microphone in a remote tuned to "hear" a very localized command rather than the audio playing in the background.

Bubba Satori · Dec 9, 2011

lfc said:
I like to call this, "The Apple Effect".

I like to call this "The Kool-Aid Fueled Reality Distortion Field." :apple:

*LTD* · Dec 9, 2011

Bubba Satori said:
I like to call this "The Kool-Aid Fueled Reality Distortion Field."

That has engulfed the entire industry.

Apple sneezes, everyone reaches for a Kleenex.

subsonix · Dec 9, 2011

HobeSoundDarryl said:
But aren't noise-canceling algorithms forced to be somewhat generic because the noise they are trying to cancel is unpredictable? In other words, my noise-canceling headphones can't know the noise I'm about to experience, so they have to function with relatively generic models of trying to recognize noise and then cancel it out.

They can record the noise and therefore know what noise you are about to experience. It's not 100% effective, though.

HobeSoundDarryl said:
In this case, the sound to play through those speakers will be processed by the television or box. The algorithms would know precisely what audio is to be ignored so that Siri could listen for sounds NOT in the audio stream. In short, for Siri's "ears," that processing would mute the sound of the programming so that it is listening for only different sounds. These different sounds would be the sounds of the viewer(s).

Yeah, that seems like a possible solution actually. Subtract a phase inverted copy of the audio stream going to the speakers from the microphone input. It would probably not cancel out everything but likely reduce it quite significantly.
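Here is a minimal Python sketch of that idea under deliberately idealized assumptions: the microphone hears the program audio with no delay, no level change, and no room echo, and pure tones stand in for real program audio and speech. Under those assumptions, adding a phase-inverted copy of the speaker feed (i.e., subtracting it) leaves only the viewer's voice. The sample rate and tone frequencies are arbitrary choices for the sketch.

```python
import math

RATE = 8000  # samples per second (arbitrary for this sketch)


def tone(freq_hz, n, amp=1.0):
    """A pure sine tone, standing in for real program audio or speech."""
    return [amp * math.sin(2 * math.pi * freq_hz * i / RATE) for i in range(n)]


def cancel(mic, reference):
    """Add a phase-inverted copy of the reference, i.e. subtract it sample by sample."""
    return [m + (-r) for m, r in zip(mic, reference)]


n = 800
program = tone(440, n)          # audio the TV sends to its speaker
voice = tone(180, n, amp=0.2)   # the viewer's command, much quieter
mic = [p + v for p, v in zip(program, voice)]  # idealized: no delay, no room echo

residual = cancel(mic, program)
print(max(abs(r - v) for r, v in zip(residual, voice)) < 1e-9)  # True
```

In a real room the copy reaching the microphone is delayed, attenuated, and smeared by reflections, which is the objection raised later in the thread.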

HobeSoundDarryl said:
When I'm at a crowded party, my biological computer (brain) and less-efficient listening devices (ears) are able to focus in on what 1 or a few people are saying while mostly screening out the "noise" of everyone else. The idea is the same except the technology would be able to be much more precise about the noise to ignore since it is processing the exact details of that audio itself.

I don't think the idea is the same, really. Noise cancellation works with, and exploits, the physical properties of sound, while the brain filters sound based on perception and selection.

Gravity · Dec 9, 2011

My hope for an Apple TV

If I were in charge of the design for an Apple TV... I'd work on an edge-to-edge screen. No bezel! And just to be different, instead of an all-black face when the power is down... make it all-white.

dBeats · Dec 9, 2011

For those struggling to come up with how you trigger a voice activated TV with no controls. Hello, iPhone Remote App

AidenShaw · Dec 9, 2011

HobeSoundDarryl said:
Relative to all this "Siri can't hear over the TV volume" nonsense, why not just program the system to ignore the audio stream for the television programming so that Siri can't hear it at all (like noise canceling headphones, but much better since the "noise" would be very specific and precise)? Then, Siri would just hear the sounds in the room that are NOT playing through the TV.

This would be easy if the TV has one mono speaker.

In a surround sound system, where the TV may not even be fed the audio, it gets much harder: masking 8 channels of sound with various delays between the speakers is more complicated.

mzelicskovics · Dec 9, 2011

Voice Recognition? Really?

I think everyone is WAY off about what the Apple TV might be. Voice recognition is the great feature? HA!!! I have the iPhone 4S and use Siri. Siri's dictation is a great feature for when you're driving. It's also great for making appointments when you are at home. But how many people use Siri in everyday places? Do I want my coworkers to know that I am making a doctor appointment or that I have an interview? The issue with voice recognition is the voice part. You have to make noise and talk to your phone. So at home, instead of just changing the channel, I'll have to talk to my remote? What if my wife or baby is sleeping? And besides that, Siri is nowhere near perfect! So there is NO WAY that Apple would use it as the main feature on an Apple TV. Integrating a TV into an iMac is a more realistic idea, but that has been done for years, so I don't think anyone has a clue as to what the next Apple TV would be or how it would function. But I would bet anything that Siri has nothing to do with it.

bcrguy · Dec 9, 2011

I wish they would port Siri to the Mac as well. That would be sweet, 'cuz the speech commands work OK but not great...

mknopp · Dec 9, 2011

LOL!

Is it any wonder that Apple is so dang secretive? Analysts have been saying for years (YEARS!!!) that Apple will be getting into the TV market soon. Nobody paid them much mind.

Then one little line from Steve Jobs and suddenly everybody and their brother seems to think they have figured out exactly what Apple is going to do: "They are obviously going to use Siri."

But, and here is the great, or sad depending on your perspective, part, without anyone actually knowing for sure IF Apple is actually going to make a TV, people are already complaining about the interface and its implementation. And, this I really love, they are already calling Apple a copycat because others have already done an interface that NOBODY HAS SEEN.

My lord, I am going to laugh my ass off if Apple doesn't get into the TV market and it turns out Steve Jobs made this comment a few years ago and was talking about the AppleTV.

In short, I cannot believe that people are actually disparaging Apple for something that is nothing more, at this point, than some pundit's dream or idea. Here is an idea. Why not wait to see what Apple comes out with before blasting them, or hailing them as conquering heroes.

HobeSoundDarryl · Dec 9, 2011

AidenShaw said:
This would be easy if the TV has one mono speaker.

In a surround sound system, where the TV may not even be fed the audio, it gets much harder: masking 8 channels of sound with various delays between the speakers is more complicated.

Again, the concept is not Siri listening to what is output from the speaker(s) but knowing in advance exactly what is going to come out of each speaker. The noise-canceling generics in NC headphones don't get confused if a plane has 1 engine making noise or 3 engines or 8 engines. They just generically detect noise and screen it out.

This television concept could be far superior to the generics because it doesn't have to listen & interpret what might be noise vs. what is probably not noise. Instead, it will know the exact audio to be filtered- the exact sounds to come out of 1 mono speaker, or up to a 7.1 surround setup.

You do the former, generic kind of filtering yourself at a party when there are 5 or 7 or 50 people in the room but you zoom your attention in on the 1 or 2 you want to listen to at that point. Your brain does the processing to filter out the many "speakers" around you so that you are mostly focused on hearing the 1 or 2 you want to hear. Does that work for you even at a loud party? Probably, and well enough to hear what you need to hear.

Technology is going to do that much better than we can. It is going to know exactly what the other people at your "party" are going to say (it will know exactly what is going to come out of 1 or more speakers in your setup). It will then ignore those sounds, listening for what is not in them (such as you asking Siri to change the channel).

At the extreme, you could make a camcorder video of yourself asking Siri to change the channel, mix those requests into various incarnations of audio streams for mono, stereo, 5.1, and 7.1, and have it play through this hypothetical television while you are also telling Siri to do something different from what you've requested in the video. Now it's your voice coming from all over the place (a variety of speakers in a surround setup) vs. your live voice coming from the couch. What does Siri hear? Siri ignores the camcorder video audio because it knows that's the programming you're watching and only hears the live you on the couch (because that is the audio not in the stream the TV is processing to send to the speakers).

If you've ever seen a sound waveform, you know that what an embedded microphone "hears" can be represented as a waveform. It won't matter if the sound comes from 1 speaker or 7, as the microphone would hear the merged sounds as a single waveform. Since the device also sits in the audio stream that is going to produce that waveform, it can know what the waveform should be when it is pushed through 1 or 5 or 7 speakers. The expected waveform can be compared to the actual waveform it hears and canceled out. What's left is the sound that is heard but is not in the expected waveform. That sound would come from the person or people watching the program.
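A rough Python sketch of the "expected waveform" idea, assuming the device already knows, for each speaker, how many samples late and how loud that speaker's sound arrives at its microphone. The three speaker feeds, delays, and gains below are invented for illustration, and the simulated room behaves exactly like that model, which a real room would not. The sketch sums the delayed, scaled feeds into one predicted waveform and subtracts it from what the microphone hears.

```python
import math


def tone(freq_hz, n, rate=8000, amp=1.0):
    return [amp * math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def delayed(signal, samples):
    """Shift a signal later by a whole number of samples, padding with silence."""
    return [0.0] * samples + signal[: len(signal) - samples]


def expected_at_mic(channels, delays, gains):
    """Predict the single waveform the mic should hear from all speakers combined."""
    out = [0.0] * len(channels[0])
    for channel, d, g in zip(channels, delays, gains):
        for i, x in enumerate(delayed(channel, d)):
            out[i] += g * x
    return out


n = 1600
# Three hypothetical speaker feeds (say left, center, right) with different content.
feeds = [tone(220, n), tone(330, n), tone(550, n)]
delays = [12, 20, 31]     # assumed known, e.g. from a setup routine
gains = [0.5, 0.8, 0.4]   # assumed known as well

program_at_mic = expected_at_mic(feeds, delays, gains)  # simulated "room"
voice = tone(150, n, amp=0.1)
mic = [p + v for p, v in zip(program_at_mic, voice)]

# The device's own prediction; it matches the room only in this idealized simulation.
prediction = expected_at_mic(feeds, delays, gains)
leftover = [m - p for m, p in zip(mic, prediction)]
```

However many speakers there are, the subtraction happens on the single merged waveform; the hard part is knowing the per-speaker delays and gains accurately.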

But that's not how I think it will work. I still think the best implementation is to embed the microphone in a remote and tune it for proximity. Push as little as one button, speak into the microphone and that's the bulk of what Siri hears. How well would this work? Turn on a TV program in your surround setup now. Ask Siri a question on your iPhone 4S now. Did the surround sound audio cause Siri to choke?

ILikeTurtles · Dec 9, 2011

Comcast is trying to make a voice-recognition set top box with Motorola? They can't even get their current Motorola boxes to work right. LMFAO. Motorola makes the worst electronic devices EVER!

Good luck with that Comcast.

HobeSoundDarryl · Dec 9, 2011

mknopp said:
My lord, I am going to laugh my ass off if Apple doesn't get into the TV market and it turns out Steve Jobs made this comment a few years ago and was talking about the AppleTV.

I agree with you. Several times that quote has been slung around, I've asked when it was made. Was it just before he died, or just before they rolled out Apple TV2... or AirPlay... or further back? Maybe "I've cracked it" was his forecast about the Apple TV2 being a much better crack at the set-top box than version 1. Maybe it was AirPlay as a way to solve some of the content sourcing issues?

Every new thing is a "Eureka, I've cracked it" until the next big innovation comes along to replace it. As secretive as Apple/Jobs seems to have always been about discussing future products, it seems likely he wouldn't let out a hint about a new product line before rolling out that new product line.

Unless he was just playing that industry. If so, I wish there had been a long list (maybe a couple of chapters' worth) of issues needing some rapid & big innovations, each followed by the teaser "I've cracked it." World peace. Energy. Health care. Homelessness. Disease. Education. Hunger. Debt. Communication. Transportation. Pollution. Etc., all with "I've cracked it" teasers to spur on everyone who could do something toward all of those causes to try to beat Apple to THE innovation.

Maybe the real benefit in that teaser is that we will end up with better televisions in the next few years, even if Apple chooses to do nothing in that space?

subsonix · Dec 9, 2011

HobeSoundDarryl said:
This television concept could be far superior to the generics because it doesn't have to listen & interpret what might be noise vs. what is probably not noise.

I'm not so sure about that; in the headphone example the noise source is recorded more or less where your ears are.

No interpretation needs to take place in either case. It would still have to listen, but the waveform coming from the TV before it reaches the speaker would be the noise source in this case (whether this is done with DSP or with an analog signal). The impact of acoustics, however, will make this source different from what is captured by the mic, and quite significantly so. Adding more speakers at unknown distances from the TV will make this less effective, no doubt. Sound travels ~0.3 meters per ms, so moving a speaker 0.3 meters would offset the waveform by about 44 samples (at a 44.1 kHz sampling rate). If you have ever recorded in stereo with 2 mics, it will be apparent that even slight delays of a millisecond or so can have a significant impact on phase.
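The numbers in this post can be reproduced with a short Python sketch. It uses 343 m/s for the speed of sound in room-temperature air (the post rounds to 0.3 m per ms, which is where the 44-sample figure comes from) and shows, for a single sine tone, how much is left over when the subtracted copy arrives late. The function names are illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in dry air at about 20 °C


def offset_in_samples(distance_m, sample_rate_hz=44_100, speed_m_per_s=SPEED_OF_SOUND):
    """How many samples later a sound arrives after travelling an extra distance."""
    return distance_m / speed_m_per_s * sample_rate_hz


def leftover_after_misaligned_subtraction(freq_hz, delay_s):
    """Amplitude left when a unit sine is 'cancelled' by a copy that is late by delay_s.

    sin(x) - sin(x - phi) = 2*sin(phi/2)*cos(x - phi/2), so the leftover
    amplitude is 2*|sin(pi * f * delay)|.
    """
    return 2 * abs(math.sin(math.pi * freq_hz * delay_s))


print(round(offset_in_samples(0.3, speed_m_per_s=300.0), 1))  # 44.1, the post's rounding
print(round(offset_in_samples(0.3), 1))                        # 38.6 with 343 m/s
print(leftover_after_misaligned_subtraction(500, 0.001))       # 2.0
```

At 500 Hz, a 1 ms error is half a period, so the "cancelling" copy lines up in phase with the original and the leftover is twice as loud as the sound it was meant to remove.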

HobeSoundDarryl · Dec 9, 2011

There is some merit to your point. And we could get really technical and start considering variances such as room echo too, but I don't want to go there. Suffice it (for me at least) to say that if it's too hard to filter well due to the uniqueness of the setup (speaker distance, room echo, et al.), an initial setup could be used to send sounds to the various speakers, "learn" the unique audio characteristics of the room, and then apply that to the "listening" algorithm to better screen out such nuances.

For the last couple of years (maybe a decade now?) receivers have been available that have you put a microphone in the "sweet spot" (main seating position) and then auto-adapt the surround field toward something their algorithm considers optimal. In this case, the initial setup could use some canned audio to "learn" about the room and speakers to enhance its ability to filter the "noise" out. Its ear is always in the same place (embedded in the TV), it's always the same distance from the speakers, the room always has the same acoustic variances, and so on. Activate the initial setup, be very quiet, let it do its listening to personalize the room profile, and I bet the filter would be good enough to work well.
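One way such a setup step could estimate the room is sketched below in Python: play a known pseudo-random test signal, record it, pick the lag with the highest cross-correlation, and then estimate the level at that lag. This is a swapped-in, textbook technique, not how any particular receiver's auto-calibration works. The room is simulated as a single delay and gain plus hiss, whereas real echo cancellers typically model the full room response with an adaptive filter.

```python
import random


def estimate_delay_and_gain(test_signal, recording, max_lag):
    """Find the lag where the recording best matches the test signal, plus the gain there."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(test_signal[i] * recording[i + lag]
                    for i in range(len(test_signal) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    overlap = len(test_signal) - best_lag
    energy = sum(x * x for x in test_signal[:overlap])
    return best_lag, best_score / energy


random.seed(0)
n = 4000
test_signal = [random.uniform(-1.0, 1.0) for _ in range(n)]  # "canned" setup noise

# Simulated room: the speaker is heard 39 samples late at 0.6x level, plus some hiss.
true_lag, true_gain = 39, 0.6
recording = [0.0] * true_lag + [true_gain * x for x in test_signal[: n - true_lag]]
recording = [r + random.gauss(0.0, 0.05) for r in recording]

lag, gain = estimate_delay_and_gain(test_signal, recording, max_lag=100)
print(lag, round(gain, 2))
```

The recovered delay and gain are what a canceller like the multi-speaker sketch earlier in the thread would need as inputs, one pair per speaker.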

But again, I offer this just as a "it could be done that way" point. I personally believe a Siri implementation is in a handheld remote with embedded microphone with Siri tuned to listen to a very local (nearby) source of commands (probably immediately following a button push). If so, then it's not a matter of making Siri hear well over distance and through ambient (or direct noise) but to make it work much like working with Siri now while someone's surround sound system is running. How well does Siri work now in surround sound situations? (rhetorical)

AidenShaw · Dec 9, 2011

HobeSoundDarryl said:
Again, the concept is not Siri listening to what is output from the speaker(s) but knowing in advance exactly what is going to come out of each speaker.

...what, but not when. It's more complicated than you say - especially if the TV set does not get the sound signal at all. (Or if the TV set gets multiple signals...)

HobeSoundDarryl said:
The noise canceling generics in NC headphones don't get confused if a plane has 1 engine making noise or 3 engines or 8 engines. It just generically detects noise and screens it out.

Completely different situation - the microphone is by the ear, so all the circuitry has to do is produce the complementary waveform. The number, nature and placement of the noise producers is irrelevant.

HobeSoundDarryl · Dec 9, 2011

AidenShaw said:
...what, but not when. It's more complicated than you say - especially if the TV set does not get the sound signal at all.

I never said it's not a complicated bit of software. Of course it is complicated. And if the audio is not routed through the Apple technology- a television or set top box- obviously there is no way for it to compare exactly what is supposed to be filtered out vs. what is playing through the speakers (just as if we don't actually speak through the microphone in the iPhone, the iPhone incarnation of Siri can't work well). It would then have to entirely work by trying to hear all sounds (programming and viewers) and sort out Siri orders from all of the noise. I don't think the latter would work well at all.

For an implementation of a Siri microphone embedded in the TV, I would speculate it's probably going to be a requirement that the audio gets processed through the Apple circuitry (it can always pass through to a surround amplifier). If we imagine an Apple TV3 with Siri, this is exactly how it would work, as the Apple TV3 would be the source of video and audio being pushed to an amp/receiver that then pushes it on to the speakers. In a television, it's probably HDMI video & audio into the TV, then out of the TV back to an amp/receiver to push the sound to the speakers. Either way, though (IMO), the audio stream would probably have to flow through the Apple circuitry to process it for this purpose.

But- for about the 3rd or 4th time now, I personally think a Siri implementation is a microphone in a remote held near the mouth of the person issuing the command. That's much simpler to make work and solves related problems such as "Kevin's" call for "Sponge Bob" switching the channel while the big game receiver is running for a touchdown. The remote possessor remains the gatekeeper.
