Apple Patent: Speech Recognition Improvements?

cryptochrome · Mar 2, 2003

I find this whole discussion rather odd.

The whole point of using two microphones is that it allows you to partially triangulate the location of sounds (it can tell you the position on and distance from the axis drawn between the two microphones - three would allow you to pinpoint the exact location of the sound). In doing so, it allows you to filter out everything else. This is nothing new.
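
A rough sketch of the geometry, not anything taken from the patent: if the source is assumed to be far away compared with the spacing between the microphones, the difference in arrival time at the two microphones gives an angle to the source. The function name, the 343 m/s speed of sound, and the far-field assumption are all illustrative choices.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 C (assumed value)

def bearing_from_delay(delay_s, mic_spacing_m):
    """Angle in degrees between the sound direction and the line
    perpendicular to the microphone pair (0 = straight ahead).

    Far-field assumption: the extra distance the sound travels to the
    farther microphone is mic_spacing_m * sin(angle).
    """
    path_difference = SPEED_OF_SOUND * delay_s
    # Clamp so rounding or measurement error cannot push asin out of range.
    ratio = max(-1.0, min(1.0, path_difference / mic_spacing_m))
    return math.degrees(math.asin(ratio))

# A sound arriving at both microphones at the same instant is straight ahead.
print(bearing_from_delay(0.0, 0.2))
```

Under this simplification a single pair of microphones yields only an angle; a third microphone (or more) would be needed to narrow the position down further.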

But the original example of sound localization is already on either side of your head - your ears. It's part of what allows you to listen to something quiet even in a noisy room. And the reason why music is recorded in stereo.

Noise reduction is related but not exactly the same. The second microphone, in a sense, is the listener's ear. As long as the relative locations between the mike and ear stay the same a speaker can be used to cancel out outside sound.

The only thing semi-original in Apple's patent seems to be that the system takes into account the changing location of the microphones due to moving axes of rotation (think the hinge on your PowerBook or the neck of your iMac) and compensates for those changes. It does not seem to take into consideration the changing location of your mouth, which also matters.

Nothing is mentioned about tracking your voice, which would be a solution to that, or the exact nature of the software parts of the system. It does mention that the front end filters the signal a lot before it matches it against an acoustic model database, while the back end actually does something with that result (presumably varying with the situation and supporting multiple back ends).

I'd like to add that the present USPTO appears to be staffed with idiots who grant patents for pretty much anything, without checking for prior art and without regard for sheer obviousness and current use. Furthermore they don't even know how to scan images properly, or have the brains to use a file format for patents online that can be easily printed (i.e. pdf).

edenwaith · Mar 2, 2003

Re: Reminds me of Scottie...

Originally posted by Sonofhaig
This post reminds me of the scene in "Star Trek, The Voyage Home". Scottie, talking to a 20th century computer, to use it.
Well, if anyone can bring us closer to that kind of technology........it's Apple.

I never got much into Star Trek, but that scene probably ranks as my favorite. "Hello, computer."
"Scottie, try using the keyboard."
"How quaint."

edenwaith · Mar 2, 2003

Rotating display?

Originally posted by arn
the microphones are attached to the display. They are talking about a display which could potentially rotate.

A display which can rotate? Now where have we seen one of those before?

As for two microphones, my guess is that they help 'find' the speaker, concentrate on that noise source, and try to ignore other sounds. Similar to how our eyes work. With only one eye, we have difficulty judging how far away something is, but when both eyes combine two images into a single image, that tells our brain the distance to an object. I feel a trigonometry lesson coming on...

But if this is an area which Apple is actively pursuing, I say this is cooler than any speed bump! When I got an iMac, I was anxious to get OS 8.6 so I could use the speech recognition. It's not perfect, and is somewhat limited in what it can do at times, but it is still worlds ahead of what I've seen any other OS do natively. If Windows has native speech recognition abilities, I haven't seen them yet, and I don't know of any large OEM who includes microphones with their computers. So, at least from what I've seen, Apple is still about the only OS player who has done at least something with speech recognition, but I believe there is still so much more which can be done with it. As mentioned, Apple hasn't done much with it for several years. My biggest gripe would probably be that the speech recognition in the Chess program bites. I can hardly get anything to work for whatever reason. It understands 'castle king side', but it doesn't seem to understand when I try to move pieces. (pawn e2 to e4 OR e2 to e4)

scorpion · Mar 2, 2003

lots of apps

PDA would be a natural place.

Also, most email can be spoken but there aren't really standards. Usually when someone emails me a complicated question I call them because I'm too lazy to type. I would much rather "speak" the answer than type it.

But think about this: Keynote for presentations so I can "talk" you through the points and the slides automatically change. The iLife stuff would be great for slide shows or even for looking at scenes in a film (so we'd have to assign keywords to the chapters DVDs have now). iTunes - just assigning of stations, playlists, or artists.

I think the real big and cool stuff will be when we can collaborate on shared documents using only speech -- i.e. comments are audible and visual.

Am I making any sense? Long night last night.

Phil Of Mac · Mar 2, 2003

Originally posted by foniks2020
My thoughts exactly as soon as I read the rumor!!!!! This makes sooooo much sense to me. Who does want to write in all their e-mails? In fact who wants to TYPE them all in, especially on a cramped/carpal keyboard?

Uh, most people.

Keyboards are a fast and efficient way of inputting text. Assuming they could perfect voice recognition, how can you edit and punctuate with voice commands with nearly as much ease as you can type? I'm sure many people don't like to type, but I can't imagine why. I probably have the world's worst motor skills (not counting those with neurological disorders) but I can still type with ease. Especially on my Dvorak keyboard. I can at least type better than I can speak out loud. Certainly you can speak faster than you can type, but typing has many advantages over speech recognition. Anyone who thinks keyboards will fall into disuse is probably wrong.

edenwaith · Mar 2, 2003

Originally posted by confirmed

yeah, steve said he doesn't like PDAs. but his reason was because he thought entering text with a little stylus was a flawed method. i think i remember a quote where he asked who wants to write entire emails like that.

Wasn't that a comment about those new laptops with the screen which one can write on? (I'm blanking the name of them right now...tablets?). So I think Jobs was talking about those, not necessarily PDAs.

edenwaith · Mar 2, 2003

Originally posted by Phil Of Mac

Keyboards are a fast and efficient way of inputting text. Assuming they could perfect voice recognition, how can you edit and punctuate with voice commands with nearly as much ease as you can type? I'm sure many people don't like to type, but I can't imagine why.

I enjoy typing...probably because it came quite naturally to me, and I've become one of the fastest typists I know. Granted, I can talk faster than I can type, but it could be quite a pain to have to learn to correct the computer while speaking. What about when one says the word "there". Perhaps they meant "they're" or "their". That certainly makes it difficult for the computer to comprehend at times.

Plus comma who would want to have to keep saying the punctuation marks question mark

Teqanjel · Mar 2, 2003

The larger issue

"The problem with computers is that they do what you tell them to do, not what you want them to do."

This old quote (by whom, I don't know) demonstrates why the biggest problem with computers is the interface -- telling the computer what you want, the computer telling you what it wants. The main reason why the graphic user interface was such a breakthrough was because it took a giant step in the direction of allowing us to communicate with these machines more "naturally" -- pointing at something rather than describing what we want with unnatural, often arcane commands. The more we can interact with computers the ways we interact with other human beings, the more people will feel comfortable with them, the more people will use them...the more people will buy them.

Apple has always stressed this approach, beginning with their Human Interface Guidelines of years past. Handwriting recognition, speech recognition, perhaps eventually things like recognition of gestures, facial expressions, and other body language...the less we have to think about how we communicate with a computer, the more useful it will be to us.

But keep in mind that "how we communicate" depends on a number of different variables: our own personal preferences (talk, write, or type; listen or read), the context in which we need to communicate (quiet or noisy room; alone or with others around), and the nature of what we are communicating (a single direction; responding to e-mail; drafting a manuscript; brainstorming ideas). The idea is not to supplant, say, typing with speech recognition, but to augment it -- add it to the computer's repertoire. The more choices we have for interacting with these machines, the more readily we will be able to do so. And the more effective tools they will become.

Just a little philosophical musing on a Sunday afternoon...

eric_n_dfw · Mar 2, 2003

Originally posted by edenwaith
What about when one says the word "there". Perhaps they meant "they're" or "their". That certainly makes it difficult for the computer to comprehend at times.

Plus comma who would want to have to keep saying the punctuation marks question mark

I've used IBM ViaVoice on the PC a couple of times, and it handles the there, they're, their issue very well. Natural language recognition software is made to understand the context of the words it hears so that it can make just those decisions.

Your second point (of what I quoted above) is one of my pet peeves: I hate having to dictate punctuation and paragraph endings and whatnot.

Phil Of Mac · Mar 2, 2003

What if you actually want to input the word "comma"? What then?

MacQuest · Mar 2, 2003

Re: Rotating display?

Originally posted by edenwaith
A display which can rotate? Now where have we seen one of those before?

if you are referring to the failure that is the Tablet PC, then this is just another example of a good idea that is poorly implemented by the Windows community.

Apple's version of a tablet computer, or whatever this ends up being, will without question be a much more full-featured as well as aesthetically pleasing device. The same way that Apple pushed the envelope with the iPod by taking an existing, poorly implemented technology and making it a lot better.

edenwaith · Mar 2, 2003

Re: Re: Rotating display?

Originally posted by MacQuest
if you are referring to the failure that is the Tablet PC, then this is just another example of a good idea that is poorly implemented by the Windows community.

Actually, I was referring to the iMac, since the display can rotate on its 'neck'.

As for getting speech recognition software to understand the word "comma", perhaps someone just needs to spell it out: c-o-m-m-a.

foniks2020 · Mar 3, 2003

Originally posted by Phil Of Mac
Uh, most people.

Keyboards are a fast and efficient way of inputting text. Assuming they could perfect voice recognition, how can you edit and punctuate with voice commands with nearly as much ease as you can type? I'm sure many people don't like to type, but I can't imagine why. I probably have the world's worst motor skills (not counting those with neurological disorders) but I can still type with ease. Especially on my Dvorak keyboard. I can at least type better than I can speak out loud. Certainly you can speak faster than you can type, but typing has many advantages over speech recognition. Anyone who thinks keyboards will fall into disuse is probably wrong.

I never said you wouldn't use a keyboard. I implied that people don't like using a keyboard.

What I was referring to is that for most people speech IS a much faster way of expressing lots of content quickly. No matter how fast you can type people will always be faster talkers.

Now what I was thinking of is that you could use this to quickly compose your thoughts or take many quick notes over time or just spew out random crap AND record it digitally and quickly.

There is always a need to EDIT. Even the fastest typists (IMO especially the fastest) go back and do things like SPELL CHECK, grammar checks, and editing for clarity of thought. So you go back with a keyboard while you're re-reading and edit it. Actually, editing is what the keyboard is really really good for.

Anyways, there are plenty of people who find using speech recognition to be preferable to typing.

Finally, it's all about the PORTABLE factor.... speaking will always be the best 'hands-free' input method ;-p trump that one...

foniks2020 · Mar 3, 2003

Re: Re: Re: Rotating display?

Originally posted by edenwaith
Actually, I was referring to the iMac, since the display can rotate on its 'neck'.

Yes the iMac has a 'rotating' display but don't you people think that a hand held device is the ultimate 'rotating' display, seeing as how it has the rotation capabilities of your hand... ie: 360 degrees on any axis plus the rotation of your elbow and your shoulder (super-rotation)?

Originally posted by edenwaith

As for getting speech recognition software to understand the word "comma", perhaps someone just needs to spell it out: c-o-m-m-a.

Meta words are better than spelling it out. Say a meta word like 'punctuate' then 'comma'. The software takes 'punctuate' as a cue to listen for 'comma', 'period', 'question' or any of the other marks. If you say something else it assumes you want it to record the word 'punctuate' instead. This is used successfully in other applications, such as mouse gestures, where a click-hold-gesture opportunity lasts for a pre-determined amount of time (5 secs is default) and then becomes a normal click-hold.
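
To make the meta-word idea concrete, here is a minimal sketch that works on a list of already-recognized words. The trigger word 'punctuate', the small table of marks, and the function name are assumptions for illustration only; no real dictation product is being described.

```python
# Hypothetical table of spoken names for punctuation marks.
PUNCTUATION = {"comma": ",", "period": ".", "question": "?"}
META_WORD = "punctuate"  # assumed trigger word, as suggested in the post

def transcribe(words):
    """Turn recognized words into text, treating META_WORD followed by a
    known mark name as punctuation and everything else as literal words."""
    out = []
    i = 0
    while i < len(words):
        word = words[i]
        following = words[i + 1] if i + 1 < len(words) else None
        if word == META_WORD and following in PUNCTUATION:
            mark = PUNCTUATION[following]
            if out:
                out[-1] += mark   # attach the mark to the previous word
            else:
                out.append(mark)
            i += 2                # consume the meta word and the mark name
        else:
            out.append(word)      # includes a literal 'comma' or 'punctuate'
            i += 1
    return " ".join(out)

print(transcribe("plus punctuate comma who said comma punctuate question".split()))
```

Here the first 'comma' follows the meta word and becomes a mark, while the second 'comma' is kept as an ordinary word. This sketch ignores the timing window mentioned above and only looks at word order.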

foniks2020 · Mar 3, 2003

Re: The larger issue

Originally posted by Teqanjel
"The problem with computers is that they do what you tell them to do, not what you want them to do."

This old quote (by whom, I don't know) demonstrates why the biggest problem with computers is the interface -- telling the computer what you want, the computer telling you what it wants. The main reason why the graphic user interface was such a breakthrough was because it took a giant step in the direction of allowing us to communicate with these machines more "naturally" -- pointing at something rather than describing what we want with unnatural, often arcane commands. The more we can interact with computers the ways we interact with other human beings, the more people will feel comfortable with them, the more people will use them...the more people will buy them.

Apple has always stressed this approach, beginning with their Human Interface Guidelines of years past. Handwriting recognition, speech recognition, perhaps eventually things like recognition of gestures, facial expressions, and other body language...the less we have to think about how we communicate with a computer, the more useful it will be to us.

But keep in mind that "how we communicate" depends on a number of different variables: our own personal preferences (talk, write, or type; listen or read), the context in which we need to communicate (quiet or noisy room; alone or with others around), and the nature of what we are communicating (a single direction; responding to e-mail; drafting a manuscript; brainstorming ideas). The idea is not to supplant, say, typing with speech recognition, but to augment it -- add it to the computer's repetoire. The more choices we have for interacting with these machines, the more readily we will be able to do so. And the more effective tools they will become.

Just a little philosophical musing on a Sunday afternoon...

Thank you. Well said.

Dunepilot · Mar 3, 2003

Originally posted by dstorey
well that could be an option when it becomes fully integrated with iLife. As well as .mac access, a 1-click system will be incorporated that sends your movie/music to playboy enterprises and in three working days you get a professionally produced top shelf dvd delivered to your door...or you could produce it in iMovie for free...just for the budding prawn enthusiasts. In all seriousness that is a disturbing suggestion... herbie hancock indeed..

you know what I mean - Herbie must have spawned a thousand imitators who are today gainfully employed in a certain pr0n 'industry'

iSegway · Mar 3, 2003

Ok, here is the problem... no one wants to write out all their emails by hand on a tablet pc, right?

Well, what if you used voice recognition and pen input together? Would both of these combined be better, or equivalent to typing?

Now keep in mind I have never used a voice recognition program so I don't know if they use a system like this or not.

So... here is what I envision..

Tablet pc using a stylus for punctuation: either touch screen icons for comma, period, etc., or you could manually write those punctuation marks in. This could also be used for odd (made-up) words or markings.

Now, VR is say 90% accurate, right? So how about if, when you spoke each word, multiple similar words were displayed in a list below the word dictated? So you say "cat", the program types the word "chat" on the screen accidentally but lists several words that you could have meant underneath, in order of probability. So, listed below "chat" are cat, flat, rat, spat, mat (these are just examples). Out of the list you would just touch the actual word with your stylus and it would be selected in place of the word accidentally dictated. This system still might be slow, I don't know.
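
A small sketch of that list-of-alternatives idea, assuming the recognizer can hand back a score for each candidate word. The scores below are made up for the example, and the function names are hypothetical.

```python
def n_best(candidates, n=5):
    """Return the n highest-scoring candidate words, best first.
    candidates maps each word to a recognizer score (higher = more likely)."""
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _score in ranked[:n]]

def correct(transcript, position, alternatives, choice_index):
    """Return a copy of the transcript with the word at `position`
    replaced by the alternative the user tapped with the stylus."""
    fixed = list(transcript)
    fixed[position] = alternatives[choice_index]
    return fixed

# Made-up scores for a spoken "cat" that was misheard as "chat".
scores = {"chat": 0.40, "cat": 0.35, "flat": 0.10,
          "rat": 0.08, "spat": 0.04, "mat": 0.03}

alternatives = n_best(scores, n=3)
print(alternatives)
print(correct(["the", "chat", "sat"], 1, alternatives, 1))
```

The top entry is what gets typed automatically; the rest of the list is what would be shown underneath for a one-tap fix.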

Now, another idea I have that deals with background noise is a device that pilots use.... It is called a "throat mic". Pretty self-explanatory, right? It is a mike that is worn like a choker, you know, those tight black band-like things that girls wear, but with microphones built into it. This is used to get rid of the background noise of the aircraft motor. So why not use that for this application? I envision ultra-sensitive microphones built into these throat mics, so you could essentially whisper if you were around other people. This system might even be almost inaudible to someone around you at a coffee shop or at work or whatever. Now ideally this would be Bluetooth enabled to get rid of those nasty cords. It could also have earphones coming off of the throat mic "choker" to your ears. I don't know if this could be made small enough to not look ridiculous while wearing it or not. Who knows, it might even become fashionable!

A more unusual idea, and one that I am not aware of being tried, is to create an entire language that is ideally suited for voice recognition. Does anyone know if this has been toyed with? Would there be any benefits to trying something like this? Or are the problems more with the speaker than with the language itself? If a language could be created that made inaccuracies almost nonexistent, it could become an international language shared by all countries. I know... this idea is a little far-fetched.

zon7 · Mar 3, 2003

Re: A few ideas about this

Originally posted by Sol
I do not understand why two microphones are necessary for speech recognition. The only theory I have about this is that one microphone would pick up the person speaking and the other would pick up the background noise.

It's very easy if you know about signal processing. There is an operation called correlation that returns its highest value when the two signals are most similar to each other. As the two microphones will capture different noise, using the correlation followed by a few other easy operations, you'll be able to filter the noise received by the two microphones and get a very acceptable voice signal.
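
As a rough illustration of the correlation idea (one possible reading of it, not the method in the patent), the sketch below slides one microphone signal against the other, picks the shift where they match best, then averages the aligned signals. The voice is common to both and adds up, while noise that differs between the microphones tends to partly cancel. The function names and the toy signals are invented for the example.

```python
def best_lag(a, b, max_lag):
    """Return the shift (in samples) at which b best lines up with a,
    i.e. the lag that maximizes the sum of a[i] * b[i + lag]."""
    def score(lag):
        return sum(a[i] * b[i + lag]
                   for i in range(len(a))
                   if 0 <= i + lag < len(b))
    return max(range(-max_lag, max_lag + 1), key=score)

def align_and_average(a, b, lag):
    """Average a with b shifted back by `lag` samples. Where b has no
    matching sample, keep a's sample unchanged."""
    out = []
    for i in range(len(a)):
        j = i + lag
        out.append((a[i] + b[j]) / 2 if 0 <= j < len(b) else a[i])
    return out

# Toy example: the same "voice" reaches the second mic two samples later.
voice = [0, 0, 1, 3, -2, 1, 0, 0, 0, 0]
mic_a = voice
mic_b = [0, 0] + voice[:-2]

lag = best_lag(mic_a, mic_b, max_lag=4)
print(lag)
print(align_and_average(mic_a, mic_b, lag))
```

With real recordings each microphone would also carry its own noise, and the averaging step is where that noise would be reduced relative to the voice.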

PaisanoMan · Mar 3, 2003

Illustrations

I don't know if anyone's mentioned this yet, but these patents do come with illustrations.

Page six of the patent shows a very ... crude diagram of what appears to be a Mac II (okay, so it's just a symbol for a computer), with a monitor that rotates about a specific axis [909]. The iMac's monitor rotates precisely about that axis (and others). Perhaps this is planned for use in a future iMac.

The rotating monitor is clearly labeled in the patent as a "monitor," so I don't really see any evidence for a tablet.

dstorey · Mar 3, 2003

Originally posted by edenwaith
What about when one says the word "there". Perhaps they meant "they're" or "their". That certainly makes it difficult for the computer to comprehend at times.

Plus comma who would want to have to keep saying the punctuation marks question mark

That's where Natural Language Processing comes in. It was one of the options I could have taken at uni. It's related to AI, and basically if the computer is given the rules of a language then it should be able to tell what word you mean by the word order. Like you don't say 'Give them back they're book' for example. I can't see punctuation being a problem either. That can be done automatically if the computer knows the rules of grammar, and by the intonation in your voice and your use of natural pauses as you speak. It would probably be better at grammar than a lot of people, including me, if it is done right.
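
A toy sketch of picking a homophone from the neighbouring words. It swaps grammar rules for a simple table of how often word pairs occur together, which is an assumption made for brevity; the counts are invented and the function name is hypothetical.

```python
HOMOPHONES = ("there", "their", "they're")

# Invented counts of how often each word pair appears in some text sample.
BIGRAM_COUNTS = {
    ("back", "their"): 9,
    ("back", "there"): 4,
    ("their", "book"): 12,
    ("think", "they're"): 6,
    ("think", "there"): 3,
    ("they're", "going"): 15,
}

def pick_homophone(prev_word, next_word):
    """Choose the spelling whose pairing with the previous and the next
    word has been seen most often in the (invented) counts above."""
    def score(candidate):
        return (BIGRAM_COUNTS.get((prev_word, candidate), 0)
                + BIGRAM_COUNTS.get((candidate, next_word), 0))
    return max(HOMOPHONES, key=score)

# "Give them back ___ book"
print(pick_homophone("back", "book"))
# "I think ___ going"
print(pick_homophone("think", "going"))
```

A real system would look at far more context than one word on each side, but the principle of letting the surrounding words decide is the same.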

Pedro Estarque · Mar 3, 2003

This should be interesting if it happens, and possibly a serious threat to privacy. Offices would be much noisier places, even for those used to wind-tunnel Macs.
Imagine speaking an email to your girlfriend out loud. Besides, my grandmother would really think I'd lost my mind after hearing me speak so intimately with my Mac.

Awimoway · Mar 3, 2003

Some of the Sony PDAs have revolving screens:

http://www.sonystyle.com/is-bin/INT...hp&CatalogCategoryID=hO4KC0.N7AoAAADzP5IE_KQI

pgwalsh · Mar 3, 2003

Apple is using voice recognition technology in the Mail app for spam filtering. Well, that's what Phil Schiller said at MWSF. In terms of licensing it makes sense if they use the technology to write emails or in some other way.

Snowy_River · Mar 3, 2003

Originally posted by edenwaith
What about when one says the word "there". Perhaps they meant "they're" or "their". That certainly makes it difficult for the computer to comprehend at times.

I know that there have been other replies to this already, but I just thought that I'd say that the "there"/"their"/"they're" issue is one of my biggest pet peeves when reading emails or posts to forums such as these. I think that for many, many people letting the computer decide which to use would actually be more likely to get it right.

As to the speed question, as an interesting sidelight, the average speaking rate is somewhere around 200 words per minute. Half a century ago (back when typing was a bit less editable than it is now, with all of our fancy word processors and such), a good secretary could type at around 100 to 120 words per minute. An excellent secretary could type more than 200 words per minute (i.e. faster than the average speaking rate). So, typing fast and accurately is quite possible, though I've never known someone who could type that fast. (The fastest typist that I know is my sister, who can type around 100 to 120 wpm).

backdraft · Mar 3, 2003

Originally posted by awulf
Wouldn't a rotating Microphone be a bit weird and expensive?

Well, the iMac has a built-in microphone, and the LCD monitor rotates along with the mic.
