I agree, at least partly.

For instance, when I heard that the 3G (128MB RAM) wasn't going to get the voice control of the 3GS (256MB RAM), my first thought was that they were reserving at least some of the extra memory for the voice control. That way, they wouldn't have to dynamically allocate the memory.

OTOH, my older WinMo phones with only 128MB RAM could recognize far more voice commands than iOS, so perhaps not that much memory was needed after all :)

(I wrote my first FFT spectrum analyzer for voice recognition in 6800 machine code around 1979 on a homebrew computer with about 4K RAM. So between that and the WinMo history, I'm not convinced that the above would be a great excuse, but it's a possible one depending on how they're using that memory.)

Regards.

By performance I mean speed and accuracy. I used to be a WinMo user myself. In my experience the voice recognition wasn't all that great. You had to use very strict strings of words, and even then it got it wrong most of the time.

I don't have any experience with voice recognition on a 1979 computer, but considering how flaky it was on 90s computers, I can only imagine it didn't work very well at all.
 
I'm more willing to accept the memory excuse. Speech recognition is now far more than simple spectrum analysis of discrete words, and far more context is needed. With continuous speech, even the word boundaries are often ambiguous, and even when the boundaries are clear, the initial signal analysis typically comes up with multiple candidates for each word. That creates a search space which has to be explored (and/or constrained as it is being built) using anything from simple measures like word proximity through to syntactic and, in some cases, even semantic analysis. This can take huge amounts of memory in the most sophisticated systems.

Also, as a few previous posters have already mentioned, there's speech recognition and then there's natural language understanding, and the latter is a whole different ballgame. Even correctly resolving pronoun references in the assistant will be a complicated task.
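To make that search-space point concrete, here's a toy sketch in Python (the candidate words and all scores are invented for illustration; real recognizers are vastly more elaborate). Each stretch of audio yields several candidate words, and the decoder scores paths through the resulting lattice, pruning to a beam so time and memory stay bounded:

```python
# Toy word-lattice decoding with beam search.
# Candidates, acoustic scores, and bigram scores are all invented.
lattice = [
    [("recognize", 0.7), ("wreck a nice", 0.6)],  # candidates for the first stretch
    [("speech", 0.8), ("beach", 0.5)],            # candidates for the second stretch
]

# Stand-in for a language model: how plausible is word B right after word A?
bigram = {
    ("recognize", "speech"): 0.9,
    ("recognize", "beach"): 0.2,
    ("wreck a nice", "beach"): 0.8,
    ("wreck a nice", "speech"): 0.1,
}

def decode(lattice, beam_width=3):
    beams = [([], 1.0)]  # partial hypotheses: (words so far, cumulative score)
    for candidates in lattice:
        expanded = []
        for words, score in beams:
            for word, acoustic in candidates:
                lm = bigram.get((words[-1], word), 0.05) if words else 1.0
                expanded.append((words + [word], score * acoustic * lm))
        # Pruning here is exactly the "constrained as it is being built" part.
        beams = sorted(expanded, key=lambda h: h[1], reverse=True)[:beam_width]
    return beams

for words, score in decode(lattice):
    print(" ".join(words), round(score, 3))
```

Even in this two-slot toy the raw path count multiplies with every added stretch of audio; without pruning and higher-level constraints, memory and time blow up quickly.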

It is my understanding that Apple's aspirations for this project are far beyond simple speech recognition and that they aspire to some level of basic natural language understanding. Presumably the ultimate (and very possibly unattainable) objective is to make the system good enough to be just like talking to one's real-life executive assistant (and at that point the awkwardness that some people here have been talking about in being seen to speak to one's phone would presumably go away, since people talk to their real-life executive assistants on the phone all the time).

My concern (for Apple) is that this assistant has the potential to be another Newton handwriting recognition moment, with the first release of the technology falling too far short of the mark as far as real-world usability is concerned, putting users off and stalling future development of the project. Did this technology need to stay in the lab for another 5 or 10 years before being released to the public? Until we see more than controlled demos and video mockups we can't know, and I'd be more than happy to be pleasantly surprised.

- Julian
 
Thing is, it would have to be really REALLY accurate. It only has to screw up a few words to be effectively useless. Given that the current voice recognition is often waaay off, I'm pretty skeptical. But then again, I'd buy the new one just for the better processor/RAM/battery/camera, so this is just icing (if it works well).

Go download Dragon Dictation and the Siri app. Dragon Dictation is fantastically accurate, I have used it a good bit, and Siri is quite amazing as well. Combining the two would give you a very powerful tool for voice commands/control.
 
As usual, Apple doesn't really invent innovative technology, but they find a way to integrate all these great technologies into one seamless package that "just works" for the customer.
 
I'm more willing to accept the memory excuse.

I am willing, depending on how it's done. For example, is it all decoded locally on the device, or by sending a recording to a server farm? Local could easily take a few hundred MB to store all the Markov statistical paths.
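Back-of-the-envelope (every figure below is an invented round number, not anything Apple or Nuance has published), local model storage adds up quickly:

```python
# Rough sizing of on-device recognition models; all numbers are invented
# round figures for illustration only.
stored_ngrams = 20_000_000   # pruned n-gram entries in a language model
bytes_per_ngram = 12         # packed word IDs plus a quantized probability
lm_bytes = stored_ngrams * bytes_per_ngram

# Acoustic model: states x mixture components x feature dims x 4-byte floats.
acoustic_bytes = 2_000 * 40 * 32 * 4

print((lm_bytes + acoustic_bytes) / 1e6, "MB")  # ~250 MB, before any search workspace
```

And that's before the decoder allocates anything for the live search itself, which is why the local-versus-server question matters.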

It is my understanding that Apple's aspirations for this project are far beyond simple speech recognition and that they aspire to some level of basic natural language understanding.

Not sure what you mean? AI?

Today's possible interactions are scripted forms with data that needs to be filled in before taking the next step. Filling in can be done by voice or keyboard.

Most likely there will be a set of common action scenarios chosen after keywords are recognized. E.g. ordering takeout, or getting a reservation for a dinner or show or a flight. With enough preloaded scenes, and possibly storing our food/seating preferences for later use, it'll look like AI magic.
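A crude sketch of that kind of keyword-to-scenario dispatch (the scenario names, slots, and preference store are all invented for illustration, not anything Apple has described):

```python
# Toy keyword-driven scenario dispatch with slot filling.
SCENARIOS = {
    "reservation": {"keywords": {"book", "reserve", "table"},
                    "slots": ["restaurant", "time", "party_size"]},
    "takeout":     {"keywords": {"order", "takeout", "delivery"},
                    "slots": ["restaurant", "dishes"]},
}

# Remembered preferences that pre-fill slots, which is what would make
# the scripted form look like AI magic.
PREFERENCES = {"party_size": 2}

def dispatch(recognized_text):
    words = set(recognized_text.lower().split())
    for name, scenario in SCENARIOS.items():
        if words & scenario["keywords"]:
            slots = {s: PREFERENCES.get(s) for s in scenario["slots"]}
            return name, slots
    return None, {}

print(dispatch("Reserve a table for dinner tonight"))
# ('reservation', {'restaurant': None, 'time': None, 'party_size': 2})
```

The remaining empty slots would then be filled in by voice or keyboard, exactly like the scripted forms described above.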

Time to dig out the old Apple Knowledge Navigator video for those who've never seen it. Very much voice driven.
 
From what I've gathered over the years, it's processor intensive and has less to do with actual memory. This is why Android phones send the data over the air and have it processed there.

If you have ever used Dragon Dictation, you'll know that it uses a nice bit of CPU when you're using it heavily. And that's on a full-blown computer.

I don't remember what event this was, but one of the guys from Google talked about it when he demonstrated it over the air.
 
Dragon Dictation started becoming popular and useful on older PCs with less CPU horsepower than the A5 will bring to the table. However, the algorithms did eat a pile of memory and slowed the PC's responsiveness down to a crawl. More RAM and the dual-core A5 could help solve those problems.

Enough local performance could possibly improve on the added latency of a network round trip to a more powerful server farm in the cloud.
 
I think as long as you have an internet connection (3G or Wi-Fi), the Assistant will delegate to iCloud for the raw processing of speech recognition. If you lose connectivity, then does it really matter?
 
Enough local performance could possibly improve on the added latency of a network round trip to a more powerful server farm in the cloud.

*shakes head* The process is still too much to implement on the actual phone itself. Which is my point. Which is why it isn't down to having "more memory", which is what was stated up above, or implied at least.

You could have just agreed with the point I was making instead of trying to pick it apart and accomplishing nothing.
 
Not sure what you mean? AI?

...

Time to dig out the old Apple Knowledge Navigator video for those who've never seen it. Very much voice driven.

Yes, I did mean AI, although these terms (AI, Natural Language Understanding) are both pretty vague. The video on the old Siri home page (http://siri.com/about/) shows some very basic capabilities that I would say use some AI/NLU techniques, e.g. using knowledge of the recent conversation to influence search parameters such as, when asked to search for a restaurant, assuming that it is for a meal immediately after the just-discussed movie. There is also some fairly robust parsing, e.g. "Take me drunk home" (or something like that) getting taxi options. OK, it's not the NLU that most researchers are still chasing, but it's way beyond the "What is the <X>", where <X> must be one of "time", "date", "day" or "temperature", type of constrained and rigid dialogs of older first-attempt voice control stuff.
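The restaurant-after-the-movie trick can be sketched in a few lines (all names and fields here are invented; this is just the general shape of the idea, not Siri's internals): earlier turns deposit context that later, underspecified requests inherit as defaults.

```python
# Toy dialog context: earlier turns leave defaults for later requests.
context = {}

def discuss_movie(title, showtime, neighborhood):
    context.update({"time": showtime, "near": neighborhood})
    return f"{title} is showing at {showtime} in {neighborhood}."

def find_restaurant(cuisine, time=None, near=None):
    # Unspecified parameters fall back to what was just talked about.
    time = time or context.get("time")
    near = near or context.get("near")
    return f"Searching for {cuisine} restaurants near {near}, after {time}."

print(discuss_movie("Vertigo", "7pm", "North Beach"))
print(find_restaurant("Italian"))  # inherits 7pm / North Beach from the movie turn
```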

Regarding memory usage, I tentatively withdraw my previous suspicions on that. The Siri video says explicitly that the voice input is uploaded to servers to be turned into text. There's no guarantee that Apple will keep it this way, but given that the action resulting from the voice request will almost certainly involve internet searches, relying on a network connection doesn't seem an inappropriate constraint.

That Knowledge Navigator video is interesting. I hope that dream is still alive within Apple because the assistant in that imagined scenario is most definitely displaying very substantial AI/NLU capabilities. Sadly with Apple's secrecy we don't see what's going on. Microsoft and IBM tend to say quite a lot about their research projects (e.g. IBM's Watson). I assume that a company the size of Apple has a genuine blue-sky research department but maybe I'm wrong; I certainly never hear anything about it and have always assumed that that is down to Apple's culture of secrecy.

- Julian

----------

*shakes head* the process is still too much to implement on the actual phone itself. Which is my point. Which is why it isn't down to having "more memory", which is what was stated up above, or implied at least.

As I just posted in a reply to kdarling, the original Siri promo video here (http://siri.com/about/) says explicitly that Siri "sends <the> words up to be interpreted as text", which seems pretty clear evidence that voice files are uploaded to back-end servers for processing, and I'd be very surprised if Apple hasn't kept it this way.

- Julian
 
As I just posted in a reply to kdarling, the original Siri promo video here (http://siri.com/about/) says explicitly that Siri "sends <the> words up to be interpreted as text", which seems pretty clear evidence that voice files are uploaded to back-end servers for processing, and I'd be very surprised if Apple hasn't kept it this way.

And as I posted in a reply to firewood, THAT IS WHAT I WAS SAYING.

And again, *shakes head*.

Jesus Christ.
 
Great new feature, but what will be funny is riding along watching a million or so people talking to themselves :)

Apple has done what Microsoft has talked about for years, and hopefully this works better.
 
Of course Android had it first ;) But Apple will probably make it better. Which will only give Google the gentle push it needs :) LOVE competition.
 
Yep, Android does it first, Apple does it right, Android will flood the market with phones that try (poorly) to emulate Apple's implementation, and several years later Android apologists will say what Apple did would have happened anyway. Never-ending cycle.
 
Doing it right is subjective... I can do pretty much everything I want to do with Android's voice recognition. It is amazingly accurate and very easy to use. But Apple will do it differently, and they will improve on what Android already does. Then Android will counter.

Android fans will feel it is better, Apple fans will disagree, and they will come to a forum like this to argue like children over what is a largely subjective feature.

Ahhh the joys of the internet.
 
... Local could easily take a few hundred MB to store all the Markov statistical paths.

Will the idle memory utilization of the feature be rather small? In other words, will those who do not use it still pretty much get the benefit of 1GB of RAM over 512MB for multitasking, compared to the iPhone 4?
 
And as I posted in a reply to firewood, THAT IS WHAT I WAS SAYING.

Blimey. All I was trying to do was support what you were saying and reinforce your point by supplying a link to a very explicit statement from the creator of the technology showing that you are almost certainly correct. I guess I should learn not to help people unless I'm invited to. I didn't mean to annoy you; I was trying to help.

- Julian
 
Have they solved the problem of feeling like a complete idiot when you use these sorts of interfaces?

Give it time and it will be second nature. I remember not too many years ago people were self-conscious about making a phone call on their mobile in public.

I kid you not, kids :)
 
For the $80 billion MSFT spent on R&D over the last decade, they couldn't come up with something as good as this? I guess their PhDs are not very bright.
 
A lot of blah, blah, blah over voice recognition. It doesn't take much to get the people in AppleLand all lathered up.

World Changing... must be quite boring in the walled garden if this is all it takes to excite the herd. :rolleyes:
 
You sound awfully jealous that your junky Android phone won't have a virtual AI.
 
Yes, in Android land, all it takes to get excited is news of a new iPhone, apparently.
 