I agree, at least partly.
For instance, when I heard that the 3G (128MB RAM) wasn't going to get the voice control of the 3GS (256MB RAM), my first thought was that they were reserving at least some of the extra memory for the voice control. That way, they wouldn't have to dynamically allocate the memory.
OTOH, my older WinMo phones with only 128MB RAM could recognize far more voice commands than iOS, so perhaps not that much memory was needed after all
(I wrote my first FFT spectrum analyzer for voice recognition in 6800 machine code around 1979 on a homebrew computer with about 4K RAM. So between that and the WinMo history, I'm not convinced that the above would be a great excuse, but it's a possible one depending on how they're using that memory.)
Regards.
I'm more willing to accept the memory excuse. Speech recognition is now far more than simple spectrum analysis of discrete words, and far more context is needed. With continuous speech, even the word boundaries are often ambiguous, and even when individual boundaries are clear, the initial signal analysis typically comes up with multiple possible candidates for each word. That creates a search space that needs to be searched (and/or constrained as it is being built) using anything from simple techniques like word proximity through to syntactic and, in some cases, even semantic analysis. This can take huge amounts of memory for the most sophisticated systems. Also, as a few previous posters have already mentioned, there's speech recognition and then there's natural language understanding, and the latter is a whole different ballgame. Even correctly resolving pronoun references will be a complicated task for the assistant.
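To make the search-space point concrete, here's a toy sketch (with entirely made-up candidates and scores) of how the front end's multiple word candidates per position get searched using simple word-pair scores. A real recognizer keeps thousands of hypotheses alive at once, which is where the memory goes.

```python
# Toy search over per-position word candidates. The acoustic front end
# proposes several candidates per position (with scores); the search picks
# the path that maximizes acoustic score plus word-proximity (bigram) bonus.
# All words and numbers below are invented for illustration.
from itertools import product

# Hypothetical candidates: (word, acoustic score), higher is better
candidates = [
    [("recognize", 0.9), ("wreck a nice", 0.6)],
    [("speech", 0.8), ("beach", 0.7)],
]

# Hypothetical bigram bonuses rewarding plausible word pairs
bigram = {("recognize", "speech"): 0.9,
          ("wreck a nice", "beach"): 0.8}

def best_path(candidates, bigram):
    """Exhaustive search; real systems prune (beam search) instead."""
    best, best_score = None, float("-inf")
    for path in product(*candidates):
        words = [w for w, _ in path]
        score = sum(s for _, s in path)          # acoustic evidence
        for a, b in zip(words, words[1:]):
            score += bigram.get((a, b), 0.0)     # word-proximity constraint
        if score > best_score:
            best, best_score = words, score
    return best

print(best_path(candidates, bigram))  # → ['recognize', 'speech']
```

Note how the bigram constraint is what separates "recognize speech" from "wreck a nice beach"; without it, the acoustics alone are ambiguous.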
I really want to use the assistant on my iPhone 4
Thing is, it would have to be really, REALLY accurate. It only has to get a few words wrong to be effectively useless. Given that the current voice recognition is often way off, I'm pretty skeptical. But then again, I'd buy the new one just for the better processor/RAM/battery/camera, so this is just icing (if it works well).
Go download Dragon Dictation and the Siri app. Dragon Dictation is fantastically accurate (I have used it a good bit), and Siri is quite amazing as well. Combining the two would give you a very powerful tool for voice commands and control.
I'm more willing to accept the memory excuse.
It is my understanding that Apple's aspirations for this project are far beyond simple speech recognition and that they aspire to some level of basic natural language understanding.
I am willing to, depending on how it's done. For example, is it all decoded locally on the device, or by sending a recording to a server farm? Local decoding could easily take a few hundred MB just to store all the Markov statistical paths.
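For a sense of what "Markov statistical paths" look like in practice, here's a toy Viterbi decode over a tiny hypothetical HMM (all probabilities invented). The trellis here is two states by three observations; a large-vocabulary model has millions of states, so the tables alone can run to hundreds of MB.

```python
# Toy Viterbi decoding over a made-up two-state HMM.
# V[t][s] holds the best path probability ending in state s at time t;
# back[t][s] remembers where that path came from. These tables are the
# memory cost that scales with model size.
states = ["S1", "S2"]
start = {"S1": 0.6, "S2": 0.4}
trans = {("S1", "S1"): 0.7, ("S1", "S2"): 0.3,
         ("S2", "S1"): 0.4, ("S2", "S2"): 0.6}
emit = {("S1", "a"): 0.5, ("S1", "b"): 0.5,
        ("S2", "a"): 0.1, ("S2", "b"): 0.9}

def viterbi(obs):
    V = [{s: start[s] * emit[(s, obs[0])] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev, p = max(((r, V[t - 1][r] * trans[(r, s)]) for r in states),
                          key=lambda x: x[1])
            V[t][s] = p * emit[(s, obs[t])]
            back[t][s] = prev
    # Trace the best path backwards through the trellis
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

print(viterbi(["a", "b", "b"]))  # → ['S1', 'S2', 'S2']
```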
Not sure what you mean? AI?
If you have ever used Dragon Dictation, you'll know that it uses a good bit of CPU when you're using it heavily. And that's on a full-blown computer.
Dragon Dictation started becoming popular and useful on older PCs with less CPU horsepower than the A5 will bring to the table. However, the algorithms did eat a pile of memory and slowed the PCs' responsiveness to a crawl. More RAM and the dual-core A5 could help solve those problems.
Enough local performance could possibly improve on the added latency of a network round trip to a more powerful server farm in the cloud.
I am willing, depending on how it's done. For example, is it all decoded local to the device or by sending a recording to a server farm? Local could easily take a few hundred MB to store all the Markov statistical paths.
...
Not sure what you mean? AI?
Today's possible interactions are scripted forms with data that needs to be filled in before taking the next step. Filling in can be done by voice or keyboard.
Most likely there will be a set of common action scenarios chosen after keywords are recognized, e.g. ordering takeout, or getting a reservation for a dinner, show, or flight. With enough preloaded scenarios, and with our food/seating preferences possibly stored for later use, it'll look like AI magic.
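To make that concrete, here's a toy sketch of how such a keyword-triggered scenario dispatcher might work. The scenario names, slots, and stored preferences are all invented for illustration.

```python
# Toy scenario dispatch: a keyword match selects a scripted scenario,
# then its named slots are filled from the utterance or from stored user
# preferences; anything still missing would be prompted for by voice.
SCENARIOS = {
    "reservation": {"slots": ["restaurant", "time", "party_size"]},
    "takeout":     {"slots": ["restaurant", "dish"]},
}

PREFERENCES = {"party_size": "2"}  # hypothetical stored user preference

def choose_scenario(utterance):
    """Pick the first scenario whose keyword appears in the utterance."""
    for keyword, scenario in SCENARIOS.items():
        if keyword in utterance.lower():
            return keyword, scenario
    return None, None

def fill_slots(scenario, given):
    """Fill slots from recognized words, then preferences; report the rest."""
    filled, missing = {}, []
    for slot in scenario["slots"]:
        if slot in given:
            filled[slot] = given[slot]
        elif slot in PREFERENCES:
            filled[slot] = PREFERENCES[slot]
        else:
            missing.append(slot)  # these would trigger a follow-up question
    return filled, missing

name, scen = choose_scenario("Get me a reservation at Luigi's")
filled, missing = fill_slots(scen, {"restaurant": "Luigi's"})
print(name, filled, missing)
```

The "magic" is just that a stored preference quietly fills `party_size`, so the assistant only has to ask about the time.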
Time to dig out the old Apple Knowledge Navigator video for those who've never seen it. Very much voice driven.
*shakes head* The process is still too much to implement on the phone itself. Which is my point, and why it doesn't come down to having "more memory," as was stated (or at least implied) above.
Yes, I did mean AI, although these terms (AI, natural language understanding) are both pretty vague. The video on the old Siri home page (http://siri.com/about/) shows some very basic capabilities that I would say use some AI/NLU techniques, e.g. using knowledge of the recent conversation to influence search parameters, such as, when asked to search for a restaurant, assuming that it is for a meal immediately after the just-discussed movie. There is also some fairly robust parsing, e.g. "Take me drunk home" (or something like that) getting taxi options. OK, it's not the NLU that most researchers are still chasing, but it's way beyond the "What is the <X>" (where <X> must be one of "time", "date", "day" or "temperature") type of constrained and rigid dialogs of older first-attempt voice control.
Regarding memory usage, I tentatively withdraw my previous suspicions on that. The Siri video says explicitly that the voice input is uploaded to servers to be turned into text. There's no guarantee that Apple will keep it this way, but given that the action resulting from the voice request will almost certainly involve internet searches, relying on a network connection doesn't seem an inappropriate constraint.
That Knowledge Navigator video is interesting. I hope that dream is still alive within Apple because the assistant in that imagined scenario is most definitely displaying very substantial AI/NLU capabilities. Sadly with Apple's secrecy we don't see what's going on. Microsoft and IBM tend to say quite a lot about their research projects (e.g. IBM's Watson). I assume that a company the size of Apple has a genuine blue-sky research department but maybe I'm wrong; I certainly never hear anything about it and have always assumed that that is down to Apple's culture of secrecy.
- Julian
----------
As I just posted in a reply to kdarling, the original promo video from Siri here (http://siri.com/about/) says explicitly that Siri "sends <the> words up to be interpreted as text," which makes it pretty clear that voice files are uploaded to back-end servers for processing, and I'd be very surprised if Apple hasn't kept it this way.
- Julian
Great new feature, but what will be funny is riding along watching a million or so people talking to themselves.
Apple has done what Microsoft has talked about for years, and hopefully this works better.
Of course, Android had it first. But Apple will probably make it better, which will only give Google the gentle push it needs.
LOVE competition.
Yep: Android does it first, Apple does it right, Android floods the market with phones that try (poorly) to emulate Apple's implementation, and several years later Android apologists say what Apple did would have happened anyway. Never-ending cycle.
I am willing, depending on how it's done. For example, is it all decoded local to the device or by sending a recording to a server farm? Local could easily take a few hundred MB to store all the Markov statistical paths.
And as I posted in a reply to firewood, THAT IS WHAT I WAS SAYING.
And again, *shakes head*
Jesus Christ.
Have they solved the problem of feeling like a complete idiot when you use these sort of interfaces?
A lot of blah, blah, blah over voice recognition. It doesn't take much to get the people in AppleLand all lathered up.
World-changing... it must be quite boring in the walled garden if this is all it takes to excite the herd.