
MacRumors

macrumors bot
Original poster



Back in July, Apple introduced the "Apple Machine Learning Journal," a blog detailing Apple's work on machine learning, AI, and other related topics. The blog is written entirely by Apple's engineers, and gives them a way to share their progress and interact with other researchers and engineers.

Apple today published three new articles to the Machine Learning Journal, covering topics that are based on papers Apple will share this week at Interspeech 2017 in Stockholm, Sweden.

[Image: Apple Machine Learning Journal banner]

The first article may be the most interesting to casual readers, as it explores the deep learning technology behind the Siri voice improvements introduced in iOS 11. The other two articles cover the technology behind the way dates, times, and other numbers are displayed, and the work that goes into introducing Siri in additional languages.

Links to all three articles are below:

- Deep Learning for Siri's Voice: On-device Deep Mixture Density Networks for Hybrid Unit Selection Synthesis
- Inverse Text Normalization as a Labeling Problem
- Improving Neural Network Acoustic Models by Cross-bandwidth and Cross-lingual Initialization
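
For a sense of how the second article frames its problem: inverse text normalization turns spoken-form speech recognition output (such as "october third") into written form ("October 3rd"). Treated as a labeling problem, a sequence model tags each token with an edit label, and applying the labels produces the display text. The toy Python sketch below shows only the rewrite step; the labels, rules, and vocabulary are invented for illustration and are not Apple's actual scheme.

```python
# Toy illustration of inverse text normalization (ITN) as token labeling:
# a tagger assigns each spoken-form token an edit label, and applying the
# labels yields the written form. Labels and vocabulary are made up here.

WORD_TO_CARDINAL = {"three": "3", "ten": "10"}
WORD_TO_ORDINAL = {"third": "3rd", "tenth": "10th"}

# Each label names a rewrite applied to its token.
LABELS = {
    "KEEP":       lambda tok: tok,
    "CARDINAL":   lambda tok: WORD_TO_CARDINAL[tok],
    "ORDINAL":    lambda tok: WORD_TO_ORDINAL[tok],
    "CAPITALIZE": lambda tok: tok.capitalize(),
}

def apply_itn(tokens, labels):
    """Rewrite spoken-form tokens into written form using per-token labels."""
    return " ".join(LABELS[label](tok) for tok, label in zip(tokens, labels))

# In a real system the labels would come from a trained sequence model;
# here they are hard-coded to show the rewrite step.
print(apply_itn(["october", "third"], ["CAPITALIZE", "ORDINAL"]))
# -> October 3rd
print(apply_itn(["set", "a", "timer", "for", "three", "minutes"],
                ["KEEP", "KEEP", "KEEP", "KEEP", "CARDINAL", "KEEP"]))
# -> set a timer for 3 minutes
```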

Apple is notoriously secretive and has kept its work under wraps for many years, but over the course of the last few months, the company has been open to sharing some of its machine learning advancements. The blog, along with research papers, allows Apple engineers to participate in the wider AI community and may help the company retain employees who do not want to keep their progress a secret.

Article Link: Apple Updates Machine Learning Journal With Three Articles on Siri Technology
 
I'd like them to publish an article titled: Making Siri work on-device and off-line for the multitude of tasks that utilize on-device data only and there's no reason to call the mothership every time someone wants to set a timer or play a song that's already on their device.

Too long of a title?
 

Switch to Samsung!

Bixby works offline! /s

You can publish that on your S-Journal of S-Cience with your S-Pen!
 
> I'd like them to publish an article titled: Making Siri work on-device and off-line... Too long of a title?

Yep, the title should be, "Make Siri work in the real world." Remember: those who can, do; those who can't, write articles.
 
> I'd like them to publish an article titled: Making Siri work on-device and off-line... Too long of a title?

The issue is that end-user devices are currently underpowered for these kinds of tasks.

Firstly, you have to work out what the person has said. There are many different accents to take into account, as well as a huge number of local dialects, and both affect how words are pronounced and how sentences flow.

Once you know what the person has said, you must then match it to an intent. Again, there are countless ways a person might phrase something, and you can't assume the speaker is a native speaker of the language, so they may say things in 'weird' ways.

Assuming you have a 'neutral' accent with completely accurate grammar, and you know the exact phrase that will activate a specific function, it's feasible to carry out the activity entirely on the device. Right now, though, that functionality is limited to 'Hey Siri', with all of the complex work offloaded to much more powerful servers.
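
As a rough sketch of what that narrow on-device piece might look like, the toy Python below matches an already-recognized transcript against a few rigid phrase patterns and defers everything else to the server. The patterns and intent names are invented for illustration; real systems use far more sophisticated models.

```python
import re

# Toy sketch of narrow, on-device intent matching: try the recognized
# transcript against a few rigid phrase patterns and hand everything
# else off to more powerful servers. Patterns and intent names are invented.
LOCAL_INTENTS = [
    (re.compile(r"set (?:a )?timer for (\d+) minutes?"), "SetTimer"),
    (re.compile(r"play (.+)"), "PlayMedia"),
    (re.compile(r"what(?:'s| is) the weather(?: like)?"), "GetWeather"),
]

def match_intent_locally(transcript):
    """Return (intent, slots) if a rigid local pattern matches, else None."""
    text = transcript.lower().strip()
    for pattern, intent in LOCAL_INTENTS:
        match = pattern.fullmatch(text)
        if match:
            return intent, match.groups()
    return None  # no match: defer to the server for anything less predictable

print(match_intent_locally("Set a timer for 10 minutes"))  # ('SetTimer', ('10',))
print(match_intent_locally("Remind me when I get home"))   # None -> server
```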
 

This is why there needs to be a dedicated Siri chip in each device: a graphics-grade chip with processing power and schemes designed specifically for speech recognition and intent processing, with dedicated memory for building a personalized database for its user. With Apple's clear focus on machine learning, I would not be surprised if this is exactly what comes to the next generations of iPhones.

Siri on a brand-new iPhone could start off slower and more cautious, making fewer assumptions and asking more follow-up questions to make sure it understood what you said and what you asked for, then grow more confident as it gets to know you.

It would still process complex tasks on Apple's servers but would cache very common requests specific to its user: things like music playback, weather reports, opening apps, calendar management, common contact info, etc.

AirPods are currently too slow because there are too many handoffs happening, and they don't work at all offline for basic tasks such as increasing the volume. That's just unacceptable. A dedicated Siri chip would fix that.

There really is no excuse for not having offline voice recognition. It already existed on pre-Siri-era iPhones as an accessibility feature; Siri just brought much smarter voice abilities. The challenge is delegating lower-level tasks to a Siri chip and higher-level ones to the Siri servers in the cloud.
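
To make that delegation idea concrete, here is a minimal, purely hypothetical sketch: a router that answers a small set of common, well-understood request types locally, keeps a per-user cache of frequent requests, and sends everything else to the cloud when a connection is available. The class, intent names, and cache policy are invented; this is not Apple's design.

```python
# Hypothetical split between an on-device "Siri chip" and Apple's servers:
# common intents are handled and cached locally; complex requests go to
# the cloud when online. All names and the cache policy are invented.

LOCAL_CAPABILITIES = {"PlayMedia", "GetWeather", "OpenApp", "CreateEvent"}

class SiriRouter:
    def __init__(self, local_handler, cloud_handler):
        self.local_handler = local_handler
        self.cloud_handler = cloud_handler
        self.response_cache = {}  # per-user cache of frequent request results

    def handle(self, intent, utterance, online):
        key = (intent, utterance)
        if key in self.response_cache:      # repeated request: answer instantly
            return self.response_cache[key]

        if intent in LOCAL_CAPABILITIES:    # lower-level tasks stay on the device
            result = self.local_handler(intent, utterance)
        elif online:                        # higher-level tasks go to the cloud
            result = self.cloud_handler(intent, utterance)
        else:
            return "Sorry, I need an internet connection for that."

        self.response_cache[key] = result
        return result

# Example wiring with stub handlers.
router = SiriRouter(lambda i, u: f"[on-device] {i}: {u}",
                    lambda i, u: f"[cloud] {i}: {u}")
print(router.handle("PlayMedia", "play my workout playlist", online=False))
print(router.handle("BookRestaurant", "book a table for two", online=False))
```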
 

It's only a matter of time, I'm sure. I don't know how much weight those rumors of the "Neural Engine" hold, but it definitely sounds Apple-y enough. A dedicated Machine Learning chip would be ideal for privacy, would allow more of this stuff to be processed locally, and if anyone's gonna be building a custom chip to put in their phones, Apple seems like a pretty good candidate.

I wouldn't count on AirPods getting dedicated Siri functionality anytime soon; the batteries on those things are tiny. Much like how phones got "Hey Siri" detection, I could see AirPods eventually getting that just to trigger Siri on the connected device, but I'm pretty sure the phone will remain the "hub" for a while, given the tech they can cram into it and the battery size.

Offline voice recognition existed pre-Siri, sure, but let's not forget how limited it was. It definitely should exist in some form for basic tasks by now, but I would imagine the two technologies work very differently. At the very least, it would be nice if Siri fell back to the old way when it's unavailable; I believe that feature is still buried in Settings somewhere.

My guess is that if we don't see a "Neural Engine" soon, it's because these new machine learning chips are just too power-hungry for mobile. If anyone can pull it off, though, I'd imagine it would be Apple, with their wave of custom chips over the years. Here's hoping we see an "N1" co-processor in the next iPhone. Even if it's limited, it's a great start.
 
> The issue is that end-user devices are currently underpowered for these kinds of tasks...

This is a problem for Siri, yes, but accurate, secure facial recognition in all lighting conditions and for all faces, perhaps even linked to Siri, is a greater challenge. 3D speaking profile pictures and emoji are coming, and Siri is a step toward them. Facial recognition would cut out much of the work Siri needs to do to identify the user on its own, and Siri would do the same for facial recognition, enhancing both capabilities; that synergy would drastically improve processing efficiency.
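
Purely to illustrate that synergy, a minimal score-fusion sketch: combine independent confidence scores from face and voice recognition so that neither modality has to identify the user on its own. The weights and threshold are made up; this is not how Apple does, or necessarily would do, it.

```python
# Minimal sketch of fusing face and voice identification scores so each
# modality reduces the work the other has to do. Weights and threshold
# are invented for illustration.

def fuse_identity_scores(face_score, voice_score, face_weight=0.6):
    """Weighted combination of two per-user match scores in [0, 1]."""
    return face_weight * face_score + (1 - face_weight) * voice_score

def identify_user(face_scores, voice_scores, threshold=0.8):
    """Pick the enrolled user with the highest fused score, if confident enough."""
    users = set(face_scores) | set(voice_scores)
    fused = {u: fuse_identity_scores(face_scores.get(u, 0.0),
                                     voice_scores.get(u, 0.0))
             for u in users}
    best = max(fused, key=fused.get)
    return best if fused[best] >= threshold else None

# A solid face match combined with a strong voice match clears the bar.
print(identify_user({"alice": 0.8, "bob": 0.2}, {"alice": 0.95, "bob": 0.1}))
# -> alice
```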
 