
MacRumors

macrumors bot
Original poster
Apr 12, 2001

With its uncompromising focus on user privacy, Apple has faced challenges collecting enough data to train the large language models that power Apple Intelligence features and that will ultimately improve Siri.


To improve Apple Intelligence, Apple has to come up with privacy-preserving options for AI training, and some of the methods the company is using are outlined in a new Machine Learning Research blog post.

Basically, Apple needs user data to improve summarization, writing tools, and other Apple Intelligence features, but it doesn't want to collect data from individual users. So instead, Apple has worked out a way to understand usage trends using differential privacy and data that's not linked to any one person. Apple is creating synthetic data that is representative of aggregate trends in real user data, and it is using on-device detection to make comparisons, providing the company with insight without the need to access sensitive information.

It works like this: Apple generates multiple synthetic emails on topics that are common in user emails, such as an invitation to play a game of tennis at 3:00 p.m. Apple then creates an "embedding" from that email with specific language, topic, and length info. Apple might create several embeddings with varying email length and information.
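Apple hasn't published its embedding model, so as a rough illustration of the idea, here is a toy hashed bag-of-words "embedding" that captures wording, topic, and length signals from an email (function names and the feature design are my own, not Apple's):

```python
# Illustrative sketch only: a toy embedding that maps an email to a
# fixed-length vector via hashed word buckets plus a length feature.
import hashlib
import math

def toy_embedding(text: str, dim: int = 16) -> list[float]:
    """Hash each word into one of `dim` buckets, append a length feature,
    and L2-normalize so vectors can be compared by dot product."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    vec.append(len(text) / 100.0)  # crude "email length" signal
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

synthetic = toy_embedding("Want to play tennis at 3:00 p.m. tomorrow?")
print(len(synthetic))  # 17: 16 word buckets + 1 length feature
```

A real system would use a learned sentence encoder rather than hashed counts, but the shape of the data — one fixed-length normalized vector per synthetic email — is the same.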

Those embeddings are sent to a small number of iPhones belonging to users who have Device Analytics turned on. Each iPhone that receives them selects a sample of actual user emails and computes embeddings for those emails. The device then compares Apple's synthetic embeddings to the embeddings of the real emails and decides which synthetic embedding is closest to the actual sample.
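The on-device matching step can be sketched as a nearest-neighbor search over the synthetic candidates; since embeddings are typically normalized, cosine similarity reduces to a dot product (again, an illustrative sketch, not Apple's code):

```python
# Illustrative sketch: the device picks which synthetic embedding is
# closest to the embedding of a real email sampled on that device.

def cosine(a: list[float], b: list[float]) -> float:
    """For unit-length vectors, cosine similarity is just the dot product."""
    return sum(x * y for x, y in zip(a, b))

def nearest_synthetic(real: list[float], candidates: list[list[float]]) -> int:
    """Return the index of the synthetic candidate most similar to `real`."""
    return max(range(len(candidates)), key=lambda i: cosine(real, candidates[i]))

candidates = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
print(nearest_synthetic([0.5, 0.87], candidates))  # 2: closest candidate
```

Only the index of the winning candidate matters for the next step; the real email and its embedding never leave the device.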

Apple then uses differential privacy to determine which of the synthetic embeddings are most commonly selected across all devices, so it knows how emails are most commonly worded without ever seeing user emails and without knowing which specific devices selected which embeddings as the most similar.
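One standard way to aggregate selections like this under local differential privacy is k-ary randomized response: each device perturbs its answer before reporting, and the server debiases the noisy counts. Apple's actual mechanism is more sophisticated, so treat this as a minimal sketch of the principle:

```python
# Minimal sketch of local differential privacy via randomized response.
# Each device reports its true selection only with probability p; the
# server can still recover accurate aggregate frequencies.
import math
import random
from collections import Counter

def randomized_response(true_index: int, k: int, epsilon: float) -> int:
    """Report the true choice w.p. e^eps/(e^eps+k-1), else a uniform other."""
    p_true = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if random.random() < p_true:
        return true_index
    other = random.randrange(k - 1)
    return other if other < true_index else other + 1

def estimate_counts(reports: list[int], k: int, epsilon: float) -> dict:
    """Server-side unbiased estimate of the true selection counts."""
    n = len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    q = 1.0 / (math.exp(epsilon) + k - 1)
    raw = Counter(reports)
    return {i: (raw.get(i, 0) - n * q) / (p - q) for i in range(k)}

random.seed(0)
truth = [2] * 800 + [0] * 100 + [1] * 50 + [3] * 50  # device selections
reports = [randomized_response(t, k=4, epsilon=3.0) for t in truth]
estimates = estimate_counts(reports, k=4, epsilon=3.0)
print(max(estimates, key=estimates.get))  # 2: the most-selected candidate
```

No individual report is trustworthy on its own, which is the point: Apple learns which synthetic embedding won overall without learning what any one device selected.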

Apple says that the most frequently selected synthetic embeddings it collects can be used to generate training or testing data, or can be used as examples for further data refinement. The process provides Apple with a way to improve the topics and language of synthetic emails, which in turn trains models to create better text outputs for email summaries and other features, all without violating user privacy.

Apple does something similar for Genmoji, using differential privacy to identify popular prompts and prompt patterns that can be used to improve the image generation feature. Apple uses a technique to ensure that it only receives Genmoji prompts that have been used by hundreds of people, and nothing specific or unique that could identify an individual person.

Apple can't see Genmoji associated with a personal device, and all signals that are relayed are anonymized and include random noise to hide user identity. Apple also doesn't link any data with an IP address or ID that could be associated with an Apple Account.
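The Genmoji safeguard — only surfacing prompts used by hundreds of people — can be sketched as a noisy count plus a popularity threshold. Apple hasn't described its exact mechanism (it may use sketch-based frequency estimation), so the threshold and noise scale below are purely illustrative:

```python
# Rough sketch, not Apple's actual mechanism: a prompt only surfaces if
# its noise-perturbed aggregate count clears a popularity threshold, so
# rare (potentially identifying) prompts never appear in the output.
import random
from collections import Counter

def popular_prompts(reports: list[str], threshold: int = 300,
                    noise_scale: float = 20.0) -> list[str]:
    counts = Counter(reports)
    kept = []
    for prompt, count in counts.items():
        noisy = count + random.gauss(0.0, noise_scale)  # masks exact counts
        if noisy >= threshold:
            kept.append(prompt)
    return kept

random.seed(1)
reports = ["dinosaur on a skateboard"] * 500 + ["my secret nickname"] * 3
print(popular_prompts(reports))  # only the widely used prompt survives
```

The added noise means Apple never sees an exact count either, and a prompt used by only a handful of people has effectively no chance of clearing the bar.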

With both of these methods, only users who have opted in to sending Device Analytics to Apple participate in the testing, so if you don't want your data used in this way, you can turn that option off.

Apple plans to expand its use of differential privacy techniques for improving Image Playground, Memories Creation, Writing Tools, and Visual Intelligence in iOS 18.5, iPadOS 18.5, and macOS Sequoia 15.5.

Article Link: Here's How Apple is Working to Improve Apple Intelligence
 
But the fact is I don’t WANT summarization or writing tools. As long as that and other generative AI stuff is Apple’s focus I will leave it off. How about a spelling checker that wasn’t an idiot? That I might go for. But the rest of what they have delivered, and advertised but not delivered, is not anything I want anywhere near my systems.
 
I've said it before elsewhere, but we may have already reached the peak of what LLMs can do, in terms of quality. They'll keep getting faster, of course, but once we've fed it all the actual human-generated content in existence, it'll mostly have other LLM-generated content to keep training on, in greater proportion as time goes on. Making worse garbage faster is not what I'd call progress. I'll keep practicing the craft of writing, thanks. Other enhancements to Siri are still welcome, like an awareness of what I'm currently looking at on the screen of my devices, or increased capabilities to equal what I can do myself with a tap or a click.
 
To be clear I fully support the efforts to protect user privacy.

But having said that, I am not convinced Apple can catch up with the approach described in the article. The whole process seems pretty convoluted, which will probably make things slower and more error-prone than competitors who aren't as concerned with their users' privacy and are simply training on actual data rather than synthetic data.
 
I've said it before elsewhere, but we may have already reached the peak of what LLMs can do, in terms of quality. They'll keep getting faster, of course, but once we've fed it all the actual human-generated content in existence, it'll mostly have other LLM-generated content to keep training on, in greater proportion as time goes on. Making worse garbage faster is not what I'd call progress. I'll keep practicing the craft of writing, thanks. Other enhancements to Siri are still welcome, like an awareness of what I'm currently looking at on the screen of my devices, or increased capabilities to equal what I can do myself with a tap or a click.
Maybe until the next advancement. ChatGPT didn’t even exist until 3 years ago. Transformer models it’s based on … the research paper is only 7 years old. This stuff is all quite new.
 
I appreciate Apple’s stance on privacy and the lengths they go to (such as this example) to adhere to it while trying to deliver the best products possible, even if it ties one hand behind their back compared to the competition (whether that’s the only reason they’re behind, who knows). I would much rather have a very basic assistant that doesn’t require sacrificing my privacy than an advanced one that requires organizations to know a lot of personal things about me. Not everyone wants that trade-off, though, so it’s good there are options.
 
But the fact is I don’t WANT summarization or writing tools. As long as that and other generative AI stuff is Apple’s focus I will leave it off. How about a spelling checker that wasn’t an idiot? That I might go for. But the rest of what they have delivered, and advertised but not delivered, is not anything I want anywhere near my systems.

They should totally allow you to turn Apple Intelligence off.
 
So my emails can sound perfectly average?! This has zero interest for me -- I prefer typing myself with the hope of injecting some of my own personality. Crowd sourcing a generic response is a terrible idea.
This article was focused on how they're going to improve summaries you see when looking at lists of emails (which I find quite helpful even in their current state). Nothing about generating a generic voice for authoring emails.
 
So - forgive my ignorance - are we basically saying that Apple’s privacy policy will mean it will always be more complex to deliver workable AI and therefore always lag behind?

I don’t envy them that choice.
 
I understand that Apple flubbed the AI launch and that needs to get fixed, but will the differing AI implementations actually influence a purchase?
 
For summarization or something else innocuous like that, I would guess that knowing exactly what is in the dataset, with synthetic data instead of the "everything including the kitchen sink" scorched-earth approach of scraping the internet (Reddit and who knows what else included), would eliminate the "eat rocks" type responses. It wouldn't be "as smart," but at least Apple would have some idea of what kinds of responses a consumer is likely to get. I dunno.
 
But the fact is I don’t WANT summarization or writing tools. As long as that and other generative AI stuff is Apple’s focus I will leave it off. How about a spelling checker that wasn’t an idiot? That I might go for. But the rest of what they have delivered, and advertised but not delivered, is not anything I want anywhere near my systems.
It's crazy how HORRIBLE the spellcheck and keyboard are on iOS. I transitioned back to iPhone 6 months ago, and I make just so many mistakes. It's exhausting. I can't use my iPhone properly for long typing sessions and have to go to my Mac.
 
I want to see this list of synthetic phrases. I would love to know what Apple guesses are the most common things people say on their platforms.
 
So my emails can sound perfectly average?! This has zero interest for me -- I prefer typing myself with the hope of injecting some of my own personality. Crowd sourcing a generic response is a terrible idea.
EXACTLY. I am a writer with my own voice and style. I am absolutely not going to use a tool that makes me sound like every other person using AI to string words together.
 