Apple Touts 'Differential Privacy' Data Gathering Technique in iOS 10

MacRumors

macrumors bot
Original poster
Apr 12, 2001
47,091
9,075



With the announcement of iOS 10 at WWDC on Monday, Apple mentioned its adoption of "Differential Privacy" - a mathematical technique that allows the company to collect user information that helps it enhance its apps and services while keeping the data of individual users private.


During the company's keynote address, Senior VP of Software Engineering Craig Federighi - a vocal advocate of personal privacy - summarized the concept in the following way:
We believe you should have great features and great privacy. Differential privacy is a research topic in the areas of statistics and data analytics that uses hashing, subsampling and noise injection to enable...crowdsourced learning while keeping the data of individual users completely private. Apple has been doing some super-important work in this area to enable differential privacy to be deployed at scale.
Wired has now published an article on the subject that lays out in clearer detail some of the practical implications and potential pitfalls of Apple's latest statistical data gathering technique.
Differential privacy, translated from Apple-speak, is the statistical science of trying to learn as much as possible about a group while learning as little as possible about any individual in it. With differential privacy, Apple can collect and store its users' data in a format that lets it glean useful notions about what people do, say, like and want. But it can't extract anything about a single, specific one of those people that might represent a privacy violation. And neither, in theory, could hackers or intelligence agencies.
Wired notes that the technique is claimed to offer a mathematically "provable guarantee" that its generated data sets are impervious to outside attempts to de-anonymize the information. It does, however, caution that such complex techniques rely on the rigor of their implementation to retain any guarantee of privacy during transmission.
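As a rough illustration of the noise-injection idea Federighi mentioned (a minimal sketch only; Apple has not published its exact algorithm, and the numbers here are purely illustrative), the classic "randomized response" technique lets an analyst learn the rate of a sensitive answer in a population without ever learning any individual's true answer:

[CODE=python]
import random

def randomized_response(truth: bool) -> bool:
    """Report a yes/no answer with plausible deniability.

    First coin flip: heads, answer honestly; tails, answer with a
    second coin flip instead. Any single report may be pure noise.
    """
    if random.random() < 0.5:
        return truth              # honest answer
    return random.random() < 0.5  # random answer

def estimate_rate(reports: list[bool]) -> float:
    """Recover the population rate: E[observed] = 0.5 * rate + 0.25."""
    observed = sum(reports) / len(reports)
    return 2 * observed - 0.5

# Toy example: 100,000 users, 30% of whom truly answer "yes".
reports = [randomized_response(random.random() < 0.3) for _ in range(100_000)]
print(f"estimated rate: {estimate_rate(reports):.3f}")  # close to 0.30
[/CODE]

Each individual report is deniable because it may be nothing but a coin flip, yet averaged over many users the noise cancels out, which is exactly the group-versus-individual distinction Wired describes.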

You can read the full article on the subject of differential privacy here.

Note: Due to the political nature of the discussion regarding this topic, the discussion thread is located in our Politics, Religion, Social Issues forum. All forum members and site visitors are welcome to read and follow the thread, but posting is limited to forum members with at least 100 posts.

Article Link: Apple Touts 'Differential Privacy' Data Gathering Technique in iOS 10
 
  • Like
Reactions: decafjava

SSD-GUY

macrumors 65816
Sep 20, 2012
1,043
1,806
Interstellar
Never thought I'd say this, but they've finally made all my years of learning stats for my Econ degree sound interesting!

Quite intrigued to see how this actually works out. My guess is that they take this individual-level data but perhaps apply it on a macro scale? But I can't see it being completely unbreakable.
 
  • Like
Reactions: jnpy!$4g3cwk

nt5672

macrumors 68000
Jun 30, 2007
1,952
4,186
Good that they are taking privacy seriously. Looking forward to some expert reviews of this approach.
Me too, but what I have heard is that as long as you are doing the same as everyone else your privacy is protected, but if you stand out in any way then you can be identified.
 
  • Like
Reactions: VulchR

CFreymarc

Suspended
Sep 4, 2009
3,969
1,149
Me too, but what I have heard is that as long as you are doing the same as everyone else your privacy is protected, but if you stand out in any way then you can be identified.
While I don't know the exact technique they are using, it is common to use a "double blind" addressing technique to keep anonymity, making it impossible to trace the data back and identify someone. Descriptions of this technique are a search away.
 
  • Like
Reactions: knemonic and MH01

nt5672

macrumors 68000
Jun 30, 2007
1,952
4,186
That's very good and all, but this is MacRumors (macRumors ;-)), I'm sure we can find a negative way to spin this.
Nothing ever progresses if all you have are positive comments. Ever heard the expression, "Tell me what I need to hear, not what I want to hear"? The question is whether the negative comments have merit; if so, and most do, then someone at Apple should be listening. We cannot count on the media, which needs access to Apple, to say what everyone is thinking.
 

omgitscro

macrumors 6502a
Jul 12, 2008
570
88
While I don't know the exact technique they are using, it is common to use a "double blind" addressing technique to keep anonymity, making it impossible to trace the data back and identify someone. Descriptions of this technique are a search away.
Background: my PhD advisor is a main contributor to the differential privacy literature, and my department overall has a few professors working on differential privacy. Although my own research doesn't deal with differential privacy, some of my past work has been in statistical privacy.

Response to quoted text: while Apple is, without a doubt, anonymizing all identifiers in the data (i.e. your name, address, and other contact info is 100% certain to have been stripped), this does not describe what differential privacy does (rather, anonymizing data is a prerequisite for all practical data privacy methodology). Differential privacy provides a probabilistic guarantee on the data-masking algorithm that, in layman's terms, if you have two datasets that differ only in one user, the outputs of the algorithm on the two datasets are indistinguishable in some precise sense. There are various ways to construct this algorithm so that it is differentially private.
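For the mathematically inclined, the standard definition (due to Dwork and co-authors; this is the general textbook statement, not anything Apple-specific) makes "indistinguishable in some precise sense" exact:

[CODE=latex]
% A randomized algorithm M is \varepsilon-differentially private if,
% for every pair of datasets D, D' differing in one user's record,
% and for every set S of possible outputs,
\Pr[M(D) \in S] \le e^{\varepsilon} \cdot \Pr[M(D') \in S]
% Smaller \varepsilon means stronger privacy: the output distribution
% barely changes whether or not your data are included.
[/CODE]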

The take-away is (and I'm addressing the other commenter): no, even if you are absolutely unique in the dataset, differential privacy guarantees you will be entirely indistinguishable. In other words, it is a guarantee that an attacker will never be able to verify or determine the true value of any entry in the protected data (e.g. the value of any variable for any particular individual).

Many argue that this concept, although an interesting mathematical tool, is too strong for use in practice, in that it cannot be implemented in any real-world scenario without removing all useful signal from the data. I can't name any company or even government agency that claims its data are algorithmically protected with differential privacy guarantees. What Apple has done here is truly revolutionary, and I sincerely doubt any of its competitors are close to being able to do what they're doing today. Maybe in a decade or two?
Never thought I'd say this, but they've finally made all my years of learning stats for my Econ degree sound interesting!

Quite intrigued to see how this actually works out. My guess is that they take this individual-level data but perhaps apply it on a macro scale? But I can't see it being completely unbreakable.
See my other reply for a more detailed response. In particular, differential privacy is a guarantee that no matter how any attacker aggregates the data, there is no way to pick out individual values for any of the variables collected, for any user.
 

H2SO4

macrumors 601
Nov 4, 2008
4,420
4,112
Nothing ever progresses if all you have are positive comments. Ever heard the expression, "Tell me what I need to hear, not what I want to hear"? The question is whether the negative comments have merit; if so, and most do, then someone at Apple should be listening. We cannot count on the media, which needs access to Apple, to say what everyone is thinking.
This. Also sooner or later Apple will start looking more closely at people to see what each demographic does and wants. The line they are on now will either move or become blurred.
In order to know what your customers want you have to know them. Period. Or you have to ask someone that does (which means setting up a phantom corporation to do the dirty work, or buying the info from someone that has done it already).
In order to know what makes people spend money you have to study them. Period.
Also, and this is a simplistic example: surely you have the choice not to collect data about someone's age in the first place? You don't have to collect it and then find a way to obfuscate it.
 
  • Like
Reactions: nt5672

2010mini

macrumors 601
Jun 19, 2013
4,178
3,899
Background: my PhD advisor is a main contributor to the differential privacy literature, and my department overall has a few professors working on differential privacy. Although my own research doesn't deal with differential privacy, some of my past work has been in statistical privacy.

Response to quoted text: while Apple is, without a doubt, anonymizing all identifiers in the data (i.e. your name, address, and other contact info is 100% certain to have been stripped), this does not describe what differential privacy does (rather, anonymizing data is a prerequisite for all practical data privacy methodology). Differential privacy provides a probabilistic guarantee on the data-masking algorithm that, in layman's terms, if you have two datasets that differ only in one user, the outputs of the algorithm on the two datasets are indistinguishable in some precise sense. There are various ways to construct this algorithm so that it is differentially private.

The take-away is (and I'm addressing the other commenter): no, even if you are absolutely unique in the dataset, differential privacy guarantees you will be entirely indistinguishable. In other words, it is a guarantee that an attacker will never be able to verify or determine the true value of any entry in the protected data (e.g. the value of any variable for any particular individual).

Many argue that this concept, although an interesting mathematical tool, is too strong for use in practice, in that it cannot be implemented in any real-world scenario without removing all useful signal from the data. I can't name any company or even government agency that claims its data are algorithmically protected with differential privacy guarantees. What Apple has done here is truly revolutionary, and I sincerely doubt any of its competitors are close to being able to do what they're doing today. Maybe in a decade or two?

See my other reply for a more detailed response. In particular, differential privacy is a guarantee that no matter how any attacker aggregates the data, there is no way to pick out individual values for any of the variables collected, for any user.
So this is what Apple was hard at work creating. I'm impressed.
 
  • Like
Reactions: knemonic

thermodynamic

Suspended
May 3, 2009
1,340
1,192
USA
Good that they are taking privacy seriously. Looking forward to some expert reviews of this approach.
Only since the Great Lawsuit of 2010, or was it the other one from earlier?

Apple cares first and foremost about profit for Apple. All this privacy stuff is manure until Apple is forced to act. Otherwise there would have been no need for a lawsuit, surely?
 

MH01

Suspended
Feb 11, 2008
12,107
9,298
Only since the Great Lawsuit of 2010, or was it the other one from earlier?

Apple cares first and foremost about profit for Apple. All this privacy stuff is manure until Apple is forced to act. Otherwise there would have been no need for a lawsuit, surely?
Apple is using privacy as a selling point, I understand that, but if they make their hardware secure/private, we are winners. They are doing it because it = profit.
 

Porco

macrumors 68040
Mar 28, 2005
3,064
5,782
I think this is a very positive thing: the antithesis of the approach Facebook and Google take to using user data (and increasingly Microsoft too). So frankly, even if it doesn't work at all, it's better than the alternatives - at least they are trying!

I think even the most privacy-conscious users will concede there are useful types of data that can improve services and software when developers can access them. Users' privacy is unfortunately sometimes the 'collateral damage' in the process, so it's great if Apple are finding ways to have the best of both worlds. I would guess it's only really possible if your target is improving the product for the user, rather than identifying the user in order to sell ads to them specifically, so it could become a real unique selling point for paid software development on iOS.

Still, they need to be careful and very sure that it works. There have been lots of instances in the past where claims of 'anonymised data' have been proven trivially easy to de-anonymise.
 

SSD-GUY

macrumors 65816
Sep 20, 2012
1,043
1,806
Interstellar
See my other reply for a more detailed response. In particular, differential privacy is a guarantee that no matter how any attacker aggregates the data, there is no way to pick out individual values for any of the variables collected, for any user.
But how will Apple transform the individual-level data into an aggregate form? Surely this transition point (from micro to macro data) is a weak point?
 

omgitscro

macrumors 6502a
Jul 12, 2008
570
88
But how will Apple transform the individual-level data into an aggregate form? Surely this transition point (from micro to macro data) is a weak point?
Read my reply to the second quoted comment, which I think answers your question.
So basically Apple is selling people's personal info.
No, they are not.
 

69Mustang

macrumors 604
Jan 7, 2014
7,135
13,208
In between a rock and a hard place
Still, they need to be careful and very sure that it works. There have been lots of instances in the past where claims of 'anonymised data' have been proven trivially easy to de-anonymise.
I am cautiously optimistic this will function as Apple hopes it does. If so it will allow Apple and the surrounding ecosystem to utilize and monetize the collected data while maintaining user privacy. Like you, I think they need to tread slowly and carefully. 9to5 has a more cautious take on the subject: http://9to5mac.com/2016/06/14/differential-privacy-security-questioned/
Relevant portion: "...Matthew Green, a cryptography professor at Johns Hopkins University, was tweeting skeptically about it, describing the approach as untested."

Querying him, Green said that existing implementations of Differential Privacy had needed to compromise privacy to obtain accurate data.

“The question is, what kind of data, and what kind of measurements are they applying it to, and what are they doing with it,” Green told Gizmodo. “It’s a really neat idea, but I’ve never really seen it deployed. It ends up being a tradeoff between accuracy of the data you are collecting and privacy.”

“The accuracy goes down as the privacy goes up, and the tradeoffs I’ve seen have never been all that great,” Green continued. “[Again] I’ve never really heard of anyone deploying it in a real product before. So if Apple is doing this they’ve got a custom implementation, and they made all the decisions themselves."

It's a step. Hopefully a good one. This WWDC was weird. They concentrated on letting everyone know about a lot of fluff, but the relevant changes went nearly unmentioned. I understand the gen pop digs emoji, but this, deletable stock apps, removing the Siri remote requirement from ATV gaming, and others will be far more important to Apple keeping its success.
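To make Green's tradeoff concrete, here is a minimal sketch (my own toy example, not Apple's implementation) using the Laplace mechanism, a standard differential-privacy building block. The noise added to a count scales as 1/epsilon, so every step up in privacy costs accuracy:

[CODE=python]
import numpy as np

def private_count(true_count: int, epsilon: float) -> float:
    """Release a count with Laplace noise calibrated to sensitivity 1.

    Adding or removing one user changes a count by at most 1, so
    Laplace(scale=1/epsilon) noise yields epsilon-differential privacy.
    """
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

# Toy example: 10,000 users typed some trending word.
true_count = 10_000
for eps in (1.0, 0.1, 0.01):
    errors = [abs(private_count(true_count, eps) - true_count)
              for _ in range(1_000)]
    print(f"epsilon={eps:>5}: mean absolute error ~ {np.mean(errors):.1f}")
# The error grows roughly as 1/epsilon: more privacy, less accuracy,
# which is exactly the tradeoff Green describes.
[/CODE]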
 

JohnnyGo

macrumors 6502a
Sep 9, 2009
673
382
So basically Apple is selling people's personal info.
No they're not!

Instead they seem to be able to use the COLLECTIVE data from all of their users to help/aid/assist each one individually. Potential examples:

- you search for restaurants in a particular area or cuisine, and you receive the list ordered by how many other users searched for, drove to, or marked said restaurants

- you are typing the name of a recent newsworthy politician and the predictive keyboard in iOS will suggest said individual (trending)

That's what I understood is being done: AI powered by collective/anonymous data gathering.
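To sketch how that trending-keyboard example can work without Apple ever seeing who typed what (a toy RAPPOR-style construction; the vocabulary, parameters, and names are all made up, not Apple's algorithm), each user submits a randomly flipped one-hot vector and the server debiases the aggregate:

[CODE=python]
import random

VOCAB = ["election", "playoffs", "keynote", "weather"]  # illustrative words
P_KEEP = 0.75  # probability each reported bit is truthful

def privatize(word: str) -> list[int]:
    """One-hot encode a user's word, then flip each bit with prob 1 - P_KEEP."""
    bits = [1 if w == word else 0 for w in VOCAB]
    return [b if random.random() < P_KEEP else 1 - b for b in bits]

def debias(ones: int, n: int) -> float:
    """Invert the expected flipping to recover one word's true rate."""
    observed = ones / n
    return (observed - (1 - P_KEEP)) / (2 * P_KEEP - 1)

# Simulate 50,000 users; "keynote" is genuinely trending at 40%.
users = random.choices(VOCAB, weights=[0.2, 0.3, 0.4, 0.1], k=50_000)
sums = [0] * len(VOCAB)
for word in users:
    for i, bit in enumerate(privatize(word)):
        sums[i] += bit
estimates = {w: debias(s, len(users)) for w, s in zip(VOCAB, sums)}
print(sorted(estimates.items(), key=lambda kv: -kv[1]))  # "keynote" ranks first
[/CODE]

No single report pins down a user's word, since any bit may have been flipped, yet the aggregate ranking comes out right.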
 

JohnnyGo

macrumors 6502a
Sep 9, 2009
673
382
The article says Apple collects/aggregates the data employing 'Differential Privacy'. Nowhere did I see that Apple, beyond using this collected data to improve its own products and services, is selling this info to third parties.

Please correct me if I'm wrong.
Correct. Using such data is definitely not selling the data.
 
  • Like
Reactions: satcomer

now i see it

macrumors 601
Jan 2, 2002
4,527
9,011
"Differential privacy" is just an obtuse and hopefully anonymous method of data mining. For Apple's benefit, not ours.

Why not call it what it is? "Hopefully Anonymous User Unwanted Spying"
But I guess that wouldn't go over too well at the keynote.