I don't understand how it can be totally anonymous

Well, it can't. Given enough data, it is always possible to identify the person (there will always be references to places, activities and persons related to you). Also, using methods of forensic phonetics, it is often possible to identify the speaker. The latter could be circumvented by adjusting the signal on the client (e.g. modifying the base frequency and the formant frequencies of the voice) so that the server does not know how the signal has been modified - such obfuscation is usually non-reversible. However, I have no idea whether it would affect recognition performance (it might), and of course it does not solve the first problem.
 
I agree with those that are saying this isn't an issue, but

I don't understand how it can be totally anonymous

If I am a first-time user of Siri

I ask it a question and the Siri back end assigns an anonymous ID to my request
How does it know where to send the response? It must know the request came from my device

If it is truly anonymous and is there to help improve the Siri experience for me (better recognise my voice, as mentioned earlier), what happens when I get a new device? Has all this been wasted, so that I need to train it again to recognise me?

You are confusing the runtime process with the archival process. Of course Apple needs your device ID while the request is active. However, this can be thrown out after the transaction is complete, with only the anonymous token retained for the archive.
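
A minimal sketch of that separation, in Python. Everything here is hypothetical (the `LiveRequest` type, the field names, the `to_archive_record` helper); nothing in this thread tells us how Apple's servers are actually built. The point is only that the record written to the archive need not contain the device ID that was used to route the reply.

```python
from dataclasses import dataclass


@dataclass
class LiveRequest:
    """A request while it is being served (hypothetical layout)."""
    device_id: str  # needed only to route the reply
    token: str      # anonymous identifier
    audio: bytes    # the recorded voice clip


def to_archive_record(req):
    """Keep the anonymous token and the audio; drop the device ID."""
    return {"token": req.token, "audio": req.audio}
```

Whether the stored audio itself can still identify the speaker is a separate question, as pointed out earlier in the thread.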
 
I agree with those that are saying this isn't an issue, but

I don't understand how it can be totally anonymous

If I am a first-time user of Siri

I ask it a question and the Siri back end assigns an anonymous ID to my request
How does it know where to send the response? It must know the request came from my device

If it is truly anonymous and is there to help improve the Siri experience for me (better recognise my voice, as mentioned earlier), what happens when I get a new device? Has all this been wasted, so that I need to train it again to recognise me?
*Sigh*
It is likely a synchronous call, which means that your phone is waiting for a response from the server. That is why you sometimes get an error back when Siri cannot reach the server.

This basically means that the request and response are in one session, where the response, including the number, comes back to your phone. Any future requests will send that number along.
 
WHO CARES? [...] Get over yourselves. This is 2013. Privacy is gone. Get used to it

If you don't care, that's fine. Some of us actually do, and we have the right to complain about it and fight against it. The right to privacy is a human right. We don't want Apple, the government, you, or anyone else to dismiss it or take it away.

Companies and the government are taking and storing people's voices without people's knowledge or (real) permission — I'm not talking about legalese buried somewhere in the TOS. As others have pointed out, voice prints can be linked to people. Whether they associate it to my AppleID or not, it is still my voice, and what I say is being stored and could be tracked back to me. Apple should have to have our opt-in consent before storing my voice or using it for other purposes.

I'm sure there are millions of people like you who are fine with Apple using their Siri submissions for Apple's profit, essentially using our voices and personal information so we can be their free beta testers and data source. But for the millions of people who don't know or who don't want their privacy invaded in this way, Apple should at least be honest and upfront about it so people can make an educated choice.
 
Well, it can't. Given enough data, it is always possible to identify the person (there will always be references to places, activities and persons related to you). Also, using methods of forensic phonetics, it is often possible to identify the speaker. The latter could be circumvented by adjusting the signal on the client (e.g. modifying the base frequency and the formant frequencies of the voice) so that the server does not know how the signal has been modified - such obfuscation is usually non-reversible. However, I have no idea whether it would affect recognition performance (it might), and of course it does not solve the first problem.

You are confusing the runtime process with the archival process. Of course Apple needs your device ID while the request is active. However, this can be thrown out after the transaction is complete, with only the anonymous token retained for the archive.

Thanks guys
 
*Sigh*
It is likely a synchronous call, which means that your phone is waiting for a response from the server. That is why you sometimes get an error back when Siri cannot reach the server.

This basically means that the request and response are in one session, where the response, including the number, comes back to your phone. Any future requests will send that number along.
To be fair, Siri probably does know who you are at run time. This would help with relationship knowledge and other personal data. It does raise the question of where "wife" is stored. Do the servers return "wife" and your phone cross-references it, or are Apple's servers aware of the relationship? They could do it either way.
 
Wow, Apple fanboys will defend practically any action that Apple does, whether that action was unethical or not.
 
Wow, Apple fanboys will defend practically any action that Apple does, whether that action was unethical or not.

You realise that your post is without any content? What you say would be the definition of a "fanboy". However, has there been any poster here who is a fanboy? If so, give us names. Has Apple done anything unethical? If so, tell us what.
 
Wow, Apple fanboys will defend practically any action that Apple does, whether that action was unethical or not.

You're assuming that this is unethical. I don't find it so. Frankly, Google analyzes your information much more than Apple does, and to the great benefit of its users. Google Now is an amazing product, but you toss privacy out the window. I understand that if I want these conveniences I need to allow my data to be parsed.
This discussion is like being mad at an online retailer for wanting to know your address to ship your product. They need data to provide the service.
 
I ask it a question and the Siri back end assigns an anonymous ID to my request
How does it know where to send the response? It must know the request came from my device

Your phone sends the voice clip, plus an ID so that Apple's server knows where to send the reply. Without that ID, Siri obviously couldn't work. So Apple's server processes the voice clip, uses the ID to send a response back, and then they can and should throw the ID away.
 
Wow, Apple fanboys will defend practically any action that Apple does, whether that action was unethical or not.

I am really confused when I read stuff like this. Please tell us, what is unethical about this? I will ask it again: do you want speech recognition or not? If you want it, they must collect data to improve their algorithms. It's as simple as that. Maybe at some point someone (maybe even me :p ) will invent a way to parse speech without using corpora - but right now, this is the state of the art. Objections such as yours are like complaining that clerks in the grocery store can see what you are buying (and btw., all that stuff is recorded on the security cameras as well, and they save the purchase lists for market research).
 
My only question is why, especially since it's anonymized - what benefit is there to Apple?

It's a large databank of real-world Siri use that they can test new algorithms or speech recognition patterns against.


Back to the main topic: I don't see why it matters if Apple stores this data. My bank and credit card company store my data too. The difference between Apple and many other companies is that Apple doesn't sell people's data to others.
 
From everything I have seen, the requests are synchronous rather than asynchronous, meaning that the request and response happen in the same call. This means that Apple does not need to know where to send the response, because the caller is still connected and waiting for it.

In this scenario, the first request would send a 0 or null/nil ID, triggering the service to generate one. This new number would be used to store your first request, and it would be sent back to your phone, where it would be stored for use in future requests.

As for Siri knowing who your "wife" is, that can be accomplished by Siri storing who your wife is on your phone, or by you manually tagging your wife in her contact record. Once you do that, Siri will know who your wife is without that information having to be stored on Apple's servers.
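
A rough sketch of that first-request handshake, in Python. The `Server` and `Phone` classes, the `handle` and `ask` methods, and the in-memory `archive` are all made up for illustration; this is a guess at the flow described above, not Apple's actual protocol.

```python
import secrets


class Server:
    """Hypothetical back end that hands out anonymous IDs."""

    def __init__(self):
        self.archive = {}  # anonymous ID -> list of stored queries

    def handle(self, anon_id, query):
        # A missing ID marks a first-time caller, so generate one.
        if anon_id is None:
            anon_id = secrets.token_hex(8)
        self.archive.setdefault(anon_id, []).append(query)
        # The reply goes back on the same call, so no return address is needed.
        return {"id": anon_id, "answer": f"echo: {query}"}


class Phone:
    """Hypothetical client that remembers the ID it was given."""

    def __init__(self, server):
        self.server = server
        self.anon_id = None  # nothing stored before the first request

    def ask(self, query):
        # Synchronous call: we wait here until the server answers.
        response = self.server.handle(self.anon_id, query)
        self.anon_id = response["id"]  # keep the ID for future requests
        return response["answer"]
```

In this sketch the ID is random, so a new phone would start from scratch unless the stored ID were carried over to it somehow.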

You can also tag your "work" in your address book for geo-fencing as long as it has a valid address.
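
The on-device lookup for "wife" could look something like this in Python. The contact list, the `relation` field and the shape of the server's intent are all assumptions for the sake of the example; the real address book format is not something this thread establishes.

```python
# Hypothetical on-device address book with relationship tags.
CONTACTS = [
    {"name": "Alice Example", "phone": "555-0100", "relation": "wife"},
    {"name": "Bob Example", "phone": "555-0101", "relation": None},
]


def resolve(server_intent, contacts):
    """Fill in the contact locally from a role-only intent sent by the server."""
    for contact in contacts:
        if contact["relation"] == server_intent["role"]:
            return {"action": server_intent["action"], "number": contact["phone"]}
    return None  # no contact carries that tag


# The server only ever saw the word "wife", never the contact it maps to.
intent = {"action": "call", "role": "wife"}
```

Here `resolve(intent, CONTACTS)` picks the tagged contact on the phone, so the relationship never has to leave the device.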
 
2) It'd be against Google's business interests to allow anyone to get hold of said information.

Now that is a big misunderstanding. Yes, Google will not leave that information out of their hands. However, that doesn't help you. Google doesn't give your data to advertisers, they use your data on behalf of the advertisers to send the "right" advertisements to you. For payment, of course. The effect is the same, except Google makes sure they can keep making money off your data.
 
My only question is why, especially since it's anonymized - what benefit is there to Apple?

It could be used to improve the voice recognition system with more training data. Pick samples that the recognition engine is not 100% sure about, have them "manually" recognized, then use the new data to train the system to be more accurate.
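
As a sketch of that loop in Python: pick the low-confidence clips, get human transcripts for them, and build the training set from the corrected text. The sample layout, the 0.9 threshold and both function names are invented for illustration; how Apple actually selects or labels data is not known here.

```python
def select_for_review(samples, threshold=0.9):
    """Return samples below the confidence threshold, least confident first."""
    uncertain = [s for s in samples if s["confidence"] < threshold]
    return sorted(uncertain, key=lambda s: s["confidence"])


def apply_labels(samples, human_labels):
    """Build (clip_id, text) training pairs, preferring the human transcript."""
    return [
        (s["clip_id"], human_labels.get(s["clip_id"], s["text"]))
        for s in samples
    ]
```

The clips returned by `select_for_review` would go to human transcribers, and `apply_labels` then merges their corrections back in before retraining.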
 
Wow, Apple fanboys will defend practically any action that Apple does, whether that action was unethical or not.

Is there anything unethical in what Apple is doing with Siri?

----------

Now that is a big misunderstanding. Yes, Google will not leave that information out of their hands. However, that doesn't help you. Google doesn't give your data to advertisers, they use your data on behalf of the advertisers to send the "right" advertisements to you. For payment, of course. The effect is the same, except Google makes sure they can keep making money off your data.

No, there is no misunderstanding. The last thing Google wants is to show the data to others.
 
First of all, Google wouldn't anonymize it. Secondly, they'd be selling that information to anyone willing to pay, for profit.

How's that tin foil hat fitting you? Assume nice and tight ;)
 
I have a side interest in linguistics. A massive corpus (collection of linguistic data) like this is absolutely vital for Siri to work properly. The bigger, the better and, storage and technical requirements aside, there isn't really any upper limit. Almost every linguistics research institute will have a collection of corpora for different uses.

Apple needs a large corpus to track regional accents, local variations in words, local place names, how these same local words are spoken by non-locals (and how that differs depending on where they're from), shifts in the most common questions over time, etc.

There are also large differences in the typical questions asked depending on social status, age, gender, location, etc. People living near beaches or in fishing or sailing communities are likely to ask for tide tables, surf heights, wind direction, etc., whereas city people won't, but might do so when on or planning a holiday. Siri needs to be able to understand all of that with a high degree of probability, and a huge corpus is necessary for this.

I would be very surprised if Apple hadn't kept this data. I can only guess that maybe they delete two-year-old stuff because microphones were poorer two iPhone generations ago, so it's not worth keeping the old recordings.
 
I get annoyed when privacy advocates demonize companies for storing this data. Compared to Google, Apple is a saint when it comes to privacy.

I feel like people also take for granted what storing user data has done to improve our internet experience. Data and analytics make voice services much better at understanding dialects. Not to mention Google search result accuracy, as well as "Did you mean?" for search misspellings.

If you value your privacy and have information to hide I suggest you stay away from services that need it to make them work better.
 
One thing is for sure, if Apple has any financial difficulties in the future, they will have some very specific personal user data and voice prints to sell.

Apple doesn't data mine this stuff for the sake of "storing."
It will eventually be sold.
 
One thing is for sure, if Apple has any financial difficulties in the future, they will have some very specific personal user data and voice prints to sell.

Apple doesn't data mine this stuff for the sake of "storing."
It will eventually be sold.

Wishful thinking?
 
or you would be in some serious trouble, right?

*tumbleweed*
Correct.
Hmmm... If you feel glad because of privacy concerns, I hope for your sake you don't have a gmail account! :)
Nope. I don't use Gmail. /goes and checks gmail lol
Nor do I.

But you'll notice all the posts minimizing this because it's Apple. If it was Google they'd be outraged.

Hypocrisy displayed by the faithful at its finest!

Yet Apple's free to turn on the mic or camera on my iDevice whenever they want. If anyone thinks otherwise they're only kidding themselves.

It's the risk we take by carrying what is essentially the best tracking device the Feds could ever ask for.

Paranoid? NOT at all.

Factual? ABSOLUTELY.

Welcome to technology circa 2013 :D
I think my paranoia just increased. lmbo

Yeah I know. Apple is totally interested in what you are doing. They have meetings about it.
That is why the stock dropped... they got all my queries lol.
 
One thing is for sure, if Apple has any financial difficulties in the future, they will have some very specific personal user data and voice prints to sell.

Apple doesn't data mine this stuff for the sake of "storing."
It will eventually be sold.
Apple stores this stuff to improve Siri's accuracy and functionality, not to sell it. Apple makes its money from hardware, software and media sales, not from advertising (excluding iAd) and data mining.
 