Yikes. Reading the Technical Summary yields masterpieces of double-talk such as:
"Nearly identical" doesn't mean "identical". If an image that is only
nearly identical can generate the same number, then the number isn't "a unique number specific to that image". If you've encountered hashes as a way of verifying the authenticity of downloads or while reading about blockchain, that's not what is happening here. OK, they're talking about images that differ in size and quality, so maybe you could call that "nearly identical", but that "nearly" makes a huge difference in the likelihood of a false match.
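To make that concrete, here's a toy sketch of the difference (using a crude "average hash" of my own invention - nothing to do with Apple's actual NeuralHash): a cryptographic hash changes completely when a single pixel changes, while a perceptual hash is deliberately built so that non-identical copies still collide.

    import hashlib

    def sha256_hex(pixels):
        # Cryptographic hash: changes completely if a single value changes.
        return hashlib.sha256(bytes(pixels)).hexdigest()

    def average_hash(pixels):
        # Crude perceptual hash: one bit per pixel, set if the pixel is
        # brighter than the image's mean. Survives small quality changes.
        mean = sum(pixels) / len(pixels)
        return ''.join('1' if p > mean else '0' for p in pixels)

    # A tiny 4x4 "image" and a near-copy with one pixel nudged (think recompression).
    original = [200, 30, 40, 210, 25, 220, 215, 35, 40, 230, 225, 30, 190, 20, 35, 205]
    near_copy = list(original)
    near_copy[5] = 221   # one pixel, off by one

    print(sha256_hex(original) == sha256_hex(near_copy))      # False: completely different digests
    print(average_hash(original) == average_hash(near_copy))  # True: same "hash" for a non-identical image

The second kind is what's on offer here, and it's the second kind for which "a unique number specific to that image" is simply the wrong claim.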
OK, so let's just trust that Apple have read about
Sally Clark and understand the difference between independent events (tossing a fair coin) and possibly correlated events (e.g. if one of your photos triggers a false match, how likely is it that there will be other "nearly identical" photos in your collection?), and haven't just multiplied the probability of a hash collision by the number of matches (... which would work but for that pesky "nearly").
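A back-of-the-envelope sketch with entirely made-up numbers (mine, not Apple's) shows why that multiplication only works if the matches really are independent:

    p = 1e-6          # hypothetical per-image false-match probability (my guess, not Apple's)
    n_photos = 20_000     # hypothetical library size
    threshold = 30        # hypothetical number of matches before anything happens

    # Independence assumption: the chance that 30 particular photos ALL
    # false-match is p**30 - astronomically small, which is where the
    # comforting headline numbers come from.
    print(f"30 independent false matches: {p ** threshold:.1e}")

    # Correlated reality check: if one false match tends to arrive with a
    # burst of "nearly identical" shots (edits, re-saves, bursts), then
    # crossing the threshold is roughly as likely as having ANY false match:
    print(f"At least one false match in the library: {1 - (1 - p) ** n_photos:.1%}")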
Which is
not the same as "reviews each report to confirm that the match really has found CSAM and, if so, disables the account and sends a report to NCMEC". If that's what they mean, why not say it clearly?
and
...so ignore the technicalities (which aren't technical enough to recreate and critique the process) and focus on how terms like "number unique to the image" or "identical" have gradually morphed via "nearly identical" and "perceptually and semantically similar" into "visually similar"... and on the fact that we're suddenly talking about analysing the features of the image (which is precisely what some people here are saying isn't happening "because hash").
Then we follow up with the truly impressive and reassuring demonstration that a colour picture of a palm tree generates the same hash as exactly the same image converted to monochrome, while a completely different cityscape (with nary a palm tree in sight) generates a different hash. Wow. Anybody reading this critically would be asking "what about a different picture containing palm trees, or maybe a similarly composed picture of a cypress tree? How about some examples of cropped/resized images that couldn't be matched by simply turning the image to B&W before hashing?" Maybe the system can cope with that - if so, why not show it rather than a trivial Sesame Street "one of these things is not like the others" example?
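For what it's worth, matching an image to its own monochrome copy proves almost nothing - you get that for free by converting to grayscale before hashing. A toy sketch (my own pixels, nothing from the summary):

    # Rec.601 luma conversion: a standard colour-to-grayscale step.
    def to_grayscale(rgb_pixels):
        return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in rgb_pixels]

    colour = [(34, 139, 34), (210, 180, 140), (135, 206, 235)]   # a few "palm tree" colours
    mono   = [(y, y, y) for y in to_grayscale(colour)]           # the monochrome copy of the same image

    # After the same preprocessing step the two are identical, so ANY hash
    # - even a plain cryptographic one - would match them:
    print(to_grayscale(colour) == to_grayscale(mono))   # True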
I'm not questioning whether the technology makes a good effort at a very difficult task (matching images without being fooled by inconsequential changes), but the summary reeks of "positive spin" and avoiding the difficult questions - and for any technology like this the #1 question has to be "what are the dangers of a false match?" and "is the risk justified by the rate of successful matches?"
...and will people please,
please stop saying "it's not scanning your images, it's only checking hashes" - that's a distinction without a difference even before you replace "hashes" with Apple(R) NeuralHashes(TM).