Can he have evidence? He doesn't have the system to test it so he's drawing conclusions from the available data. You yourself asked me how you could possibly know how Apple arrived at the magic number 30. This guy, whether right or wrong, applied his expertise in the field to the info he can find on this tech. Doesn't seem like he can do better than that at the moment, and this is another hypocritical aspect of your post, dismissing the assumptions and conclusions of someone who should actually have at least some knowledge of this stuff. Mind you, I'm not arguing he's right as I have no idea, I'm arguing that he has the references, this being his bread and butter.
He is a perfect example of "a little knowledge is a dangerous thing". He is wrong, and he doesn't understand Apple's technical description of NeuralHash.
Where he goes wrong is that he tried to infer how NeuralHash works from what he believes to be the end result of a NeuralHash and from his knowledge of more conventional hashing functions.
This leads him to believe
NeuralHash = photo AI + hash
He describes photo AI as similar to the scanning the Photos app is already doing.
If he were right, the CSAM detection system could be used for everything people are worried about: finding people smoking pot, participating in protests, having (illegal) guns, posing in MAGA caps, having non-heterosexual preferences etc.
Fortunately,
NeuralHash = 1) algorithm for generating multidimensional floating-point descriptors + 2) convolutional neural network + 3) hyperplane locality-sensitive hashing
The goal of 1) isn't to find similar images, but to
A. find images which are the same as (or derivatives of) images in the NCMEC database
B. and at the same time be extremely bad at finding similar images
C. and be extremely bad at finding dissimilar images
These are competing goals. That's why this algorithm is "sent through" a neural network (2) to optimise for these three tasks by testing many variations of the algorithm (1) on millions of non-CSAM images.
It is because of B that this system would, in many circumstances, be such an inefficient tool for police and governments to misuse.
(It's important to my argument that you understand the difference between "derivative" and "similar" images).
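To make step 3) a bit more concrete, here is a toy sketch of hyperplane locality-sensitive hashing in Python. It is not Apple's code: the descriptor size, the number of bits, the random hyperplanes and the function names (make_hyperplanes, lsh_bits, hamming) are all assumptions made up for illustration. The idea it shows is only that each hash bit records which side of a hyperplane a descriptor falls on, so a descriptor that has barely moved (a "derivative") should usually land on the same side of most hyperplanes, while an unrelated descriptor should not.

```python
import random

def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random hyperplane normals (a toy stand-in, not Apple's)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def lsh_bits(descriptor, hyperplanes):
    """One bit per hyperplane: which side of it the descriptor falls on."""
    return tuple(
        1 if sum(h * x for h, x in zip(plane, descriptor)) >= 0 else 0
        for plane in hyperplanes
    )

def hamming(a, b):
    """Number of bit positions where two hashes differ."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical descriptors: in the real system these would come from 1) and 2).
rng = random.Random(1)
planes = make_hyperplanes(n_bits=32, dim=16)
original = [rng.gauss(0.0, 1.0) for _ in range(16)]
derivative = [x + rng.gauss(0.0, 0.01) for x in original]  # tiny perturbation
unrelated = [rng.gauss(0.0, 1.0) for _ in range(16)]

print("original vs derivative:",
      hamming(lsh_bits(original, planes), lsh_bits(derivative, planes)))
print("original vs unrelated: ",
      hamming(lsh_bits(original, planes), lsh_bits(unrelated, planes)))
```

In a toy like this the first distance should usually be zero or close to it and the second much larger, but how well that separation holds in the real system depends entirely on how the descriptors in 1) and 2) are produced, which this sketch does not model.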