I'll admit that you're more creative than I am at contriving a what-if.
A few issues (according to Apple's documentation -- again, up to the reader to believe it or not): the process of revealing the >= 30 hits against the CSAM database relies on a handshake between Apple's servers and the vouchers from your phone -- i.e., China would also need access to Apple's proprietary server-side code, which I'm guessing Apple wouldn't be keen to fork over.
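For intuition on why the server side matters: Apple describes the threshold piece as a secret-sharing scheme, where each voucher carries a share and the server learns nothing until it holds at least the threshold number of shares. Here's a toy Shamir-style sketch (my own illustration, not Apple's actual protocol -- the real system's threshold, field parameters, and PSI layer are all different):

```python
import random

PRIME = 2**61 - 1  # toy prime field; a real scheme would use vetted parameters

def make_shares(secret: int, threshold: int, n: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares; any `threshold` of them reconstruct it."""
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(threshold - 1)]
    def evaluate(x: int) -> int:
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, evaluate(x)) for x in range(1, n + 1)]

def reconstruct(shares: list[tuple[int, int]]) -> int:
    """Lagrange-interpolate the polynomial at x = 0 to recover the secret."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret

key = 123456789                               # stand-in server-side decryption key
shares = make_shares(key, threshold=3, n=5)   # toy threshold of 3, vs. Apple's ~30
assert reconstruct(shares[:3]) == key         # enough vouchers: key recovered
```

Below the threshold, the shares pin down nothing about the constant term, which is the point: the reveal mechanically requires the server's participation.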
To your booby-trapped CSAM images argument, I see at least three possible issues:
- Assuming they are able to do this (see (2)), there is still the issue that you would need to have near-exact replicas of the images that they want to flag -- i.e., you could attend the same protest, take pictures of the same people (from perhaps a slightly different angle), and still, with very high likelihood, produce a different hash.
- Talking about an adversarial government that would have no problem generating new CSAM images does not immediately make the problem of training a GAN to do this easy. I would argue that they would have quite a hard time doing this without many millions of novel CSAM images as training data. They could likely make semantically meaningless images matching the hashes of the images they want to flag, but creating novel, realistic CSAM with a particular hash is a much harder problem. Not impossible, but I gather very difficult without an absurd amount of data.
- Why the heck would China (or any other repressive country) choose this as the best spying vector? Aside from the absurd cost involved, it's just about the least efficient way to get the job done. Your phone already semantically tags nearly everything on it --- it would be much easier to require Apple to just report whenever a user has content tagged with anything in <set of objectionable things>. If China has the ability to require Apple to bend to its every demand, then there's no way it would choose the CSAM hashing vector for spying.
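The first two points can be made concrete with toy stand-ins (my own illustrations -- NeuralHash is a neural perceptual hash with very different specifics). Part 1 uses a crude average hash to show why a re-take from a different angle hashes differently even though a mere re-encode does not; part 2 uses a truncated cryptographic hash to show how blind collision search doubles in cost with every hash bit, which is the wall an attacker hits once their output is constrained to be a realistic image rather than adversarial noise:

```python
import hashlib
import itertools

# --- Part 1: perceptual-hash sensitivity (aHash as a stand-in for NeuralHash) ---

def average_hash(img):
    # One bit per pixel: is it brighter than the image mean?
    flat = [p for row in img for p in row]
    mean = sum(flat) / len(flat)
    return tuple(int(p > mean) for p in flat)

def hamming(h1, h2):
    return sum(a != b for a, b in zip(h1, h2))

photo = [[10, 10, 200, 200] for _ in range(4)]      # dark left, bright right
brighter = [[v + 5 for v in row] for row in photo]  # same shot, slightly re-exposed
reframed = [[200, 200, 10, 10] for _ in range(4)]   # same subject, different angle

print(hamming(average_hash(photo), average_hash(brighter)))  # 0 of 16 bits differ
print(hamming(average_hash(photo), average_hash(reframed)))  # 16 of 16 bits differ

# --- Part 2: blind collision-search cost (truncated SHA-256 as a stand-in) ---

def toy_hash(data: bytes, bits: int) -> int:
    # Truncated SHA-256 as a generic k-bit hash for the cost argument only.
    return int.from_bytes(hashlib.sha256(data).digest(), "big") >> (256 - bits)

def attempts_to_collide(target: bytes, bits: int) -> int:
    # Count brute-force candidates tried until one matches the target's hash.
    goal = toy_hash(target, bits)
    for n in itertools.count(1):
        if toy_hash(n.to_bytes(8, "big"), bits) == goal:
            return n

for bits in (8, 12, 16):
    print(bits, attempts_to_collide(b"flagged-image", bits))
# Expected work grows as ~2**bits: tens of thousands of tries at 16 bits,
# hopeless at NeuralHash's reported 96 -- unless the attacker can shortcut
# the search with gradients or, as argued above, vast amounts of training data.
```

None of this is the real attack surface (published NeuralHash collisions used gradient methods, not brute force); it's just meant to show why "same protest, different photo" misses and why "realistic image with this exact hash" is a categorically harder ask.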