If Google, Apple, and Facebook want to improve voice-to-text AI transcription, why not just have a simple option for a user to flag a poor transcription and (completely of their own volition) ‘submit for transcription review’? That’s all that’s really needed, in my opinion.
Because fundamentally, modern "AI" (machine learning) requires a ton of data. It doesn't work like a human, where pointing out one small mistake is enough; it takes tens of thousands of flagged mistakes before it learns correctly.
https://xkcd.com/1838/