I don't want this type of language understanding AI to get smarter as it will be used for 1984 style surveillance.
Hate to break it to you, but the security services have been using voice dictation and transcription technology for lawful interception for years and years now. It's at the point where almost any defence-orientated company or government department with a big enough budget can buy off-the-shelf intercept kit that does some level of voice recognition using modern AI (god I hate that buzzword)/ML-based APIs.
Before I changed career paths from such a company (Ultra Electronics here in the UK) to a "friendlier" one (SUSE Linux), I worked on an implementation project installing such hardware and software into government data centres in Muscat, Oman. None of it was to do with "western" security services (all of that would be covered by the Official Secrets Act anyway and I wouldn't be allowed to post about it); it was done as a private endeavour between a British company and the private security firm handling security at the new airport they were building in Muscat.
The system could listen in, actively or passively, to any telephone (or fax, as they're still used a lot in airports) on the airport phone network. As well as logging and recording all calls (or the passive background monitoring of landlines), it would run ML-based transcription on those calls and turn them into readable/processable text logs. It supported 12 languages at the time (including Arabic) and was designed to hunt for keywords or phrases that would indicate to a security company that they should perhaps listen to the call themselves, in more detail and with more accuracy than the dictation software could manage.
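To give a rough idea of what "hunting for keywords or phrases" in a transcript can look like, here is a minimal Python sketch. It is purely illustrative and not how the actual product worked; the watch list and the function name find_flags are made up for the example, and a real system would need far smarter matching than whole-word search.

```python
import re

# Hypothetical watch list. The real system's terms and matching rules
# are not described in this post.
WATCH_PHRASES = ["example phrase", "keyword"]


def find_flags(transcript, phrases=WATCH_PHRASES):
    """Return the watch phrases found in a transcript.

    Matching is case-insensitive and on whole words only, so "keyword"
    does not match inside "keywords".
    """
    hits = []
    for phrase in phrases:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, transcript, flags=re.IGNORECASE):
            hits.append(phrase)
    return hits


if __name__ == "__main__":
    print(find_flags("Caller repeated the KEYWORD twice."))
```

Any transcript that comes back with a non-empty list would be the kind of thing that gets escalated to a human listener.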
In a way I guess that helps improve privacy though, as any call the computer could understand relatively well wouldn't get listened to or read by a human unless someone said words or phrases that triggered an alert/flag for further investigation. "Relatively well" meant the system transcribed >92% of the call content; during our testing it managed that on about 80% of all calls/background noise monitoring.
On the flipside, if more than 92% of a call was transcribed and translated successfully, the audio recording would be purged after ~1 year (storage is expensive), leaving only the text transcript as evidence (which was stored indefinitely once indexed and compressed).
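The triage and retention rules described above boil down to two small checks, sketched here in Python. The 92% threshold and the roughly one-year window come from this post; treating "~1 year" as exactly 365 days, sending low-coverage calls to a human, and the function names themselves are my assumptions for the sake of the example.

```python
from datetime import date, timedelta

COVERAGE_THRESHOLD = 0.92               # ">92% of call content"
AUDIO_RETENTION = timedelta(days=365)   # "~1 year"; exact figure is assumed


def needs_human_review(coverage, flagged):
    """A call goes to a human if a keyword alert fired, or (assumed here)
    if the machine could not transcribe enough of it."""
    return flagged or coverage <= COVERAGE_THRESHOLD


def audio_can_be_purged(coverage, recorded_on, today):
    """Audio is dropped only for well-transcribed calls past the retention
    window; the text transcript is kept either way."""
    old_enough = (today - recorded_on) >= AUDIO_RETENTION
    return coverage > COVERAGE_THRESHOLD and old_enough


if __name__ == "__main__":
    print(needs_human_review(0.95, flagged=False))
    print(audio_can_be_purged(0.95, date(2020, 1, 1), date(2021, 6, 1)))
```

Note what the second function implies: the better the machine understood the call, the sooner the only independent check on its transcript disappears.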
That means someone could potentially get convicted later down the line based on a computer-dictated and translated version of a call where there is no audio left, only a stored text file.
I know by US standards that would be considered very "thin" evidence to go on alone, but out in the Middle Eastern countries "due process" isn't as developed as in The West. It means people could be deported, go to prison or be executed for a crime based on a simple txt file (depending on their race, gender and citizenship, as yes, their legal system is still based a lot on your own background rather than the crime you've committed).
Having said that, both the digital and physical security on such a system and its components was relatively high, but there's always a risk of some interference there.
Another great feature of the system (which is very common out in the Middle East) is the installation of the government's own self-signed root CAs onto your smartphone/laptop (which most people happily consent to and install). They'll shove all your traffic through a proxy server, and having their root CAs on the device means they can spoof the SSL certificates for websites, so in theory they can intercept the majority of https transmissions and users are none the wiser. It is of course a lot more complex than this but that's the basic principle of it.
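One visible side effect of that kind of proxying is that the certificate a site presents is issued by the proxy's CA rather than by the CA you'd normally expect. As a toy illustration only (real clients rely on certificate pinning and much more), here is a Python sketch that reads the issuer out of a certificate dictionary in the shape returned by the standard library's ssl.SSLSocket.getpeercert(). The function names, the sample certificate and the expected-issuer set are all made up for the example.

```python
def issuer_organization(cert):
    """Pull organizationName out of the 'issuer' field of a certificate
    dict shaped like the one ssl.SSLSocket.getpeercert() returns."""
    for rdn in cert.get("issuer", ()):
        for key, value in rdn:
            if key == "organizationName":
                return value
    return None


def looks_intercepted(cert, expected_issuers):
    """True if the issuer is not one we expected for this site.
    expected_issuers is a hypothetical allow-list chosen by the caller."""
    return issuer_organization(cert) not in expected_issuers


if __name__ == "__main__":
    # Made-up certificate, as a re-signing proxy might present it.
    sample = {
        "issuer": (
            (("countryName", "XX"),),
            (("organizationName", "Example Proxy CA"),),
            (("commonName", "Example Proxy Root"),),
        )
    }
    print(looks_intercepted(sample, {"Example Public CA"}))
```

With the proxy's root CA installed on the device the TLS handshake itself still succeeds, which is exactly why most users never notice; only a check like this, against what the issuer ought to be, would show the difference.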
This ties in with the audio systems I described above to help build an accurate picture of an individual user.
At the end of the day in Muscat they would log (for example) WhatsApp, Facebook Messenger, Telegram, Skype etc calls and messages in the exact same way and through the exact same AI/ML processing systems and store the data in a similar way.
They had other companies providing them with ways to root Android phones as well, or to install "tourist" apps that visitors would use and that gave the security services even greater access to the individual's communications/habits etc. It's a huge huge industry providing lawful interception (actual technical term there!) to countries in the Middle East and Asia. They don't have the skills in-house to do it like we do in the UK with GCHQ so they outsource it.
Anyways I'm going off topic! - My point is that if a relatively small Middle Eastern government can easily purchase all that equipment "off the shelf" from a UK private defence company and have them install, monitor and maintain that as part of a service contract for a few million quid a year then just imagine what our own security services have the capability to do.
I personally think what we got from Snowden was only a small percentage of what their full capabilities are.
If a foreign country can buy AI/ML speech recognition and translation technology off the shelf and use it in that way with a relatively high accuracy rate, then I can guarantee you that your own government in the UK/US considers AI "smart enough" already to do that and will do the same.
It's not a case of "it will be used for 1984 style surveillance", but "it's been used for 1984 style surveillance" in some situations and areas for a little while now.
(Note: I make a point here of not discussing opinions on the topic and sticking to technical facts. I'm not someone wearing a 'tin hat'; I was just doing my day job for a few years working out there and in other places. Personally I'm not paranoid about any of this and have no problem with the Omani government doing that to my phone or communications. At the end of the day it's a very different environment to the one I live in now in the UK, and in many ways the Middle Eastern countries are a lot safer from terrorism/mass acts of violence compared to The West, and this is one of many reasons why. I have no problem with the Omani government listening in to my calls and Telegram messages to help keep all people residing in or visiting their country safe. It's their country, their rules and their culture. Ours over here is different, and as such I don't take a personal opinion as to whether mass lawful interception (or mass surveillance) is a good or bad thing in The West. I imagine most Westerners feel they are entitled to both privacy and the ability to not get blown up at an airport, so it's a situation no one will ever win...)