I'm planning to write a book (fiction, so there will be lots of dialog), and my results thus far with Apple's Dictation software have been mixed. I also don't like that it shuts off if you pause your speech for 30 seconds or more.
Anything that works better? I'm looking for something that is locally-installed, rather than cloud-based.
The gold standard used to be the Nuance product (currently called Dragon Professional). But that requires I drop $600 for the software, and either buy a PC or use it with Parallels (apparently this works, even though it's not supported on ARM: https://dashedyellowline.com/2024/01/18/dragon-on-apple-silicon-works/).
One thing I have noticed is that Dictation on my M1 Pro MBP running Tahoe works somewhat better than Dictation on my 2019 i9 iMac running Sequoia. The former is a bit more accurate, and better able to keep up with me when I speak rapidly. I don't know how much of that is better software vs. the faster processor vs. the difference in microphone*. [*On my iMac, I'm using the mic in my Anker C200 vidcam.]
EDIT 2:
I've since done extensive testing of dictation apps. I've divided them into two categories: Those I've run locally, and those I've run cloud-based (I say "run" rather than "are", since several are both).
I explored both because I use a 2019 i9 iMac for my desktop work and an M1 Pro MacBook Pro for my mobile work.
Practically speaking, the iMac can only use cloud-based apps. Models from the two best-known packages that are available for local installation (OpenAI's Whisper and NVIDIA's Parakeet) can be run on an Intel Mac, but both transcribe so slowly that neither is practical, though Parakeet V3 (V2 can't run on an Intel Mac, but V3 can) is less egregiously slow than Whisper V3 Turbo.
Thus I focused the cloud-based testing on my iMac. The results should be computer-agnostic, since the dictation is being processed in the cloud. Indeed, I did test Spokenly's cloud-based implementation with both my iMac and M1 Pro MBP, and found nearly the same results when using the same model.
The one small difference I saw may be due to the microphone, since these programs can be mic-sensitive in subtle ways. [On my iMac, I used the Anker PowerConf C200 webcam, while on the MBP I used the BOYA CM-40 boom mic.]
For instance, one of my tests was to see whether the program could correctly transcribe the plural possessive in the following sentence: "On many superhero teams, the heroes' costumes are each a different color." One program gave "hero's" with the mic on my MBP (a mistake), but "heroes'" with the BOYA boom mic.
And the local testing was mostly done using my M1 MBP.
Overall, I found the cloud-based apps are superior to the locally-installed ones for both speed and capability. One striking difference is that many of the cloud-based apps are able to recognize spoken passages in fiction and thus surround them with quotation marks. By contrast, none of the locally-installed apps are able to do that. In addition, the cloud-based apps generally have much more capability to accept voice commands for formatting and punctuation than the locally-installed apps.
But if you have a much slower internet connection than mine (940 Mbps up/down), and a better-performing computer (mine is only an M1 Pro), you might find the relative speed of local vs. cloud to be flipped from what I found. That won't change the relative capabilities of the two categories, though.
The one feature I really wanted was real-time dictation, like one gets with Apple Dictation. Unfortunately, I was only able to find one app that does that: Talk Type (a cloud-based app), and it is not as capable as the others in its category.
Overall, the best-performing app for me seems to be Aqua Voice, so that's probably the one I'll be subscribing to. And its privacy policy says that if you activate its privacy mode, none of the dictations are retained.
Finally, this served as a nice reminder that AI is fundamentally dumb: It's not smart enough to understand grammatical rules, since it's been trained on patterns, not what they mean. For that reason, if a certain compound adjective isn't in its training set as being hyphenated, it's not going to hyphenate it when it transcribes your spoken voice. I dictated much of the above using one of the programs, but then had to go back and manually add most of the hyphens.
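For anyone who'd rather not add the hyphens by hand, here's a minimal Python sketch of one way to patch them in after the fact. To be clear, this is not a feature of any of the apps I tested; the function name and the list of compounds are just my own illustration, and you'd fill the list with whatever compounds your dictation app keeps missing.

```python
import re

# Hypothetical list of compounds to hyphenate; replace with your own.
COMPOUNDS = ["cloud based", "locally installed", "best known", "better performing"]

def hyphenate_compounds(text, compounds=COMPOUNDS):
    """Join each listed multi-word compound with hyphens, preserving the original case."""
    for phrase in compounds:
        words = phrase.split()
        # Match the words separated by any whitespace, on word boundaries.
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b"
        text = re.sub(
            pattern,
            lambda m: "-".join(m.group(0).split()),
            text,
            flags=re.IGNORECASE,
        )
    return text

print(hyphenate_compounds("I tested cloud based and locally installed apps."))
```

Note that this is a blind find-and-replace: it can't tell whether a phrase is actually being used as a compound adjective, so only list compounds that you always want hyphenated (something like "real time" would get wrongly hyphenated in "in real time").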
These tables summarize my results. You'll probably need to click on them to make them big enough to read.
CLOUD-BASED PROCESSING (tested mostly with my 2019 i9 iMac)
Here's the internet performance on my iMac, tested using Speedtest's locally-installed app (more accurate than their web browser version, which is unsuitable for high internet speeds):
LOCAL PROCESSING (tested mostly with my M1 Pro MacBook Pro)
[Don't give too much weight to the difference between the scores of 4 vs. 5 for accuracy, since that could be due to normal variation in how I spoke:]