
MacRumors · macrumors bot · Original poster


Apple's new speech-to-text transcription APIs in iOS 26 and macOS Tahoe are delivering dramatically faster speeds compared to rival tools, including OpenAI's Whisper, based on beta testing conducted by MacStories' John Voorhees.

[Image: Call recording and transcription in iOS 18.1]

Apple uses its own native speech frameworks to power live transcription features in apps like Notes and Voice Memos, as well as phone call transcription in iOS 18.1. In iOS 26 and macOS Tahoe, Apple has introduced a new SpeechAnalyzer class and SpeechTranscriber module that handle the same kinds of requests far more efficiently.
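
For developers, usage looks roughly like the following. This is a minimal sketch assuming the API shape Apple has shown for SpeechAnalyzer and SpeechTranscriber; the exact initializers, presets, and result types may differ in the shipping betas.

    import AVFoundation
    import Speech

    // Minimal sketch: transcribe an audio file entirely on-device.
    // The preset name and the analyzeSequence/finalizeAndFinish calls
    // follow Apple's WWDC25 sample material and are assumptions here,
    // not confirmed shipping API.
    func transcribe(fileAt url: URL) async throws -> String {
        let transcriber = SpeechTranscriber(locale: Locale(identifier: "en_US"),
                                            preset: .offlineTranscription)
        let analyzer = SpeechAnalyzer(modules: [transcriber])

        // Collect results concurrently while the analyzer consumes the file.
        let results = Task {
            var transcript = AttributedString("")
            for try await result in transcriber.results {
                transcript += result.text
            }
            return transcript
        }

        let audioFile = try AVAudioFile(forReading: url)
        if let lastSample = try await analyzer.analyzeSequence(from: audioFile) {
            try await analyzer.finalizeAndFinish(through: lastSample)
        }
        let transcript = try await results.value
        return String(transcript.characters)
    }

Yap, the command line tool mentioned below, is reportedly built on these same classes.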

According to Voorhees, the new models processed a 34-minute, 7GB video file in just 45 seconds using a command line tool called Yap (developed by Voorhees' son, Finn). That's a full 55% faster than MacWhisper's Large V3 Turbo model, which took 1 minute and 41 seconds for the same file.

Other Whisper-based tools were slower still, with VidCap taking 1:55 and MacWhisper's Large V2 model requiring 3:55 to complete the same transcription task. Voorhees also reported no noticeable difference in transcription quality across the models.

The speed advantage comes from Apple's on-device processing approach, which avoids the network overhead that typically slows cloud-based transcription services.

While the time difference might seem modest for individual files, Voorhees notes that the savings compound quickly when processing multiple videos or longer content. For anyone who regularly generates subtitles or transcribes lectures, the efficiency boost could add up to hours saved.
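
To put rough numbers on that, using only the article's timings (34 minutes of video in 45 seconds, versus 1:41 for Large V3 Turbo):

    45 s / 34 min  ≈ 1.3 s of processing per minute of video (Apple)
    101 s / 34 min ≈ 3.0 s of processing per minute of video (Large V3 Turbo)
    10 hours of video ≈ 13 minutes of processing versus about 30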

The Speech framework components are available across iPhone, iPad, Mac, and Vision Pro platforms in the current beta releases. Voorhees expects Apple's transcription technology to eventually replace Whisper as the go-to solution for Mac transcription apps.

Article Link: Apple's New Transcription APIs Blow Past Whisper in Speed Tests
 
Impressive, if it is accurate. What the story doesn't mention is how accurate each of those transcriptions was. Were they all identical? Did one or another have more mistakes? What is the accuracy percentage for each one, and how badly wrong were the mistakes?

I'm not trying to defend ChatGPT; it's just that speed is a single metric, and it isn't very useful if the results are garbage. If the Apple one is faster and more accurate, that is incredible; faster and as accurate, impressive; faster but full of errors, not really that useful.

Hopefully it is the first one: faster and more accurate.
 
Not mentioning accuracy at all implies it isn't as accurate. Lots of models are faster than o3, but they're not better.

This is just silly getting sillier. Write something meaningful.

Whisper works in real time. Anything faster than real time is irrelevant for iOS.

And blaming network overhead, when you can run OpenAI's Whisper locally? Mhm.

This is a blatant advertisement just regurgitating Apple's marketing bullet points.
 
Impressive, if it is accurate. What the story doesn't mention is how accurate each of those transcriptions was. […]
Nothing scientific, but in the MacStories post: "What stood out above all else was Yap’s speed. By harnessing SpeechAnalyzer and SpeechTranscriber on-device, the command line tool tore through the 7GB video file a full 55% faster than MacWhisper’s Large V3 Turbo model, with no noticeable difference in transcription quality."

It would be good to see more formal comparisons with the data you suggested. It would also be good to know what computer John was using for the test.
 
For transcription and similar applications, accuracy is king. If a 70GB file can be processed in two minutes but nothing is legible, it means nothing. Other people have pointed that out as well. Stop chasing speed and focus on improving accuracy first. Of course, all of this must be done locally.
 
Yeah, sure, but this doesn't mean anything. Whisper can understand an Irishman singing a traditional folk song. Can Apple's model do that? And with what degree of success?

Accuracy is incredibly important. They should define criteria other than speed and measure against those. Only then does "speed" become a useful parameter to test.
 
The original article has a math error, which MacRumors has repeated. The “55% faster” claim is wrong. It’s actually 2.24 times as fast, or 124% faster. If car A drives a mile in 45 seconds and car B drives the same mile in 101 seconds, car A is 2.24 times as fast as car B.
 
The "55% faster" claim is wrong. It's actually 2.24 times as fast, or 124% faster. […]
"It did it in 55% less time than Whisper took" is the accurate description.
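
Concretely, with the article's timings (45 s versus 1:41 = 101 s):

    101 / 45 ≈ 2.24  →  2.24x the speed, i.e. 124% faster
    45 / 101 ≈ 0.45  →  about 45% of the time, i.e. roughly 55% less time

So the "55%" figure holds up as "55% less time", but not as "55% faster".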
 
MacWhisper has faster models than "large", though. Small (English) just transcribed 1 hr 07 min of audio in 1 min 24 sec on my M3 Max MBP, and it's quite accurate too.

Without running tests on multiple models, we don't have the full story.
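
A rough way to collect those numbers yourself, sketched in Swift; the transcribe closure is a hypothetical stand-in for whichever engine is under test (Apple's new API, a Whisper build, and so on):

    // Times one transcription run with the standard library's
    // ContinuousClock; the closure wraps whichever engine is tested.
    func benchmark(_ label: String,
                   _ transcribe: () async throws -> Void) async rethrows {
        let elapsed = try await ContinuousClock().measure {
            try await transcribe()
        }
        print("\(label): \(elapsed)")
    }

Run each engine over the same file several times and compare medians; the first pass is often skewed by model loading and disk caching.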
 
Wait, that's amazing. I love voice-to-text and I've been using it constantly ever since I got voice typing, but it seems to break with every second update. I'll be happily amazed if Apple actually managed to level up its awful voice-to-text technology that significantly.
 
Can't wait.

I swear the voice dictation feature has gotten less accurate and slower over the years.

At best, it simply hasn't gotten better. I don't know ... I'm just ready for it to "feel" like 2025.
 
Speech-to-text on iOS 18 on an iPhone 14 Pro Max is embarrassingly terrible. I don't care how fast it is if it doesn't work well.
 
Whisper is said to be way slower than this new API; however, are they comparing it to cloud-based Whisper versions?

What about comparing it to another on-device transcription tool based on Whisper, such as MacWhisper?
 
If these same APIs are now also used in the dictation feature, I'll probably use it far more often. It already works quite well on A15/M2 devices with an older OS, so maybe this means it will improve even further.
 
If it can do the work well and accurately, then it's a big win. I expect it to improve further in the future.
 
It's great that Apple is working on improving transcription on iOS.

For many use cases, though, it would be much more useful if there were a way to plug other models like Whisper into the OS.
 