
MacRumors · macrumors bot · Original poster


Apple's new speech-to-text transcription APIs in iOS 26 and macOS Tahoe are delivering dramatically faster speeds compared to rival tools, including OpenAI's Whisper, based on beta testing conducted by MacStories' John Voorhees.

[Image: Call recording and transcription in iOS 18.1]

Apple uses its own native speech frameworks to power live transcription features in apps like Notes and Voice Memos, as well as phone call transcription in iOS 18.1. In iOS 26 and macOS Tahoe, Apple has introduced a new SpeechAnalyzer class and SpeechTranscriber module that handle the same kinds of requests far more efficiently.
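
For developers, usage looks roughly like the following. This is a minimal sketch assuming the API shape Apple has shown for SpeechAnalyzer and SpeechTranscriber; the exact initializers, presets, and result types may differ in the shipping betas.

    import AVFoundation
    import Speech

    // Minimal sketch: transcribe an audio file entirely on-device.
    // The preset name and the analyzeSequence/finalizeAndFinish calls
    // follow Apple's WWDC25 sample material and are assumptions here,
    // not confirmed shipping API.
    func transcribe(fileAt url: URL) async throws -> String {
        let transcriber = SpeechTranscriber(locale: Locale(identifier: "en_US"),
                                            preset: .offlineTranscription)
        let analyzer = SpeechAnalyzer(modules: [transcriber])

        // Collect results concurrently while the analyzer consumes the file.
        let results = Task {
            var transcript = AttributedString("")
            for try await result in transcriber.results {
                transcript += result.text
            }
            return transcript
        }

        let audioFile = try AVAudioFile(forReading: url)
        if let lastSample = try await analyzer.analyzeSequence(from: audioFile) {
            try await analyzer.finalizeAndFinish(through: lastSample)
        }
        let transcript = try await results.value
        return String(transcript.characters)
    }

Yap, the command line tool mentioned below, is reportedly built on these same classes.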

According to Voorhees, the new models processed a 34-minute, 7GB video file in just 45 seconds using a command line tool called Yap (developed by Voorhees' son, Finn). That's a full 55% faster than MacWhisper's Large V3 Turbo model, which took 1 minute and 41 seconds for the same file.

Other Whisper-based tools were slower still, with VidCap taking 1:55 and MacWhisper's Large V2 model requiring 3:55 to complete the same transcription task. Voorhees also reported no noticeable difference in transcription quality across the models.

The speed advantage comes from Apple's on-device processing approach, which avoids the network overhead that typically slows cloud-based transcription services.

While the time difference might seem modest for individual files, Voorhees notes that the savings compound quickly when processing multiple videos or longer content. For anyone who regularly generates subtitles or transcribes lectures, the efficiency boost could add up to hours saved.
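
To put rough numbers on that, using only the article's timings (34 minutes of video in 45 seconds, versus 1:41 for Large V3 Turbo):

    45 s / 34 min  ≈ 1.3 s of processing per minute of video (Apple)
    101 s / 34 min ≈ 3.0 s of processing per minute of video (Large V3 Turbo)
    10 hours of video ≈ 13 minutes of processing versus about 30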

The Speech framework components are available across iPhone, iPad, Mac, and Vision Pro platforms in the current beta releases. Voorhees expects Apple's transcription technology to eventually replace Whisper as the go-to solution for Mac transcription apps.

Article Link: Apple's New Transcription APIs Blow Past Whisper in Speed Tests
 
Impressive, if it is accurate. What the story doesn't mention is how accurate each of those transcriptions was. Were they all identical? Did one or another have more mistakes? What is the accuracy percentage for each one, and how badly wrong were the mistakes?

I'm not trying to defend ChatGPT; it's just that speed is a single metric, and it isn't very useful if the results are garbage. If the Apple one is faster and more accurate, that is incredible; faster and as accurate, impressive; faster but full of errors, not really that useful.

Hopefully it is the first one: faster and more accurate.
 
Not mentioning accuracy at all implies it isn't as accurate. Lots of models are faster than o3, but they're not better.

This is just silly getting sillier. Write something meaningful.

Whisper works in real time. Anything faster than real time is irrelevant for iOS.

And blaming network overhead, when you can run OpenAI's Whisper locally? Mhm.

This is a blatant advertisement just regurgitating Apple's marketing bullet points.
 
Impressive, if it is accurate. What the story doesn't mention is how accurate each of those transcriptions was. […]
Nothing scientific, but in the MacStories post: "What stood out above all else was Yap’s speed. By harnessing SpeechAnalyzer and SpeechTranscriber on-device, the command line tool tore through the 7GB video file a full 55% faster than MacWhisper’s Large V3 Turbo model, with no noticeable difference in transcription quality."

It would be good to see more formal comparisons with the data you suggested. It would also be good to know what computer John was using for the test.
 
For transcription and similar applications, accuracy is king. If a 70GB file can be processed in two minutes but nothing is legible, it means nothing. Other people have pointed that out as well. Stop chasing speed and focus on improving accuracy first. Of course, all of this must be done locally.
 
Yeah, sure, but this doesn't mean anything. Whisper can understand an Irishman singing a traditional folk song. Can Apple's model do that? And with what degree of success?

Accuracy is incredibly important. They should define criteria other than speed and measure against those. Only then does "speed" become a useful parameter to test.
 
The original article has a math error, which MacRumors has repeated. The “55% faster” claim is wrong. It’s actually 2.24 times as fast, or 124% faster. If car A drives a mile in 45 seconds and car B drives the same mile in 101 seconds, car A is 2.24 times as fast as car B.
 
The "55% faster" claim is wrong. It's actually 2.24 times as fast, or 124% faster. […]
"It did it in 55% less time than Whisper took" is the accurate description.
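
Concretely, with the article's timings (45 s versus 1:41 = 101 s):

    101 / 45 ≈ 2.24  →  2.24x the speed, i.e. 124% faster
    45 / 101 ≈ 0.45  →  about 45% of the time, i.e. roughly 55% less time

So the "55%" figure holds up as "55% less time", but not as "55% faster".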
 
MacWhisper has faster models than "large", though. Small (English) just transcribed 1 hr 07 min of audio in 1 min 24 sec on my M3 Max MBP, and it's quite accurate too.

Without running tests on multiple models, we don't have the full story.
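
A rough way to collect those numbers yourself, sketched in Swift; the transcribe closure is a hypothetical stand-in for whichever engine is under test (Apple's new API, a Whisper build, and so on):

    // Times one transcription run with the standard library's
    // ContinuousClock; the closure wraps whichever engine is tested.
    func benchmark(_ label: String,
                   _ transcribe: () async throws -> Void) async rethrows {
        let elapsed = try await ContinuousClock().measure {
            try await transcribe()
        }
        print("\(label): \(elapsed)")
    }

Run each engine over the same file several times and compare medians; the first pass is often skewed by model loading and disk caching.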
 
Wait, that's amazing. I love voice-to-text and I've been using it constantly ever since I got voice typing, but it seems to break with every second update. I'll be happily amazed if Apple actually managed to level up its awful voice-to-text technology that significantly.
 
Can't wait.

I swear the voice dictation feature has gotten less accurate and slower over the years.

At best, it simply hasn't gotten better. I don't know ... I'm just ready for it to "feel" like 2025.
 
Speech-to-text on iOS 18 on an iPhone 14 Pro Max is embarrassingly terrible. I don't care how fast it is if it doesn't work well.
 
Whisper is said to be way slower than this new API; however, are they comparing it to cloud-based Whisper versions?

What about comparing it to another on-device transcription tool based on Whisper, such as MacWhisper?
 
If these same APIs are now also used in the dictation feature, I'll probably use it far more often. It already works quite well on A15/M2 devices with an older OS, so maybe this means it will improve even further.
 
If it can do the work well and accurately, then it's a big win. I expect it to improve further in the future.
 
It's great that Apple is working on improving transcription on iOS.

For many use cases, though, it would be much more useful if there were a way to plug other models like Whisper into the OS.
 