It's important to understand what these sorts of benchmarks do and don't tell us. I raise this point because I keep seeing confusion.
1. CoreAI is about packaging up a model you acquire from elsewhere and getting it to run on Apple Silicon. Just like CoreML.
It is NOT a "convert model to ANE" tool. Many people were hoping it would be such a tool (with CoreML staying in its existing role), but things are what they are.
So if your goal is "optimize a model for ANE", things remain as they were with CoreML: lots of spelunking through where each layer runs and figuring out why the framework chose not to run it on ANE. CoreAI *may* make this easier in that various limits (especially those that look artificial, eg various size limits) may have been raised.
2. This benchmark appears to show the GPU as (generally) much faster than ANE. This can be true *for non-Apple models*. Apple has published in many places anything from hints to outright definite statements about what runs optimally fast on ANE (for example ReLU is definitely preferred, as is an embedding vector ordering that tends to generate *long* clusters of zero activations). Many of these choices are very different from what's optimal on nVidia (eg nVidia very much prefers zeros spread evenly across weights and activations rather than Apple's long runs of zeros). These are not "better or worse", they are just different HW choices.
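To make the "long runs of zeros vs evenly spread zeros" distinction concrete, here is a minimal sketch. The two activation vectors are made up for illustration, and the helper names (`zero_runs`, `summarize`) are mine. It says nothing about how ANE or nVidia HW actually exploits sparsity; it only shows that two tensors can have identical sparsity and very different zero layouts.

```python
from itertools import groupby

def zero_runs(values):
    """Return the length of every maximal run of consecutive zeros."""
    return [
        len(list(group))
        for is_zero, group in groupby(values, key=lambda v: v == 0)
        if is_zero
    ]

def summarize(values):
    """Report overall sparsity plus how the zeros are laid out."""
    runs = zero_runs(values)
    return {
        "sparsity": values.count(0) / len(values),
        "longest_zero_run": max(runs, default=0),
        "num_runs": len(runs),
    }

# Hypothetical activation vectors: both have 8 zeros out of 12 entries.
clustered = [0.7, 1.2, 0.4, 0.9, 0, 0, 0, 0, 0, 0, 0, 0]  # one long run
scattered = [0.7, 0, 0, 1.2, 0, 0, 0.4, 0, 0, 0.9, 0, 0]  # evenly spread

print(summarize(clustered))
print(summarize(scattered))
```

Same sparsity, but one vector has a single run of 8 zeros and the other has four runs of 2. A summary number like "67% sparse" hides exactly the property that differs between the two HW styles.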
Bottom line is that
- Apple, building its models from scratch, gets much better performance out of ANE than do 3rd party models. This is not even, I think, because Apple uses secret functionality; it's just that taking a model optimized for nV HW choices and slapping it on ANE makes lousy use of ANE. This could be rectified somewhat if people bothered to understand how ANE really works, but that knowledge is still diffuse and most porters don't even seem aware of the resources available.
To do better, more sophisticated model modification is required (eg re-ordering embedding tables, with consequent reordering of all weights; or replacing activation functions with ReLU, which probably requires adding some LoRA layers to fix the resultant slight inaccuracies).
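The "re-order the embedding table, with consequent reordering of all weights" step can be sketched as follows. This is a toy, stdlib-only illustration with made-up sizes and a hypothetical helper (`permute_embedding_dim`), not Apple's procedure or any tool's API. It shows only the core invariant: if you permute the embedding dimension and apply the same permutation to the rows of the weight matrix that consumes it, the output is unchanged.

```python
import random

def matvec(row, weights):
    """Compute row @ weights; weights is a list of d rows, each with n_out columns."""
    n_out = len(weights[0])
    return [
        sum(row[i] * weights[i][j] for i in range(len(row)))
        for j in range(n_out)
    ]

def permute_embedding_dim(embedding, weights, perm):
    """Reorder the embedding dimension and the matching weight rows consistently."""
    new_embedding = [[row[p] for p in perm] for row in embedding]
    new_weights = [weights[p] for p in perm]
    return new_embedding, new_weights

random.seed(0)
vocab, d, n_out = 5, 4, 3  # toy sizes, chosen arbitrarily
# Integer entries keep the equality check below exact.
embedding = [[random.randint(-3, 3) for _ in range(d)] for _ in range(vocab)]
weights = [[random.randint(-3, 3) for _ in range(n_out)] for _ in range(d)]

perm = [2, 0, 3, 1]  # hypothetical new ordering of the embedding dimension
new_embedding, new_weights = permute_embedding_dim(embedding, weights, perm)

for token in range(vocab):
    assert matvec(embedding[token], weights) == matvec(new_embedding[token], new_weights)
print("outputs unchanged after consistent reordering")
```

In a real model the same permutation has to be applied to everything that reads from or writes to that dimension (projection matrices, per-channel norm parameters, the output head), which is why this is more work than it sounds. Choosing a permutation that actually produces long zero runs is a separate problem not shown here.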
- As far as numbers go the single number summary line is that Apple's AFM3 (the new one in iOS27) runs at about 60..70 tps, and runs on ANE. There are a few more details here:
It runs at about half the speed of the PCC (cloud) model.
It's unclear if the model tested was the "baseline" AFM3 or the advanced (20B) AFM3.
It IS running on ANE even though there was initially some confusion about this because the various performance monitoring apps needed to be updated for OS27.
- I've mentioned before the analogy to codecs in the early days of QuickTime. QuickTime began life in 1991 supporting all codecs, and it wasn't clear to Apple (and anyone else!) how this would play out, if there would be a single dominant codec or a perpetual pool of 10 codecs used for different situations.
But by the mid nineties it was already clear that the extreme variety of choices was sub-optimal, and that most people just wanted a "good enough" codec that was always there and always worked; AND (unlike a few years earlier), by 1997 Apple had access to such a codec in the form of Sorenson.
This is where we are now with AI models.
Sorenson was far from perfect, and far from state of the art. But it WAS good enough in terms of balancing everything relevant and feasible given the hardware of the time (including, as always, the installed base).
- Sorenson meant that the frontier was over. Crazy amateurs (like myself in 1990, getting MPEG-1 working on my Mac SE/30, or Sam in Australia getting PNG working) had a good run, but our time as amateurs had passed.
We'll continue to see people playing with hero ports of various models to Apple HW, but I suspect it's going to be less and less relevant to most people (unless Apple does really stupid things with guard rails *cough* Anthropic *cough*).
Most people will find Siri AI (and more precisely "the system as Apple ships it") just works for what they need, even if what they need is fairly specialized [OCR? Translation? Coding? File format conversion? all tbd, and probably all not as good as they should be, but also subject to constant revision with each OS update].
Probably not if you are at the leading edge of OpenClaw deployment (while OS27 has a lot of agentic infrastructure, and some limited agentic functionality to get people used to the idea, it's very much not yet an agentic OS, let alone an "open-ended" agentic OS).
- But Sorenson also did NOT mean improvement was over. Even at the time Sorenson was clearly quality inferior to MPEG-1 (let alone h.263 done right), but it was fast enough on older HW, unlike those two. We'll continue to see local AFM improve, and we'll continue to see the routing between local AFM vs the two cloud models improve.
But the world that will matter for most people will be what Apple ships, not experiments on github. I say this not as a judgement, just as a statement of how these things play out. For those who have enjoyed being enthusiastic amateurs, I'd suggest the way to harness your talents going forward is no longer constant fiddling with porting new models, and more of either
+ unusual ways to exploit/optimize what Apple provides in the OS (ie clever new use cases, clever new wrapper apps)
+ exploring within the standard existing open models how they work, what they tell us about reasoning etc (eg clever model surgery, exploring looping models that reason in latent rather than token space, large concept models [perhaps with innovative tokenization or multi-tokenization schemes as an intermediate step], etc).