What he means is that the Apple Foundation Model for Language (the Apple code that handles all AI language requests, so not just Siri "understanding" but also rewriting text, summarizing, translating, etc.) has an average size per weight of about 3.7 bits. We know this because Apple published a paper on the subject. The same is probably true for the Vision model (again, there's a unified Vision model that handles various different types of vision requests, and I suspect it also handles Pose). We have a series of papers on various aspects of the Vision model, but they're a few years old, and I don't believe there's been a recent paper giving the model size since Apple started really pushing model quantization.

His point is that Apple provides a platform (model conversion tools, model compiler, and hardware) that all work together to support models that operate well when compressed far beyond what other platforms currently achieve (see my comment above about the 8-bit accuracy). So if you're obsessed with tribalism, it's "somewhat unfair" to see Apple being punished for their performance at 32-bit (which only happens on GPU, not the NPU) and even 16-bit, when Apple is all-in on optimizing for 8-bit and less.
Another way the benchmark is somewhat sub-optimal is that it does not report energy results, even though energy efficiency is the entire point of everyone moving to dedicated NPU hardware.
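To put those bit widths in perspective, here's a back-of-the-envelope footprint comparison. The 3B parameter count is a made-up model size purely for illustration; only the ~3.7 bits-per-weight figure comes from Apple's paper.

```python
# Back-of-the-envelope weight storage at different precisions.
# PARAMS is a hypothetical 3B-parameter model, chosen only for illustration.
PARAMS = 3_000_000_000

def footprint_gb(bits_per_weight: float, params: int = PARAMS) -> float:
    """Weight storage in GB (ignores activations, KV cache, and overhead)."""
    return params * bits_per_weight / 8 / 1e9

for bits in (32, 16, 8, 3.7):
    print(f"{bits:>5} bits/weight -> {footprint_gb(bits):.2f} GB")
```

The jump from FP32 (12 GB for this hypothetical model) down to ~3.7 bits (under 1.4 GB) is what makes on-device models practical at all on phone-class RAM budgets.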

You could maybe write an alternative benchmark that calls into Apple APIs for some of the tasks above (e.g. image classification, pose detection, translation, etc.). The results might not be *directly* comparable with the benchmark (which uses standard open models, so that more-or-less the same test is being run on all platforms), but the result might be VERY interesting in terms of comparing Apple's "systemwide" performance: not just the HW but also code optimized to the hardware.
This is probably an interesting project for any student out there wanting to get started at the very low end of AI (just calling some APIs correctly) while also making a name for themselves...
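If someone did want to prototype that student project, a generic timing harness is the boring-but-necessary first piece. A minimal sketch in Python; the placeholder workload and all the names here are mine, and a real version would swap in calls to the Vision framework or a Core ML model (e.g. via pyobjc or coremltools):

```python
import statistics
import time

def benchmark(fn, *args, warmup: int = 3, runs: int = 20) -> dict:
    """Time fn(*args): warm up first (caches, model load, lazy init),
    then report the median and spread over repeated runs."""
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - t0)
    return {
        "median_s": statistics.median(samples),
        "stdev_s": statistics.stdev(samples),
    }

# Placeholder workload; replace with a real API call, e.g. a Vision
# framework classification request or a Core ML model prediction.
result = benchmark(lambda: sum(i * i for i in range(10_000)))
print(result)
```

Reporting the median rather than the mean matters on phones, where thermal throttling and background activity skew the tail.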
Thank you for the deep explanation. I somewhat understand, but I was looking more for an answer for dummies :D like: single precision affects tasks like..., half precision tasks like..., and quantized those. I know it is more complicated than that, but hope it can be made simpler for the rest of us. ;) Thanks
 
Could you elaborate more, or give a link, on how each test relates to different workloads or tasks, and how those bit widths relate to it? I am not familiar with it.
From the link below: Apple Intelligence runs at roughly 4 bits per weight yet outperforms other models. Apple can tailor the hardware and software to work together. The link has various evaluations of model performance and parameters. Geekbench for CPU or raw GPU makes sense, but AI-centric workloads are a lot more than a single number; too many factors and variables.
To maintain model quality, we developed a new framework using LoRA adapters that incorporates a mixed 2-bit and 4-bit configuration strategy — averaging 3.7 bits-per-weight — to achieve the same accuracy as the uncompressed models. More aggressively, the model can be compressed to 3.5 bits-per-weight without significant quality loss.
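As a worked example of how a 2-bit/4-bit mix averages out to 3.7 bits: the exact mix ratio isn't in the quote, so this just derives the one ratio consistent with it.

```python
# If a fraction f of the weights are stored at 4 bits and the rest at
# 2 bits, the average is 4*f + 2*(1 - f). Solve for the 3.7-bit average
# quoted from Apple's paper:
target = 3.7
f4 = (target - 2) / (4 - 2)    # fraction of weights kept at 4-bit
avg = 4 * f4 + 2 * (1 - f4)    # sanity check: recovers the target
print(f"{f4:.0%} of weights at 4-bit, {1 - f4:.0%} at 2-bit "
      f"-> {avg:.1f} bits/weight on average")
```

In other words, a 3.7-bit average means the bulk of the weights (about 85%) stay at 4-bit, with only the most compressible ~15% pushed down to 2-bit.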
 
Thank you for the deep explanation. I somewhat understand, but I was looking more for an answer for dummies :D like: single precision affects tasks like..., half precision tasks like..., and quantized those. I know it is more complicated than that, but hope it can be made simpler for the rest of us. ;) Thanks
I will try to explain in simple terms. Some car manufacturer X makes the engine, but the software/model that powers the engine is generic. The engine may have more raw power, but combined with the software model, that may not show in reality. Apple has a smaller engine but a heavily optimized model, making it run faster or better on that smaller engine. Not sure it's exactly apples to apples (pun intended), but you get the point.
 
iPhone 11 Pro results, iOS 17.5.1

Background Test = CPU
Single Precision Score = 1636
Half Precision Score = 2691
Quantized Score = 2232

Background Test = GPU
Single Precision Score = 916
Half Precision Score = 1110
Quantized Score = 682

Background Test = Neural Engine
Single Precision Score = 1146
Half Precision Score = 3439
Quantized Score = 1360
 
I don't use AI much at all, so there's no need for me to benchmark it. I'm sure this is nice for those that do need to benchmark AI though.
 
iPhone 15 Pro, latest iOS Version: Geekbench AI hangs when using the NPU backend, the benchmark never finishes. Anyone else?
 
My iPhone 15 Plus, whose (Neural Engine) scores are higher than the MacBook Air M2's shown in the article, cannot run Apple Intelligence.
 
Date         Device     Arch   Backend                 Single    Half   Quantized
2024-08-15   iPad16,5   ARM    Core ML Neural Engine     3330   18866       22117
2024-08-15   iPad16,3   ARM    Core ML Neural Engine     4733   31908       40669

I haven't seen 18.1 results, but here are 17.6.1 and 18.0b. The difference is big, mainly in image manipulation, object detection and machine translation.
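For a rough sense of how big: assuming the two Neural Engine runs above break down as (Single, Half, Quantized) scores of (3330, 18866, 22117) and (4733, 31908, 40669) (my reading of the raw listing, so treat the split as an assumption), the percentage changes work out like this:

```python
# Percentage change between the two Neural Engine runs quoted above.
# Which iOS build each run used is exactly the ambiguity being discussed,
# so the labels here are assumptions.
old = {"Single": 3330, "Half": 18866, "Quantized": 22117}   # earlier build
new = {"Single": 4733, "Half": 31908, "Quantized": 40669}   # later build
for k in old:
    change = (new[k] - old[k]) / old[k] * 100
    print(f"{k:>9}: {old[k]} -> {new[k]} ({change:+.0f}%)")
```

That's roughly +42% single precision, +69% half precision, and +84% quantized between the two runs.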
Your second set of numbers seems like it is with 18.1b2


You seem to be thinking there is some undifferentiated pool of "iOS18 betas" that are all the same. You will not understand the pattern if you keep ignoring these details. Certainly the GB browser does not do a good job of ensuring that you see all the relevant info upfront; in the past I've seen similar confusion as people mix up CPU, GPU, and ANE results because the GB browser makes it so easy to confuse what you are seeing.

It's also possible that GB6 is doing a terrible job of detecting which iOS18 beta is being used, and the only trustworthy way of tracking this is when people know that they specifically used 18.0b vs 18.1b vs 18.1b2, as in my screenshot above? Seems that way since they are not even noting iOS18 as a beta...

I admit this whole thing is currently very messy. In the past SOME ANE benchmark numbers (though not GB6's) have been all over the place, and we may be seeing the same pattern right now. Hopefully things will settle down to some sort of stability.

And this boost so far is *probably* mainly from better layer streaming (see how it also boosts the GPU/CPU side, i.e. the FP32 side), not yet from W8A8, since I don't believe GB6 AI has done what's necessary to exploit W8A8 (use coremltools 8.0 and run characterization profiling on all the layers).
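For anyone wondering what the "W8" in W8A8 means in practice: weights (and, in full W8A8, activations too) are quantized to 8-bit integers. A toy sketch of symmetric int8 weight quantization in pure Python, purely illustrative and not how coremltools actually implements it:

```python
# Symmetric int8 quantization: map the tensor's max magnitude to 127,
# round every value to an int8 code, and keep one float scale per tensor
# so the codes can be dequantized back to approximate floats.
def quantize_int8(values):
    scale = max(abs(v) for v in values) / 127 or 1.0  # avoid 0 for all-zero input
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.02, -0.5, 0.31, 1.27, -1.0]
q, s = quantize_int8(weights)
approx = dequantize(q, s)
print(q)                               # int8 codes, e.g. [2, -50, 31, 127, -100]
print([round(a, 3) for a in approx])   # reconstructed weights
```

The point of W8A8 on the ANE is that both operands of a matmul stay integer, which is where the hardware's throughput advantage lives; quantizing weights alone (W8A16) doesn't unlock that path.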
 
Thanks for the clarification, all. I just shared the data that are available. It does seem true that GB's detection of betas, and even of the M4, is bad; you still see new iPads listed by code number rather than their real name.
 
Wish I could get measured like that at work. Do and say anything I want, great! It's all wrong and I'm killing patients, but I'd be fast!

:rolleyes:

Maybe not in the medical industry thank god, but that's pretty much what a lot of other industries are slipping toward.
 
My iPhone 15 Plus, whose (Neural Engine) scores are higher than the MacBook Air M2's shown in the article, cannot run Apple Intelligence.
I've tried this new AI benchmark on my M3 MBA (16GB RAM and 512GB storage), but when I choose the Neural Engine as the Core ML backend, the test halts after a few minutes. If I choose the GPU or CPU as the backend there are no problems. Has anyone experienced this issue?
 
I've tried this new AI benchmark on my M3 MBA (16GB RAM and 512GB storage), but when I choose the Neural Engine as the Core ML backend, the test halts after a few minutes. If I choose the GPU or CPU as the backend there are no problems. Has anyone experienced this issue?
Let it run. My iPad (6th-generation) ran 3.5 hours on the CPU test.
 
Once again the OS version you're using has a MASSIVE effect.

The quality of the comments in this thread really tells you everything you need to know about the supposed "tech" internet:

99% clueless rants/complaints based on tribalism

1% informed comments that even try to understand what the benchmarks are doing, what affects the results, and why specific results may not match what you expected.
Both devices were on the latest iOS/iPadOS 18 beta.
 