What he means is that the Apple Foundation Model for Language (the Apple code that handles all AI language requests, so not just Siri "understanding" but also rewriting text, summarizing, translating, etc.) has an average size per weight of about 3.7 bits. We know this because Apple published a paper on the subject. The same is probably true for the Vision model (again, there's a unified Vision model that handles various different types of vision requests, and I suspect it also handles Pose). We have a series of papers on various aspects of the Vision model, but they're a few years old, and I don't believe there's been a recent paper giving the model size since Apple started really pushing model quantization.

His point is that Apple provides a platform (model conversion tools, model compiler, and hardware) that all work together to support models that operate well when compressed far beyond what other platforms currently achieve (see my comment above about the 8-bit accuracy). So if you're obsessed with tribalism, it's "somewhat unfair" to see Apple being punished for their performance at 32-bit (which only happens on GPU, not the NPU) and even 16-bit, when Apple is all-in on optimizing for 8-bit and less.
Another way the benchmark is somewhat sub-optimal is that it does not report energy results, even though energy efficiency is the entire point of everyone moving to dedicated NPU hardware.
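To put those bit widths in perspective, here's a back-of-the-envelope footprint comparison. The 3B parameter count is a made-up model size purely for illustration; only the ~3.7 bits-per-weight figure comes from Apple's paper.

```python
# Back-of-the-envelope weight storage at different precisions.
# PARAMS is a hypothetical 3B-parameter model, chosen only for illustration.
PARAMS = 3_000_000_000

def footprint_gb(bits_per_weight: float, params: int = PARAMS) -> float:
    """Weight storage in GB (ignores activations, KV cache, and overhead)."""
    return params * bits_per_weight / 8 / 1e9

for bits in (32, 16, 8, 3.7):
    print(f"{bits:>5} bits/weight -> {footprint_gb(bits):.2f} GB")
```

The jump from FP32 (12 GB for this hypothetical model) down to ~3.7 bits (under 1.4 GB) is what makes on-device models practical at all on phone-class RAM budgets.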

You could maybe write an alternative benchmark that calls into Apple APIs for some of the tasks above (e.g. image classification, pose detection, translation, etc.). The results might not be *directly* comparable with the benchmark (which uses standard open models, so that more-or-less the same test is being run on all platforms), but the result might be VERY interesting in terms of comparing Apple's "systemwide" performance: not just the HW but also code optimized to the hardware.
This is probably an interesting project for any student out there wanting to get started at the very low end of AI (just calling some APIs correctly) while also making a name for themselves...
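If someone did want to prototype that student project, a generic timing harness is the boring-but-necessary first piece. A minimal sketch in Python; the placeholder workload and all the names here are mine, and a real version would swap in calls to the Vision framework or a Core ML model (e.g. via pyobjc or coremltools):

```python
import statistics
import time

def benchmark(fn, *args, warmup: int = 3, runs: int = 20) -> dict:
    """Time fn(*args): warm up first (caches, model load, lazy init),
    then report the median and spread over repeated runs."""
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - t0)
    return {
        "median_s": statistics.median(samples),
        "stdev_s": statistics.stdev(samples),
    }

# Placeholder workload; replace with a real API call, e.g. a Vision
# framework classification request or a Core ML model prediction.
result = benchmark(lambda: sum(i * i for i in range(10_000)))
print(result)
```

Reporting the median rather than the mean matters on phones, where thermal throttling and background activity skew the tail.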
Thank you for the deep explanation. I somewhat understand, but I was looking more for an answer for dummies :D like: single precision affects tasks like..., half precision tasks like..., and quantized those. I know it is more complicated than that, but hope it can be made simpler for the rest of us. ;) Thanks
 
Could you elaborate more, or give a link, on how each test relates to different workloads or tasks, and how those bit widths relate to it? I am not familiar with it.
From the link below: Apple Intelligence runs at roughly 4 bits per weight yet outperforms other models. Apple can tailor the hardware and software to work together. The link has various evaluations of model performance and parameters. Geekbench for CPU or raw GPU makes sense, but AI-centric workloads are a lot more than a single number; too many factors and variables.
To maintain model quality, we developed a new framework using LoRA adapters that incorporates a mixed 2-bit and 4-bit configuration strategy — averaging 3.7 bits-per-weight — to achieve the same accuracy as the uncompressed models. More aggressively, the model can be compressed to 3.5 bits-per-weight without significant quality loss.
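As a worked example of how a 2-bit/4-bit mix averages out to 3.7 bits: the exact mix ratio isn't in the quote, so this just derives the one ratio consistent with it.

```python
# If a fraction f of the weights are stored at 4 bits and the rest at
# 2 bits, the average is 4*f + 2*(1 - f). Solve for the 3.7-bit average
# quoted from Apple's paper:
target = 3.7
f4 = (target - 2) / (4 - 2)    # fraction of weights kept at 4-bit
avg = 4 * f4 + 2 * (1 - f4)    # sanity check: recovers the target
print(f"{f4:.0%} of weights at 4-bit, {1 - f4:.0%} at 2-bit "
      f"-> {avg:.1f} bits/weight on average")
```

In other words, a 3.7-bit average means the bulk of the weights (about 85%) stay at 4-bit, with only the most compressible ~15% pushed down to 2-bit.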
 
Thank you for the deep explanation. I somewhat understand, but I was looking more for an answer for dummies :D like: single precision affects tasks like..., half precision tasks like..., and quantized those. I know it is more complicated than that, but hope it can be made simpler for the rest of us. ;) Thanks
I will try to explain in simple terms. Some car manufacturer X makes the engine, but the software/model that powers the engine is generic. The engine may have more raw power, but combined with the software model, that may not show in reality. Apple has a smaller engine but a heavily optimized model, making it run faster or better on that smaller engine. Not sure it's exactly apples to apples (pun intended), but you get the point.
 
iPhone 11 Pro results, iOS 17.5.1

Background Test = CPU
Single Precision Score = 1636
Half Precision Score = 2691
Quantized Score = 2232

Background Test = GPU
Single Precision Score = 916
Half Precision Score = 1110
Quantized Score = 682

Background Test = Neural Engine
Single Precision Score = 1146
Half Precision Score = 3439
Quantized Score = 1360
 
I don't use AI much at all, so there's no need for me to benchmark it. I'm sure this is nice for those that do need to benchmark AI though.
 
iPhone 15 Pro, latest iOS Version: Geekbench AI hangs when using the NPU backend, the benchmark never finishes. Anyone else?
 
My iPhone 15 Plus, whose (Neural Engine) scores are higher than the MacBook Air M2's shown in the article, cannot run Apple Intelligence.
 
Date         Device     Arch   Backend                 Single    Half   Quantized
2024-08-15   iPad16,5   ARM    Core ML Neural Engine     3330   18866       22117
2024-08-15   iPad16,3   ARM    Core ML Neural Engine     4733   31908       40669

I haven't seen 18.1 results, but here are 17.6.1 and 18.0b. The difference is big, mainly in image manipulation, object detection and machine translation.
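For a rough sense of how big: assuming the two Neural Engine runs above break down as (Single, Half, Quantized) scores of (3330, 18866, 22117) and (4733, 31908, 40669) (my reading of the raw listing, so treat the split as an assumption), the percentage changes work out like this:

```python
# Percentage change between the two Neural Engine runs quoted above.
# Which iOS build each run used is exactly the ambiguity being discussed,
# so the labels here are assumptions.
old = {"Single": 3330, "Half": 18866, "Quantized": 22117}   # earlier build
new = {"Single": 4733, "Half": 31908, "Quantized": 40669}   # later build
for k in old:
    change = (new[k] - old[k]) / old[k] * 100
    print(f"{k:>9}: {old[k]} -> {new[k]} ({change:+.0f}%)")
```

That's roughly +42% single precision, +69% half precision, and +84% quantized between the two runs.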
Your second set of numbers seems like it is with 18.1b2


You seem to be thinking there is some undifferentiated pool of "iOS18 betas" that are all the same. You will not understand the pattern if you keep ignoring these details. Certainly the GB browser does not do a good job of ensuring that you see all the relevant info upfront; in the past I've seen similar confusion as people mix up CPU, GPU, and ANE results because the GB browser makes it so easy to confuse what you are seeing.

It's also possible that GB6 is doing a terrible job of detecting which iOS18 beta is being used, and the only trustworthy way of tracking this is when people know that they specifically used 18.0b vs 18.1b vs 18.1b2, as in my screenshot above? Seems that way since they are not even noting iOS18 as a beta...

I admit this whole thing is currently very messy. In the past SOME ANE benchmark numbers (though not GB6's) have been all over the place, and we may be seeing the same pattern right now. Hopefully things will settle down to some sort of stability.

And this boost so far is *probably* mainly from better layer streaming (see how it also boosts the GPU/CPU side, i.e. the FP32 side), not yet from W8A8, since I don't believe GB6 AI has done what's necessary to exploit W8A8 (use coremltools 8.0 and run characterization profiling on all the layers).
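For anyone wondering what the "W8" in W8A8 means in practice: weights (and, in full W8A8, activations too) are quantized to 8-bit integers. A toy sketch of symmetric int8 weight quantization in pure Python, purely illustrative and not how coremltools actually implements it:

```python
# Symmetric int8 quantization: map the tensor's max magnitude to 127,
# round every value to an int8 code, and keep one float scale per tensor
# so the codes can be dequantized back to approximate floats.
def quantize_int8(values):
    scale = max(abs(v) for v in values) / 127 or 1.0  # avoid 0 for all-zero input
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.02, -0.5, 0.31, 1.27, -1.0]
q, s = quantize_int8(weights)
approx = dequantize(q, s)
print(q)                               # int8 codes, e.g. [2, -50, 31, 127, -100]
print([round(a, 3) for a in approx])   # reconstructed weights
```

The point of W8A8 on the ANE is that both operands of a matmul stay integer, which is where the hardware's throughput advantage lives; quantizing weights alone (W8A16) doesn't unlock that path.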
 
Thanks for the clarification, all. I just shared the data that are available. It does seem true that GB's detection of betas, and even of the M4, is bad; you still see new iPads listed by code number rather than their real name.
 
Wish I could get measured like that at work. Do and say anything I want, great! It's all wrong and I'm killing patients, but I'd be fast!

:rolleyes:

Maybe not in the medical industry thank god, but that's pretty much what a lot of other industries are slipping toward.
 
My iPhone 15 Plus, whose (Neural Engine) scores are higher than the MacBook Air M2's shown in the article, cannot run Apple Intelligence.
I've tried this new AI benchmark on my M3 MBA (16GB RAM and 512GB storage), but when I choose the Neural Engine as the Core ML backend, the test halts after a few minutes. If I choose the GPU or CPU as the backend there are no problems. Has anyone experienced this issue?
 
I've tried this new AI benchmark on my M3 MBA (16GB RAM and 512GB storage), but when I choose the Neural Engine as the Core ML backend, the test halts after a few minutes. If I choose the GPU or CPU as the backend there are no problems. Has anyone experienced this issue?
Let it run. My iPad (6th-generation) ran 3.5 hours on the CPU test.
 
Once again the OS version you're using has a MASSIVE effect.

The quality of the comments in this thread really tells you everything you need to know about the supposed "tech" internet:

99% clueless rants/complaints based on tribalism

1% informed comments that even try to understand what the benchmarks are doing, what affects the results, and why specific results may not match what you expected.
Both devices were on the latest iOS/iPadOS 18 beta.
 