
MacRumors

macrumors bot
Original poster


Following extended testing under the "Geekbench ML" name, Primate Labs is officially launching its benchmarking suite for AI-centric workloads as Geekbench AI. The tool measures hardware performance across a variety of workloads focused on machine learning, deep learning, and other AI-centric tasks.

geekbench-ai.jpg

Geekbench AI 1.0 examines some of the unique workloads associated with AI tasks and seeks to encompass the variety of hardware designs vendors employ to tackle them. It delivers a three-score summary as part of its benchmarking results, reflecting a range of precision levels: single-precision, half-precision, and quantized data.

geekbench-ai-iphone.jpg

In addition to these performance scores, Geekbench AI also includes an accuracy measurement on a per-test basis, allowing developers to improve efficiency and reliability while assessing the benefits and drawbacks of various engineering approaches.

geekbench-ai-mac.jpg

Finally, the 1.0 release of Geekbench AI includes support for new frameworks and more extensive data sets that more closely reflect real-world inputs, improving the accuracy evaluations in the suite.

Geekbench AI is available for an array of platforms, including iOS, macOS, Windows, Android, and Linux.

Article Link: Primate Labs Debuts New Geekbench Suite for AI-Centric Workloads
 
  • Like
Reactions: Michaelgtrusa
Awesome, time to see how powerful my hardware is at generating recipes for street tacos. 🌮
 
What other iOS benchmark apps are there? I am only familiar with AnTuTu Benchmark.
 
Are the results in the article's photos real? Because all I got on my new Galaxy Fold 6, with all its AI features, is this 😅

Edit: OK, my 15 Plus does much worse except in the Half Precision score for some reason? At least Apple does not seem to be lying about its capabilities regarding Apple Intelligence support, I guess?
 

Attachments

  • Screenshot_20240815_202718_Geekbench AI.jpg
  • IMG_3003.png
Last edited:
  • Like
Reactions: Kazgarth
Are the results in the article's photos real? Because all I got on my new Galaxy Fold 6, with all its AI features, is this 😅

Edit: OK, my 15 Plus does much worse except in the Half Precision score for some reason? At least Apple does not seem to be lying about its capabilities regarding Apple Intelligence support, I guess?
Apple Intelligence is optimized for weights quantized to roughly 3.5-4 bits. It may do worse at other precisions on the iPhone 15. The scores don't mean much unless the models are optimized for a certain precision.
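If it helps to picture what "quantized to ~4 bits" means, here is a toy Swift sketch (made-up weight values, not Apple's actual pipeline) that rounds a float weight array onto a 4-bit grid and measures how much is lost; the same idea at 8 or 16 bits loses far less, which is part of why the three precision scores diverge.

```swift
import Foundation

// Toy illustration only: symmetric per-tensor quantization of random "weights".
// Real pipelines (per-block scales, palettization, etc.) are more sophisticated.
let weights: [Float] = (0..<1024).map { _ in Float.random(in: -1...1) }

func quantize(_ w: [Float], bits: Int) -> [Float] {
    let levels = Float((1 << (bits - 1)) - 1)            // e.g. 7 for signed 4-bit
    let scale = (w.map { abs($0) }.max() ?? 1) / levels   // per-tensor scale
    return w.map { (($0 / scale).rounded()) * scale }     // quantize, then dequantize
}

for bits in [4, 8, 16] {
    let q = quantize(weights, bits: bits)
    let err = zip(weights, q).map { abs($0.0 - $0.1) }.reduce(0, +) / Float(weights.count)
    print("bits: \(bits)  mean abs error: \(err)")
}
```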
 
  • Like
Reactions: SFjohn and LockOn2B
Are the results in the article's photos real? Because all I got on my new Galaxy Fold 6, with all its AI features, is this 😅

Edit: OK, my 15 Plus does much worse except in the Half Precision score for some reason? At least Apple does not seem to be lying about its capabilities regarding Apple Intelligence support, I guess?
So the Fold is way ahead of the iPhone 15, if I'm reading that right?
 
  • Like
Reactions: Kazgarth
The regular iPhone 15 (Plus), which is almost a year old. But honestly, I don't fully understand how the score is determined yet either. I just see higher number = must be better. Haha
Yep… let’s hope the iPhone 16 does better 🙏🏻
 
Are the results in the article's photos real? Because all I got on my new Galaxy Fold 6, with all its AI features, is this 😅

Edit: OK, my 15 Plus does much worse except in the Half Precision score for some reason? At least Apple does not seem to be lying about its capabilities regarding Apple Intelligence support, I guess?
It is also using the CPU backend in your benchmark, not the AI accelerators. Switch to the Neural Engine backend to see how that performs.
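For anyone wanting to poke at this outside the Geekbench app: in your own Core ML code, the rough equivalent of that backend switch is MLModelConfiguration.computeUnits. A minimal Swift sketch (the model file name is just a placeholder):

```swift
import CoreML
import Foundation

// Sketch only: "SomeModel.mlmodelc" stands in for any compiled Core ML model.
let url = URL(fileURLWithPath: "SomeModel.mlmodelc")

let cpuConfig = MLModelConfiguration()
cpuConfig.computeUnits = .cpuOnly                 // roughly the "CPU backend"

let aneConfig = MLModelConfiguration()
aneConfig.computeUnits = .cpuAndNeuralEngine      // lets Core ML schedule work on the ANE

do {
    let cpuModel = try MLModel(contentsOf: url, configuration: cpuConfig)
    let aneModel = try MLModel(contentsOf: url, configuration: aneConfig)
    // Run the same prediction through each model and compare wall-clock times.
    _ = (cpuModel, aneModel)
} catch {
    print("Failed to load model: \(error)")
}
```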
 
  • Like
Reactions: SFjohn
The most interesting TECHNICAL element is the massive diversity in the 8-bit results.
The word on the internet is that 8-bit performance is useless because it's so inaccurate, but that's not quite true.
There are two VERY DIFFERENT 8-bit camps.
Apple's 8-bit accuracy is generally in the high 90s; the worst I saw was 93%.
OpenVINO (Intel) is more like mid-90s; the worst I saw was 80%.

But the ONNX (Microsoft, including MS ARM) and TensorFlow (both variants) results are a disaster, with accuracies at 40%, 60%, 70%.

Which is kind of interesting...
First, from a technical point of view, it shows who's been concentrating their R&D where.
Second, from a product point of view, it's probably reasonable to compare Apple 8-bit model performance to Android/MS 16-bit model performance.
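To make the "two 8-bit camps" point concrete, here is a hedged toy Swift sketch (made-up data, not any vendor's actual pipeline) showing how much a single calibration choice - naive min/max scaling versus clipping outliers - changes int8 reconstruction error. Differences like this, compounded across every layer, are one reason different stacks report such different accuracies at the same nominal 8 bits.

```swift
import Foundation

// Toy activations with a few large outliers, the classic thing that hurts int8.
var activations: [Float] = (0..<2000).map { _ in Float.random(in: -1...1) }
activations += [35, -40, 50]   // outliers

// Symmetric int8 quantize/dequantize using a chosen clipping range.
func int8Error(_ x: [Float], clip: Float) -> Float {
    let scale = clip / 127
    let recon = x.map { max(-clip, min(clip, $0)) }           // clip to range
                 .map { (($0 / scale).rounded()) * scale }    // quantize + dequantize
    return zip(x, recon).map { abs($0.0 - $0.1) }.reduce(0, +) / Float(x.count)
}

let naiveClip = activations.map { abs($0) }.max() ?? 1        // min/max calibration
print("clip = max|x| (naive):   mean abs error =", int8Error(activations, clip: naiveClip))
print("clip = 1.0 (outliers clipped): mean abs error =", int8Error(activations, clip: 1))
```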
 
  • Like
Reactions: Edsel
Are the results in the article's photos real? Because all I got on my new Galaxy Fold 6, with all its AI features, is this 😅

Edit: OK, my 15 Plus does much worse except in the Half Precision score for some reason? At least Apple does not seem to be lying about its capabilities regarding Apple Intelligence support, I guess?
iOS 18.1b provides ML performance boosts that substantially affect some of the benchmarks.
You're testing with 18.0b.

The same may possibly be true for your Samsung device (I don't know that space): some combination of Samsung drivers and Android software may be substantially out of date relative to the current state of the art.
 
Apple Intelligence is optimized for weights quantized to roughly 3.5-4 bits. It may do worse at other precisions on the iPhone 15. The scores don't mean much unless the models are optimized for a certain precision.
Could you elaborate, or give a link explaining how each test relates to different workloads or tasks, and how those bit widths relate to it? I am not familiar with this.
 
My 1TB iPad Pro M4 didn't beat my iPhone 15 Pro Max 256GB to death.

iPad M4: 4701 / 7840 / 6814

iPhone 15 Pro Max: 4046 / 6953 / 6065
Once again, the OS version you're using has a MASSIVE effect.

The quality of the comments in this thread really tells you everything you need to know about the supposed "tech" internet:

99% clueless rants/complaints based on tribalism.

1% informed comments that even try to understand what the benchmarks are doing, what affects the results, and why specific results may not match what you expected.
 
  • Haha
Reactions: Frantisekj
Could you elaborate, or give a link explaining how each test relates to different workloads or tasks, and how those bit widths relate to it? I am not familiar with this.
What he means is that the Apple Foundation Model for language (the Apple code that handles all AI language requests, so not just Siri "understanding" but also rewriting text, summarizing, translating, etc.) has an average size of about 3.7 bits per weight. We know this because Apple published a paper on the subject. The same is probably true for the Vision model (again, there's a unified Vision model that handles various different types of vision requests, and I suspect it also handles Pose). We have a series of papers on various aspects of the Vision model, but they're a few years old, and I don't believe there's been a recent paper giving the model size since Apple started really pushing model quantization.

His point is that Apple provides a platform (model conversion tools, model compiler, and hardware) that all work together to support models that run well when compressed far beyond what other platforms currently achieve (see my comment above about 8-bit accuracy). So if you're obsessed with tribalism, it's "somewhat unfair" to see Apple being punished for its performance at 32-bit (which only happens on the GPU, not the NPU) and even 16-bit, when Apple is all-in on optimizing for 8 bits and less.
Another way the benchmark is somewhat sub-optimal is that it does not give energy results, even though energy efficiency is of course the entire point of everyone moving to dedicated NPU hardware.

You could maybe write an alternative benchmark that calls into Apple APIs for some of the tasks above (e.g. image classification, pose detection, translation, etc.). The results might not be *directly* comparable with the benchmark (which uses standard open models, so that more or less the same test is being run on all platforms), but the result might be VERY interesting in terms of comparing Apple's "systemwide" performance, not just the hardware but also code optimized to the hardware.
This is probably an interesting project for any student out there wanting to get started at the very low end of AI (just calling some APIs correctly) while also making a name for themselves...
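As a hedged starting point for that kind of project, here is a minimal macOS Swift sketch that times Apple's built-in image classifier through the Vision framework. The image path is a placeholder, and this times a single API call rather than a proper benchmark with warm-up and repetitions.

```swift
import Foundation
import Vision

// Sketch only: "/tmp/test.jpg" is a placeholder image path on disk.
let imageURL = URL(fileURLWithPath: "/tmp/test.jpg")
let request = VNClassifyImageRequest()                     // Apple's built-in classifier
let handler = VNImageRequestHandler(url: imageURL, options: [:])

let start = Date()
do {
    try handler.perform([request])
    let elapsed = Date().timeIntervalSince(start)
    let top = (request.results as? [VNClassificationObservation])?
        .prefix(3)
        .map { "\($0.identifier) \(String(format: "%.2f", $0.confidence))" }
        .joined(separator: ", ") ?? "no results"
    print("Top labels: \(top)")
    print("Latency: \(String(format: "%.1f", elapsed * 1000)) ms")
} catch {
    print("Classification failed: \(error)")
}
```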
 
  • Like
Reactions: LockOn2B
iOS 18.1b provides ML performance boosts that substantially affect some of the benchmarks.
You're testing with 18.0b.

The same may possibly be true for your Samsung device (I don't know that space): some combination of Samsung drivers and Android software may be substantially out of date relative to the current state of the art.
2024-08-15 · iPad16,5 (ARM) · Core ML Neural Engine · 3330 / 18866 / 22117
2024-08-15 · iPad16,3 (ARM) · Core ML Neural Engine · 4733 / 31908 / 40669

I haven't seen 18.1 results, but here are 17.6.1 and 18.0b. The difference is big, mainly in image manipulation, object detection, and machine translation.
 
Last edited: