
MacRumors

macrumors bot
Original poster


Following extended testing under the "Geekbench ML" name, Primate Labs is officially launching its benchmarking suite for AI-centric workloads as Geekbench AI. The tool measures hardware performance across a variety of workloads focused on machine learning, deep learning, and other AI-centric tasks.

geekbench-ai.jpg

Geekbench AI 1.0 examines some of the unique workloads associated with AI tasks and seeks to encompass the variety of hardware designs vendors employ to tackle them. It delivers a three-score summary as part of its benchmarking results, reflecting a range of precision levels: single-precision, half-precision, and quantized data.

geekbench-ai-iphone.jpg

In addition to these performance scores, Geekbench AI also includes an accuracy measurement on a per-test basis, allowing developers to improve efficiency and reliability while assessing the benefits and drawbacks of various engineering approaches.

geekbench-ai-mac.jpg

Finally, the 1.0 release of Geekbench AI includes support for new frameworks and more extensive data sets that more closely reflect real-world inputs, improving the accuracy evaluations in the suite.

Geekbench AI is available for an array of platforms, including iOS, macOS, Windows, Android, and Linux.

Article Link: Primate Labs Debuts New Geekbench Suite for AI-Centric Workloads
 
  • Like
Reactions: Michaelgtrusa
Awesome, time to see how powerful my hardware is at generating recipes for street tacos. 🌮
 
What other iOS benchmark apps are there? I am only familiar with AnTuTu Benchmark.
 
Are the results in the article's photos real? Because all I got on my new Galaxy Fold 6, with all its AI features, is this 😅

Edit: OK, my 15 Plus does much worse except in the Half Precision score for some reason? At least Apple does not seem to be lying about its capabilities regarding Apple Intelligence support, I guess?
 

Attachments

  • Screenshot_20240815_202718_Geekbench AI.jpg
  • IMG_3003.png
Last edited:
  • Like
Reactions: Kazgarth
Are the results in the article's photos real? Because all I got on my new Galaxy Fold 6, with all its AI features, is this 😅

Edit: OK, my 15 Plus does much worse except in the Half Precision score for some reason? At least Apple does not seem to be lying about its capabilities regarding Apple Intelligence support, I guess?
Apple Intelligence is optimized for weights quantized to roughly 3.5-4 bits. It may do worse at other precisions on the iPhone 15. The scores don't mean much unless the models are optimized for a certain precision.
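If it helps to picture what "quantized to ~4 bits" means, here is a toy Swift sketch (made-up weight values, not Apple's actual pipeline) that rounds a float weight array onto a 4-bit grid and measures how much is lost; the same idea at 8 or 16 bits loses far less, which is part of why the three precision scores diverge.

```swift
import Foundation

// Toy illustration only: symmetric per-tensor quantization of random "weights".
// Real pipelines (per-block scales, palettization, etc.) are more sophisticated.
let weights: [Float] = (0..<1024).map { _ in Float.random(in: -1...1) }

func quantize(_ w: [Float], bits: Int) -> [Float] {
    let levels = Float((1 << (bits - 1)) - 1)            // e.g. 7 for signed 4-bit
    let scale = (w.map { abs($0) }.max() ?? 1) / levels   // per-tensor scale
    return w.map { (($0 / scale).rounded()) * scale }     // quantize, then dequantize
}

for bits in [4, 8, 16] {
    let q = quantize(weights, bits: bits)
    let err = zip(weights, q).map { abs($0.0 - $0.1) }.reduce(0, +) / Float(weights.count)
    print("bits: \(bits)  mean abs error: \(err)")
}
```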
 
  • Like
Reactions: SFjohn and LockOn2B
Are the results in the article's photos real? Because all I got on my new Galaxy Fold 6, with all its AI features, is this 😅

Edit: OK, my 15 Plus does much worse except in the Half Precision score for some reason? At least Apple does not seem to be lying about its capabilities regarding Apple Intelligence support, I guess?
So the Fold is way ahead of the iPhone 15, if I'm reading that right?
 
  • Like
Reactions: Kazgarth
The regular iPhone 15 (Plus), which is almost a year old. But honestly, I don't fully understand how the score is determined yet either. I just see higher number = must be better. Haha
Yep… let’s hope the iPhone 16 does better 🙏🏻
 
Are the results in the article's photos real? Because all I got on my new Galaxy Fold 6, with all its AI features, is this 😅

Edit: OK, my 15 Plus does much worse except in the Half Precision score for some reason? At least Apple does not seem to be lying about its capabilities regarding Apple Intelligence support, I guess?
It is also using the CPU backend in your benchmark, not the AI accelerators. Switch to the Neural Engine backend to see how that performs.
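For anyone wanting to poke at this outside the Geekbench app: in your own Core ML code, the rough equivalent of that backend switch is MLModelConfiguration.computeUnits. A minimal Swift sketch (the model file name is just a placeholder):

```swift
import CoreML
import Foundation

// Sketch only: "SomeModel.mlmodelc" stands in for any compiled Core ML model.
let url = URL(fileURLWithPath: "SomeModel.mlmodelc")

let cpuConfig = MLModelConfiguration()
cpuConfig.computeUnits = .cpuOnly                 // roughly the "CPU backend"

let aneConfig = MLModelConfiguration()
aneConfig.computeUnits = .cpuAndNeuralEngine      // lets Core ML schedule work on the ANE

do {
    let cpuModel = try MLModel(contentsOf: url, configuration: cpuConfig)
    let aneModel = try MLModel(contentsOf: url, configuration: aneConfig)
    // Run the same prediction through each model and compare wall-clock times.
    _ = (cpuModel, aneModel)
} catch {
    print("Failed to load model: \(error)")
}
```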
 
  • Like
Reactions: SFjohn
The most interesting TECHNICAL element is the massive diversity in the 8-bit results.
The word on the internet is that 8-bit performance is useless because it's so inaccurate, but that's not quite true.
There are two VERY DIFFERENT 8-bit camps.
Apple's 8-bit accuracy is generally in the high 90s; the worst I saw was 93%.
OpenVINO (Intel) is more like mid-90s; the worst I saw was 80%.

But the ONNX (Microsoft, including MS ARM) and TensorFlow (both variants) results are a disaster, with accuracies at 40%, 60%, 70%.

Which is kind of interesting...
First, from a technical point of view, it shows who's been concentrating their R&D where.
Second, from a product point of view, it's probably reasonable to compare Apple 8-bit model performance to Android/MS 16-bit model performance.
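To make the "two 8-bit camps" point concrete, here is a hedged toy Swift sketch (made-up data, not any vendor's actual pipeline) showing how much a single calibration choice - naive min/max scaling versus clipping outliers - changes int8 reconstruction error. Differences like this, compounded across every layer, are one reason different stacks report such different accuracies at the same nominal 8 bits.

```swift
import Foundation

// Toy activations with a few large outliers, the classic thing that hurts int8.
var activations: [Float] = (0..<2000).map { _ in Float.random(in: -1...1) }
activations += [35, -40, 50]   // outliers

// Symmetric int8 quantize/dequantize using a chosen clipping range.
func int8Error(_ x: [Float], clip: Float) -> Float {
    let scale = clip / 127
    let recon = x.map { max(-clip, min(clip, $0)) }           // clip to range
                 .map { (($0 / scale).rounded()) * scale }    // quantize + dequantize
    return zip(x, recon).map { abs($0.0 - $0.1) }.reduce(0, +) / Float(x.count)
}

let naiveClip = activations.map { abs($0) }.max() ?? 1        // min/max calibration
print("clip = max|x| (naive):   mean abs error =", int8Error(activations, clip: naiveClip))
print("clip = 1.0 (outliers clipped): mean abs error =", int8Error(activations, clip: 1))
```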
 
  • Like
Reactions: Edsel
Are the results in the article's photos real? Because all I got on my new Galaxy Fold 6, with all its AI features, is this 😅

Edit: OK, my 15 Plus does much worse except in the Half Precision score for some reason? At least Apple does not seem to be lying about its capabilities regarding Apple Intelligence support, I guess?
iOS 18.1b provides ML performance boosts that substantially affect some of the benchmarks.
You're testing with 18.0b.

The same may possibly be true for your Samsung device (I don't know that space): some combination of Samsung drivers and Android software may be substantially out of date relative to the current state of the art.
 
Apple Intelligence is optimized for weights quantized to roughly 3.5-4 bits. It may do worse at other precisions on the iPhone 15. The scores don't mean much unless the models are optimized for a certain precision.
Could you elaborate, or give a link explaining how each test relates to different workloads or tasks, and how those bit widths relate to it? I am not familiar with this.
 
My 1TB iPad Pro M4 didn't beat my iPhone 15 Pro Max 256GB to death.

iPad M4: 4701 / 7840 / 6814

iPhone 15 Pro Max: 4046 / 6953 / 6065
Once again, the OS version you're using has a MASSIVE effect.

The quality of the comments in this thread really tells you everything you need to know about the supposed "tech" internet:

99% clueless rants/complaints based on tribalism.

1% informed comments that even try to understand what the benchmarks are doing, what affects the results, and why specific results may not match what you expected.
 
  • Haha
Reactions: Frantisekj
Could you elaborate, or give a link explaining how each test relates to different workloads or tasks, and how those bit widths relate to it? I am not familiar with this.
What he means is that the Apple Foundation Model for language (the Apple code that handles all AI language requests, so not just Siri "understanding" but also rewriting text, summarizing, translating, etc.) has an average size of about 3.7 bits per weight. We know this because Apple published a paper on the subject. The same is probably true for the Vision model (again, there's a unified Vision model that handles various different types of vision requests, and I suspect it also handles Pose). We have a series of papers on various aspects of the Vision model, but they're a few years old, and I don't believe there's been a recent paper giving the model size since Apple started really pushing model quantization.

His point is that Apple provides a platform (model conversion tools, model compiler, and hardware) that all work together to support models that run well when compressed far beyond what other platforms currently achieve (see my comment above about 8-bit accuracy). So if you're obsessed with tribalism, it's "somewhat unfair" to see Apple being punished for its performance at 32-bit (which only happens on the GPU, not the NPU) and even 16-bit, when Apple is all-in on optimizing for 8 bits and less.
Another way the benchmark is somewhat sub-optimal is that it does not give energy results, even though energy efficiency is of course the entire point of everyone moving to dedicated NPU hardware.

You could maybe write an alternative benchmark that calls into Apple APIs for some of the tasks above (e.g. image classification, pose detection, translation, etc.). The results might not be *directly* comparable with the benchmark (which uses standard open models, so that more or less the same test is being run on all platforms), but the result might be VERY interesting in terms of comparing Apple's "systemwide" performance, not just the hardware but also code optimized to the hardware.
This is probably an interesting project for any student out there wanting to get started at the very low end of AI (just calling some APIs correctly) while also making a name for themselves...
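As a hedged starting point for that kind of project, here is a minimal macOS Swift sketch that times Apple's built-in image classifier through the Vision framework. The image path is a placeholder, and this times a single API call rather than a proper benchmark with warm-up and repetitions.

```swift
import Foundation
import Vision

// Sketch only: "/tmp/test.jpg" is a placeholder image path on disk.
let imageURL = URL(fileURLWithPath: "/tmp/test.jpg")
let request = VNClassifyImageRequest()                     // Apple's built-in classifier
let handler = VNImageRequestHandler(url: imageURL, options: [:])

let start = Date()
do {
    try handler.perform([request])
    let elapsed = Date().timeIntervalSince(start)
    let top = (request.results as? [VNClassificationObservation])?
        .prefix(3)
        .map { "\($0.identifier) \(String(format: "%.2f", $0.confidence))" }
        .joined(separator: ", ") ?? "no results"
    print("Top labels: \(top)")
    print("Latency: \(String(format: "%.1f", elapsed * 1000)) ms")
} catch {
    print("Classification failed: \(error)")
}
```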
 
  • Like
Reactions: LockOn2B
iOS 18.1b provides ML performance boosts that substantially affect some of the benchmarks.
You're testing with 18.0b.

The same may possibly be true for your Samsung device (I don't know that space): some combination of Samsung drivers and Android software may be substantially out of date relative to the current state of the art.
2024-08-15 · iPad16,5 (ARM) · Core ML Neural Engine · 3330 / 18866 / 22117
2024-08-15 · iPad16,3 (ARM) · Core ML Neural Engine · 4733 / 31908 / 40669

I haven't seen 18.1 results, but here are 17.6.1 and 18.0b. The difference is big, mainly in image manipulation, object detection, and machine translation.
 
Last edited: