I’m fascinated by the use of NPU TOPS as if the GPUs on these Macs don’t blow the NPUs out of the water. Unless you think an iPhone 16 (35 TOPS) and an M4 Max (38 TOPS) have broadly the same ML performance…

In theory, this is why tech bloggers have a job, where they test the real world performance. Which tasks are handled by the NPU, which by the GPU? When is the RAM bandwidth the limiting factor in either case? These are genuine questions and I would love it if someone can answer them.

Also: has Apple actually improved the NPU that much in one generation (the M3 was 18 TOPS), or did they drop to INT8 or even INT4 calculations? Are the GPU and NPU capable of running the same precision?
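A back-of-the-envelope check on that question: TOPS figures roughly double each time the precision is halved, when the hardware supports the narrower format. So if the M3's 18 TOPS was quoted at FP16 and the M4's 38 at INT8 (an assumption, not something confirmed by Apple's spec sheets), the like-for-like gain would be far smaller than it looks:

```python
# Sketch, assuming TOPS scales ~2x per precision halving and that the
# quoted figures (M3 NPU: 18 TOPS, M4 NPU: 38 TOPS) use different formats.
def normalize_tops(tops: float, quoted_bits: int, target_bits: int) -> float:
    """Rescale a TOPS figure to a common precision, assuming 2x per halving."""
    return tops * (quoted_bits / target_bits)

m3_fp16 = 18.0  # M3 NPU, if quoted at FP16
m4_int8 = 38.0  # M4 NPU, if quoted at INT8

# Express the M4 figure at FP16 for an apples-to-apples comparison.
m4_as_fp16 = normalize_tops(m4_int8, quoted_bits=8, target_bits=16)
print(m4_as_fp16)  # 19.0 -- barely ahead of the M3's 18
```

If that assumption holds, most of the headline jump would be a change of units rather than a change of silicon.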

Geekbench AI, though I am sure not perfect, provides a nice opportunity to measure the Mac's different subsystems on AI-type workloads across different data types:

System | Subsystem | FP32 | FP16 | INT8
Apple M4 Max | Core ML GPU | 20383 | 22515 | 21447
Apple M4 Max | Core ML CPU | 5514 | 9086 | 7028
Apple M4 Max | Core ML Neural Engine | 5515 | 38197 | 52421
Apple M3 Ultra | Core ML Neural Engine | 5489 | 30100 | 33219
Apple M3 Ultra | Core ML GPU | 23270 | 25750 | 23591
Apple M3 Ultra | Core ML CPU | 5497 | 8413 | 6609

What's interesting here is that the "best" choice of system depends on the data type and subsystem you plan to use; similarly, the data type one chooses may depend on the system one is on.

For example, for calculations that can use INT8 just as well as anything else, the NPU of the M4 Max is clearly the fastest; an M3 Ultra would be a downgrade. However, for calculations that need FP32, the M3 Ultra is somewhat faster (though maybe not by enough to justify its higher cost; the larger memory options of the M3 Ultra will probably come in handy given FP32's larger memory footprint). At that point I am also switching from the NPU to the GPU.
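That "best choice per data type" reasoning can be made mechanical. A small sketch using the scores transcribed from the table above (treating each Geekbench AI score as a direct throughput proxy, which is an assumption):

```python
# Geekbench AI scores from the table above (higher is better).
scores = {
    ("M4 Max", "GPU"):             {"FP32": 20383, "FP16": 22515, "INT8": 21447},
    ("M4 Max", "CPU"):             {"FP32": 5514,  "FP16": 9086,  "INT8": 7028},
    ("M4 Max", "Neural Engine"):   {"FP32": 5515,  "FP16": 38197, "INT8": 52421},
    ("M3 Ultra", "Neural Engine"): {"FP32": 5489,  "FP16": 30100, "INT8": 33219},
    ("M3 Ultra", "GPU"):           {"FP32": 23270, "FP16": 25750, "INT8": 23591},
    ("M3 Ultra", "CPU"):           {"FP32": 5497,  "FP16": 8413,  "INT8": 6609},
}

def best_for(dtype: str):
    """Return the (system, subsystem) pair with the top score for a data type."""
    return max(scores, key=lambda k: scores[k][dtype])

print(best_for("INT8"))  # ('M4 Max', 'Neural Engine')
print(best_for("FP32"))  # ('M3 Ultra', 'GPU')
```

Which reproduces the point above: INT8 work favors the M4 Max's NPU, FP32 work favors the M3 Ultra's GPU.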

On the other hand, I might prefer FP16 over INT8 on the M3 Ultra if my model converged faster (required fewer iterations), but that might not hold on the M4 Max (depending on how many more iterations INT8 requires there).
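That tradeoff has a simple break-even point: if per-iteration throughput is proportional to the benchmark score (an assumption), FP16 wins whenever it cuts the iteration count by more than INT8's throughput advantage. A quick calculation with the NPU scores from the table above:

```python
# Break-even fraction: FP16 wins if it needs fewer than this fraction of
# INT8's iterations. Scores are the NPU rows of the Geekbench AI table
# above, used as a per-iteration throughput proxy (an assumption).
def fp16_breakeven(fp16_score: float, int8_score: float) -> float:
    """Max fraction of INT8's iteration count FP16 may use and still win."""
    return fp16_score / int8_score

m3_ultra_npu = fp16_breakeven(30100, 33219)  # ~0.91: ~9% fewer iterations
m4_max_npu = fp16_breakeven(38197, 52421)    # ~0.73: ~27% fewer iterations
print(round(m3_ultra_npu, 2), round(m4_max_npu, 2))
```

So FP16 only needs to save about 9% of iterations to pay off on the M3 Ultra's NPU, but about 27% on the M4 Max's, which is why the preference can flip between systems.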

Then for tasks that can use INT8 on the NPU, the latest iPhone may be roughly on par with the M4 Max Studio, as others have pointed out, and that is kind of crazy. I suspect that for FP32 calculations that can run on the M3 Ultra's 80-core GPU, the iPhone won't keep up...

Last, though tools like Geekbench AI test their calculations across different data types and let one express a subsystem preference, I understand Apple's Core ML framework makes the final decision. From the data above, it appears likely that FP32 calculations ran on the CPU even when the NPU was preferred.
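The table itself supports that fallback suspicion: on both chips, the Neural Engine's FP32 score sits within a fraction of a percent of the CPU's, which is what you'd expect if Core ML quietly routed FP32 work to the CPU. Checking the gap with the numbers transcribed above:

```python
# FP32 scores from the table above. If Core ML fell back to the CPU when
# the NPU was preferred, the two scores should be nearly identical.
pairs = {
    "M4 Max":   {"npu_fp32": 5515, "cpu_fp32": 5514},
    "M3 Ultra": {"npu_fp32": 5489, "cpu_fp32": 5497},
}

for chip, s in pairs.items():
    rel_gap = abs(s["npu_fp32"] - s["cpu_fp32"]) / s["cpu_fp32"]
    print(chip, f"{rel_gap:.4f}")  # both gaps are well under 1%
```

A sub-1% gap is circumstantial rather than conclusive, but it is consistent with the CPU doing the FP32 work in both "subsystems."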

My takeaway is that both selecting the optimal system and tuning software to run highly complex calculations on these highly complex systems are complicated...
Wow. Those aren’t remotely the scores I anticipated, not least because cutting precision in half generally doubles performance, assuming the format is supported.

As usual, it’s unlikely that Geekbench numbers play out in any meaningful way to actual use case performance. I get that we need “a number” for spitballing and that such a number is inherently not representative but these numbers confuse the heck out of me. With past Geekbench results, and having been a photographer, I can get the correlation between raw numbers and my use case. With this?…

A long road ahead of me to figure this stuff out! Thanks
 
Wow. Those aren’t remotely the scores I anticipated, not least because cutting precision in half generally doubles performance, assuming the format is supported.

Yes, and that last part is key. The data type/format has to be supported in hardware and additionally optimized for. It was also common in the past to see systems that ran FP64 at roughly FP32 speed; on the flip side, many Nvidia GPUs run FP64 at ~1/10th their FP32 rate. It really comes down to what the system was optimized for and where the bottlenecks were.

As usual, it’s unlikely that Geekbench numbers play out in any meaningful way to actual use case performance. I get that we need “a number” for spitballing and that such a number is inherently not representative but these numbers confuse the heck out of me.
With past Geekbench results, and having been a photographer, I can get the correlation between raw numbers and my use case. With this?…

A long road ahead of me to figure this stuff out! Thanks

I'd say Geekbench AI is the best we have right now as far as something widely available. Various people's Llama results are interesting when considering systems for LLMs and the like, but those measurements are rarely standardized. Of course, performance on "your code" (your set of applications, etc.) is most relevant, and no benchmark matches everyone's needs.

You can read what went into the Geekbench AI benchmarks here:

Then note that historically (i.e., going back decades) a lot of systems have struggled to realize real-world performance comparable to their projected/peak performance. If Geekbench AI is not showcasing the highest potential of these systems, that is probably more indicative of likely real-world performance than of Geekbench AI falling short.

The biggest limitation of Geekbench AI is probably that it was designed to run across a wide variety of hardware, and so it is unlikely to be indicative of large models/datasets that only run on large desktops and not at all on smartphones. It uses relatively small images, etc.

Last, I would not use Geekbench AI in lieu of regular Geekbench (or any other benchmark that you like) unless your bottleneck is AI-type processing. It's mainly measuring algorithms that do things like image classification and NLP.
Many people are dumping their M1 Max and M1 Ultra Mac Studio models, just look on FB Marketplace or eBay to see so many for sale currently. The price has sure dropped also. You may want to keep yours, or sell it and upgrade to a M4 Max or M3 Ultra Mac Studio model, while you can still get a decent price for yours. I have a feeling the M1 Max and M1 Ultra Mac Studio models will soon become dinosaurs and not command that much of a price.
That may be so, but I'm not one of those guys who keep the latest and greatest just because. I'll keep using this one and get whatever is newest when it's time. I'm definitely getting my money's worth out of it!
 