I’m fascinated by the use of NPU TOPS as if the GPUs on these Macs don’t blow the NPUs out of the water. Unless you think an iPhone 16 (35 TOPS) and an M4 Max (38 TOPS) have broadly the same ML performance…
In theory, this is why tech bloggers have a job: testing real-world performance. Which tasks are handled by the NPU and which by the GPU? When does RAM bandwidth become the limiting factor in either case? These are genuine questions and I would love it if someone could answer them.
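On the bandwidth question, one rough back-of-the-envelope (my numbers, not anything measured): LLM token generation has to stream essentially all of the weights from RAM for every token, so throughput is capped around bandwidth ÷ model size. A 35 GB model on ~546 GB/s of memory bandwidth (the top M4 Max configuration, if I remember right) tops out near 546 / 35 ≈ 15 tokens/s, regardless of how many TOPS the NPU or GPU can theoretically deliver.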
Also: has Apple actually improved the NPU that much in one generation (the M3 was 18 TOPS), or did they drop to INT8 or even INT4 for the quoted figure? Are the GPU and NPU capable of running at the same precision?
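A sanity check on that, assuming (and this is purely my assumption) Apple quoted the M3's 18 TOPS at FP16 and the M4's 38 TOPS at INT8: at matched precision the M4 would work out to roughly 38 ÷ 2 ≈ 19 FP16 TOPS, barely a generational gain. The FP16 Neural Engine scores below (38197 for the M4 Max vs 30100 for the M3 Ultra) suggest the real improvement is larger than that, for what it's worth.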
Geekbench AI, though I am sure it's not perfect, provides a nice opportunity to measure the Mac's different subsystems with different data types (scores below, higher is better):
System | Subsystem | FP32 | FP16 | INT8 |
---|---|---|---|---|
Apple M4 Max | Core ML GPU | 20383 | 22515 | 21447 |
Apple M4 Max | Core ML CPU | 5514 | 9086 | 7028 |
Apple M4 Max | Core ML Neural Engine | 5515 | 38197 | 52421 |
Apple M3 Ultra | Core ML GPU | 23270 | 25750 | 23591 |
Apple M3 Ultra | Core ML CPU | 5497 | 8413 | 6609 |
Apple M3 Ultra | Core ML Neural Engine | 5489 | 30100 | 33219 |
What's interesting here is that the "best" choice of system depends on the data type and subsystem you plan to use, and conversely, the data type you choose may depend on the system you're on.
For example, for calculations that work just as well in INT8, the M4 Max's NPU is clearly the fastest; an M3 Ultra would be a downgrade. However, for calculations that need FP32, the M3 Ultra's GPU is somewhat faster (23270 vs 20383, about 14%). That alone may not justify its higher cost, though the M3 Ultra's larger memory options will probably come in handy given FP32's larger memory footprint. Note that at that point I am also switching from the NPU to the GPU.
On the other hand, I might prefer FP16 over INT8 on the M3 Ultra if my model converged faster in FP16 (required fewer iterations), but that preference might not hold on the M4 Max, depending on how many more INT8 iterations are needed.
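To put numbers on that break-even point: on the M3 Ultra's NPU, INT8 scores only ~10% higher than FP16 (33219 vs 30100), so FP16 wins if it saves more than ~10% of the iterations. On the M4 Max's NPU the INT8 advantage is ~37% (52421 vs 38197), so FP16 would have to cut iterations by roughly 27% (1 - 38197/52421) to come out ahead.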
Then for tasks that can use INT8 on the NPU, the latest iPhone may be roughly on par with a Mac Studio M4 Max, as others have pointed out, and that is kind of crazy. I suspect that for FP32 calculations that can run on the M3 Ultra's 80-core GPU, the iPhone won't keep up...
Last, though tools like Geekbench AI test their calculations across different data types and let one state a preference for a subsystem, I understand Apple's Core ML framework makes the final decision. From the data above, it appears likely that the FP32 calculations ran on the CPU even when the Neural Engine was preferred: note the near-identical FP32 scores (~5500) for the CPU and Neural Engine rows.
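For anyone curious what that subsystem preference looks like in code, here's a minimal Swift sketch (the model path is a placeholder). The key point is that Core ML's computeUnits setting is a request, not a guarantee, which is consistent with FP32 silently falling back to the CPU above.

```swift
import CoreML

// A minimal sketch of preferencing the Neural Engine in Core ML.
// computeUnits is a request, not a guarantee: Core ML can still
// route individual ops to the CPU (e.g. FP32 work the ANE won't run).
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine  // alternatives: .all, .cpuOnly, .cpuAndGPU

// Placeholder path to a compiled .mlmodelc bundle.
let modelURL = URL(fileURLWithPath: "/path/to/MyModel.mlmodelc")
let model = try MLModel(contentsOf: modelURL, configuration: config)
```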
My takeaway is that both selecting the optimal system and tuning software to run highly complex calculations on these highly complex systems are complicated...