The M5 is only around 5 TFLOPs nominally, yet it performs close to the 15 TFLOPs 4060 in Blender and the 9 TFLOPs 4050 mobile in 3DMark Wild Life. Nominal peak performance is only a meaningful point of comparison if the rest of the architecture is comparable. As it is, Apple is currently much better at extracting performance out of its GPU design because their GPU architecture is more work-efficient.
Hey @leman I think you made a mistake here. The M5 is indeed around a stock 4050 mobile in 3DMark's benchmarks (slightly better actually). However, in Blender the M5 scores around 1733, but the 4060 laptop/desktop chips (no way to control for boosted GPUs in the Blender database, sadly) are nearly double at ~3200. Were you thinking of the estimates for the M5 Pro? Those should easily blow past the 4060 and may even rival the 5060 - including the full M5 Pro and the 5060 Ti.
 
Hey @leman I think you made a mistake here. The M5 is indeed around a stock 4050 mobile in 3DMark's benchmarks (slightly better actually). However, in Blender the M5 scores around 1733, but the 4060 laptop/desktop chips (no way to control for boosted GPUs in the Blender database, sadly) are nearly double at ~3200. Were you thinking of the estimates for the M5 Pro? Those should easily blow past the 4060 and may even rival the 5060 - including the full M5 Pro and the 5060 Ti.

Thank you for pointing this out! I think I must have been looking at CUDA scores, apologies!
 
There is no Extreme CPU from Apple.

Let's do some conjecturing.

If the 40-GPU-core M4 Max is close to an RTX 3090, and assuming the M5 Max has the same 40-core count with a 30% improvement, we're looking at a 4080-class, maybe a 4090-class, GPU.

The as-yet-unannounced M5 Ultra will conceivably have 2x the GPU cores, so we should see nearly double the performance, which does put us in the 5090 range.

This is just fast-and-loose, napkin-math conjecture; don't hold me to it, given that we don't even have an M5 Pro yet, never mind a Max or Ultra.
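To spell out that napkin math, here's a rough sketch; the ~30% per-core uplift and the perfect 2x Ultra scaling are assumptions from this post, not measured numbers.

```
# Napkin math only - the 30% uplift and the ideal 2x Ultra scaling are assumptions.
m4_max = 1.00                  # 40-core M4 Max as the baseline (~RTX 3090 class, per the guess above)
m5_max = m4_max * 1.30         # same 40 cores, assumed ~30% architectural improvement
m5_ultra = m5_max * 2.0        # assumed 2x the GPU cores with ideal scaling
print(f"M5 Max ~{m5_max:.1f}x, M5 Ultra ~{m5_ultra:.1f}x the M4 Max")   # ~1.3x and ~2.6x
```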
The M4 Max is nowhere near the RTX 3090. The RTX 3090 is 35.6 TFLOPS while the M4 Max is 19~20 TFLOPS. To match the RTX 5090's performance, you would need 190 M4 GPU cores. But then, UltraFusion won't help, since doubling the cores does not mean doubling the performance.

Also, don't forget that a lot of software is Nvidia-based, which boosts performance, just like Apple Silicon works great with Blender due to unified memory.
 
The M4 Max is nowhere near the RTX 3090. The RTX 3090 is 35.6 TFLOPS while the M4 Max is 19~20 TFLOPS.

M4 Max is closer to 16 TFLOPS.

Of course, peak FP32 throughput is increasingly less useful for comparison. On paper the M4 Max should be two times slower, but it is within 15% of the 3090's score in Wild Life and within 5% on the Blender benchmark suite.
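A rough way to see the work-efficiency gap, using the approximate numbers in this thread (the ~0.95 relative Blender score is my reading of "within 5%"):

```
# Rough illustration using the approximate numbers from this thread.
m4_max_tflops = 16.0
rtx3090_tflops = 35.6
relative_blender_score = 0.95          # assumed: "within 5%" of the 3090
score_per_tflop_ratio = relative_blender_score / (m4_max_tflops / rtx3090_tflops)
print(f"M4 Max extracts ~{score_per_tflop_ratio:.1f}x more Blender score per nominal TFLOP")
# -> roughly 2.1x, which is the work-efficiency point
```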

But then, UltraFusion won't help, since doubling the cores does not mean doubling the performance.

You could make the same argument for pretty much any GPU, Nvidia included. Apple’s performance scaling has been on par with or better than the rest of the industry's.


Also, don't forget that a lot of software is Nvidia-based, which boosts performance, just like Apple Silicon works great with Blender due to unified memory.

I wouldn’t say that the Blender performance advantage has anything to do with unified memory. The Blender benchmark suite uses fairly small scenes that should fit within the RAM budget of most consumer GPUs. Apple's secret sauce for Blender is dynamic caching and superscalar execution, which allow them to achieve much better hardware utilization than anyone else.
 
You could make the same argument for pretty much any GPU, Nvidia included. Apple’s performance scaling has been on par with or better than the rest of the industry's.
At least Nvidia graphics cards have PCIe slots, which allow you to add and expand, unlike UltraFusion, which is a huge difference.

I wouldn’t say that the Blender performance advantage has anything to do with unified memory. The Blender benchmark suite uses fairly small scenes that should fit within the RAM budget of most consumer GPUs. Apple's secret sauce for Blender is dynamic caching and superscalar execution, which allow them to achieve much better hardware utilization than anyone else.
Still, so far Blender is the only software where an Apple Silicon chip comes close to the RTX 40/50 series in benchmarks, though that hasn't been proven in realistic scenarios.

Of course, peak FP32 throughput is increasingly less useful for comparison. On paper the M4 Max should be two times slower, but it is within 15% of the 3090's score in Wild Life and within 5% on the Blender benchmark suite.
Can't say the M4 Max is as good as the RTX 3090 in real life, because those tests are too short and games prove differently, as they require raw GPU performance.
 
At least Nvidia graphics cards have PCIe slots, which allow you to add and expand, unlike UltraFusion, which is a huge difference.

Not quite sure what you are talking about. I’ve never seen an expandable NVIDIA GPU. How would that even work?

Can't say the M4 Max is as good as the RTX 3090 in real life, because those tests are too short and games prove differently, as they require raw GPU performance.

Where do you get this from? And what does “raw performance” mean?
 
You literally insert a GPU into the motherboard and expand.


Games are a great example for comparison, such as Cyberpunk 2077.
Games are a horrible example because, as some say when the performance isn't what is expected, "they aren't optimized for Apple Silicon".
 
You literally insert a GPU into the motherboard and expand.
PCIe slots allow you to upgrade your system's GPU performance by replacing the GPU card, i.e., going from an RTX 3070 to an RTX 5080. Your post was awkwardly written to imply you could upgrade an existing GPU's performance, not replace it.

Games are a horrible example because, as some say when the performance isn't what is expected, "they aren't optimized for Apple Silicon".
To further add to your point, the amount of optimization will differ depending on the game. At the extreme, there have been some games that were heavily optimized for AMD GPUs whereas not as much work was put into the Nvidia side.
 
TSMC's 'new' packaging tech is generally just the same as the packaging tech used for UltraFusion. The pads/connections are smaller, which conceptually means you get more of them between dies. However, the primary performance impact you're going to get is that the two (or more) dies act like the performance of a single monolithic die. It isn't going to bring performance in and of itself. The construction of something bigger than one max-sized die is where the performance is coming from (not the connection).

Folks keep waving at this new tech as if it were magical sprinkle power. The horizontal packaging variant simply allows Apple to do 'better' Ultras than they have been doing. If Apple switches to smaller building blocks, chiplets, then connectivity is just getting back to the performance that the monolithic "Max"-sized chips could already do. It might end up more economical (if defect savings outweigh increased packaging costs), but that's more about 'costs' than 'performance'. The more expensive fab process might be more easily paid for that way. But again, the fab process is the core source of the performance enablement, not the connection.
Oh? Does TSMC package chips now? I didn’t know.
 
From the looks of it, the M5 Ultra (96 cores) will match the 5090's performance. At least in TFLOPs.
 
I don't think we should expect a 25-30% GPU architecture improvement from M4 to M5 AND a 20% increase in GPU cores on practically the same process node (albeit third iteration - N3E to N3P). Much of the improvement is due to the higher memory bandwidth anyway.

That would balloon the die size, and from M4 to M5 the base chip retained its transistor count at around 28B transistors.
 
I hope the M5 Max will be good enough to play WoW at ultra settings on a 16-inch MacBook Pro 🙂
(Even though it's a CPU-dependent game.)
 
From the looks of it, the M5 Ultra (96 cores) will match the 5090's performance. At least in TFLOPs.

Sounds unlikely. The 5090 RTX offers just over 100 TFLOPs of FP32 compute. A single M5 GPU core running at 1.8 GHz (which is a very generous estimate for the M5 Max/Ultra) is just under 0.5 TFLOPs. You'd need to double either the number of cores or the frequency to get to 5090 levels.
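In numbers, roughly like this; the 128 FP32 lanes per GPU core and the 1.8 GHz clock are assumptions for illustration, not confirmed M5 specs.

```
# Back-of-the-envelope FP32 estimate. The 128 FP32 lanes per core and the
# 1.8 GHz clock are assumptions, not confirmed M5 specs.
fp32_lanes_per_core = 128
clock_ghz = 1.8
flops_per_lane_per_cycle = 2           # one fused multiply-add = 2 FLOPs
tflops_per_core = fp32_lanes_per_core * flops_per_lane_per_cycle * clock_ghz / 1000
cores = 96
print(f"~{tflops_per_core:.2f} TFLOPS per core, ~{tflops_per_core * cores:.0f} TFLOPS for {cores} cores")
# -> ~0.46 TFLOPS per core and ~44 TFLOPS total, versus 100+ for the 5090 RTX
```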

Of course, TFLOPs is barely a helpful metric nowadays. It all depends on the workload. Apple has better compute efficiency, which plays a larger role, especially for complex shaders.

I don't think we should expect a 25-30% GPU architecture improvement from M4 to M5 AND a 20% increase in GPU cores on practically the same process node (albeit third iteration - N3E to N3P). Much of the improvement is due to the higher memory bandwidth anyway.

There are plenty of improvements besides the memory bandwidth. But the architectural differences make it very hard to provide ballpark estimates.
 
There are plenty of improvements besides the memory bandwidth. But the architectural differences make it very hard to provide ballpark estimates.
Yeah, I agree; it also varies wildly by workload type, but an increase from 120 GB/s to 153 GB/s (a 27.5% increase) in memory bandwidth certainly helps overall.
 
I wouldn’t say that the Blender performance advantage has anything to do with unified memory. The Blender benchmark suite uses fairly small scenes that should fit within the RAM budget of most consumer GPUs. Apple's secret sauce for Blender is dynamic caching and superscalar execution, which allow them to achieve much better hardware utilization than anyone else.
Why don't dynamic caching and superscalar execution help the gaming side as much as they do the rendering side?
 
Because Apple isn’t working on the games you’re comparing to the same extent as Blender. Also, the workload is different, I imagine.
Is TBDR hindering Apple with respect to how, say, Assassin's Creed Shadows performs? We know that Apple has the CPU advantage, so running at lower resolutions isn't as capped as in x86 land, but the GPU performance doesn't punch up in games like it does practically everywhere else.
 
Is TBDR hindering Apple with respect to how, say, Assassin's Creed Shadows performs? We know that Apple has the CPU advantage, so running at lower resolutions isn't as capped as in x86 land, but the GPU performance doesn't punch up in games like it does practically everywhere else.
I don’t think TBDR hinders Apple, but the reality is that the priority for these game studios will be PC and console. They aren’t going to make the changes necessary to take advantage of Apple silicon. That being said, Nat Brown stated that CP2077 uses many features of Apple Silicon.

A couple of things to note: firstly, the degree to which driver optimisation plays a part in game performance; I personally wouldn’t want that on Apple’s platforms. Secondly, many of these GPUs have a brute-force approach to graphics. Both Nvidia and AMD have many more ALUs and more power to throw at the problem.
 
Sounds unlikely. The 5090 RTX offers just over 100 TFLOPs of FP32 compute. A single M5 GPU core running at 1.8 GHz (which is a very generous estimate for the M5 Max/Ultra) is just under 0.5 TFLOPs. You'd need to double either the number of cores or the frequency to get to 5090 levels.

Of course, TFLOPs is barely a helpful metric nowadays. It all depends on the workload. Apple has better compute efficiency, which plays a larger role, especially for complex shaders.



There are plenty of improvements besides the memory bandwidth. But the architectural differences make it very hard to provide ballpark estimates.

The Geekbench Metal score will be 400,000 for a 96-core M5 Ultra.
The Geekbench Metal score for the 5090 is 380,000.

Of course it won't have the raw power of the 5090, but it should be close in real-world performance.
 
I don’t think TBDR hinders Apple, but the reality is that the priority for these game studios will be PC and console. They aren’t going to make the changes necessary to take advantage of Apple silicon. That being said, Nat Brown stated that CP2077 uses many features of Apple Silicon.

A couple of things to note: firstly, the degree to which driver optimisation plays a part in game performance; I personally wouldn’t want that on Apple’s platforms. Secondly, many of these GPUs have a brute-force approach to graphics. Both Nvidia and AMD have many more ALUs and more power to throw at the problem.
I assume driver optimizations are needed because the APIs and code compilation are too generic. Apple doesn't have that crutch; therefore they don't have to do hack-fixes for things in drivers.

Sometimes I miss the days of dedicated graphics api per vendor.
 
I assume driver optimizations are needed because the APIs and code compilation are too generic. Apple doesn't have that crutch; therefore they don't have to do hack-fixes for things in drivers.

Sometimes I miss the days of dedicated graphics api per vendor.
Pre-DX12, Vulkan, and Metal (so OpenGL and DX<=11), the drivers were extremely complicated and filled with loopholes and game-specific optimizations. In fact, you can sometimes read people who port older games in WINE talking about all the things they have to do to get specific games working, because those games relied on a particular behavior that wasn't exactly in spec, shall we say.

The new APIs were meant to simplify the drivers, with more of the performance control being passed to the developers, for good and bad. My impression is that promise wasn't fully realized. Some custom game engines still deliberately break spec to get a little extra performance boost, and day-0 driver patches are still required to get them to work at all, never mind be performant. Further, Intel's first forays into dGPUs were thought to be hobbled by poor driver performance (though I'm less certain whether that was specifically DX<11 games or if DX12 games likewise performed badly). So the drivers may not be quite as salient as they once were, but they're still tricky and yet necessary to get right.

The Geekbench Metal score will be 400,000 for a 96-core M5 Ultra.
The Geekbench Metal score for the 5090 is 380,000.

Of course it won't have the raw power of the 5090, but it should be close in real-world performance.
That's Geekbench OpenCL for the 5090; there is no Metal for the Nvidia chip 😉 (and sadly no CUDA on the latest version of Geekbench). I know Primate Labs advertises GB GPU as being cross-platform, but you can see the discrepancies between API scores, and I would be very careful comparing Geekbench Metal on the Apple GPU with Geekbench OpenCL on the Nvidia GPU. That said, a theoretical M5 Ultra should be very competitive in Blender.
There were some measurements done by some folks on X based on how the M5 is performing and scaling.
I think the question is why 96 cores (the current Ultra has 80) ... Apple tends to double core counts going from the Pro to the Ultra (the M3 Pro is a very slight exception, but based on the M1, M2, and M4 Pro/Max/Ultra die designs the GPU core counts have to be exactly doubled from tier to tier) - so you think the M5 Max is getting a bump to 48 cores from 40, and is the Pro getting a core count bump as well? What is your basis for that? I don't recall any rumor that the chips are getting a core count increase this generation, though of course it's possible.
 