The M5 is only around 5 TFLOPS nominally, yet it performs close to the 15 TFLOPS 4060 in Blender and the 9 TFLOPS 4050 mobile in 3DMark Wild Life. Nominal peak performance is only a meaningful point of comparison if the rest of the architecture is comparable. As it is, Apple is currently much better at extracting performance out of its GPU design because their GPU architecture is more work-efficient.
Hey @leman I think you made a mistake here. The M5 is indeed around a stock 4050 mobile in 3DMark's benchmarks (slightly better, actually). However, in Blender the M5 scores around 1733, while the 4060 laptop/desktop chips (no way to control for boosted GPUs in the Blender database, sadly) are nearly double at ~3200. Were you thinking of the estimates for the M5 Pro? Those should easily blow past the 4060 and may even rival the 5060, with the full M5 Pro possibly rivaling the 5060 Ti.
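Even with those corrected Blender numbers, the per-TFLOP efficiency point still seems to hold. Here's a quick back-of-the-envelope sketch using only the figures quoted in this thread (M5 at ~5 nominal TFLOPS and ~1733 in Blender, 4060 at ~15 TFLOPS and ~3200); exact scores vary with configuration, so treat it as rough:

```python
# Rough "work efficiency" comparison: Blender benchmark score divided by
# nominal FP32 TFLOPS. Figures are the approximate ones quoted in this thread,
# not new measurements.

gpus = {
    "Apple M5": {"blender_score": 1733, "nominal_tflops": 5},
    "RTX 4060": {"blender_score": 3200, "nominal_tflops": 15},
}

for name, g in gpus.items():
    score_per_tflop = g["blender_score"] / g["nominal_tflops"]
    print(f"{name}: ~{score_per_tflop:.0f} Blender points per nominal TFLOP")

# Approximate output:
#   Apple M5: ~347 Blender points per nominal TFLOP
#   RTX 4060: ~213 Blender points per nominal TFLOP
```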
 
Hey @leman I think you made a mistake here. The M5 is indeed around a stock 4050 mobile in 3DMark's benchmarks (slightly better, actually). However, in Blender the M5 scores around 1733, while the 4060 laptop/desktop chips (no way to control for boosted GPUs in the Blender database, sadly) are nearly double at ~3200. Were you thinking of the estimates for the M5 Pro? Those should easily blow past the 4060 and may even rival the 5060, with the full M5 Pro possibly rivaling the 5060 Ti.

Thank you for pointing this out! I think I must have been looking at CUDA scores, apologies!
 
There is no Extreme CPU from Apple.

Let's do some conjecturing.

If the 40-GPU-core M4 Max is close to an RTX 3090, and assuming the M5 Max has the same 40-core count with a 30% improvement, we're looking at a 4080-class, maybe a 4090-class GPU.

The as-yet-unannounced M5 Ultra will conceivably have 2x the GPU cores, so we should see nearly double the performance, which does put us in the 5090 range.

This is just fast-and-loose, napkin-math conjecture; don't hold me to it, given that we don't even have an M5 Pro yet, never mind a Max or Ultra.
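Spelling that napkin math out as a sketch (the 30% per-generation uplift, the unchanged 40-core count, and near-linear 2x Ultra scaling are all assumptions from this post, and the RTX-class labels are loose):

```python
# Napkin math for the conjecture above. All inputs are assumptions from the
# post: M4 Max as the 40-core baseline, ~30% generational uplift for the
# M5 Max, and an M5 Ultra with 2x the cores scaling nearly linearly.

m4_max_relative = 1.00                       # baseline: 40-core M4 Max (~3090-ish, per the post)
m5_max_relative = m4_max_relative * 1.30     # same 40 cores, +30% uplift
m5_ultra_relative = m5_max_relative * 2.0    # 2x cores, assuming near-linear scaling

print(f"M5 Max   ~ {m5_max_relative:.2f}x the M4 Max  (conjectured 4080/4090 class)")
print(f"M5 Ultra ~ {m5_ultra_relative:.2f}x the M4 Max  (conjectured 5090 range)")
```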
The M4 Max is nowhere near the RTX 3090. The RTX 3090 is 35.6 TFLOPS while the M4 Max is ~19-20 TFLOPS. To match the RTX 5090's performance, you would need around 190 M4 GPU cores. But then, UltraFusion isn't going to help, since doubling the cores does not mean doubling the performance.
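A quick sanity check on that 190-core figure, using the per-core throughput implied by this post's own numbers (the ~105 TFLOPS nominal FP32 figure for the RTX 5090 is my assumption, not something quoted in this thread, and this only compares nominal throughput):

```python
# Sanity-checking the "190 M4 GPU cores" figure. Inputs: the post's own
# ~19-20 TFLOPS for the 40-core M4 Max; the RTX 5090's nominal FP32 figure
# is an assumed ~105 TFLOPS, not a number from this thread.

m4_max_tflops = 19.5                              # midpoint of 19-20 TFLOPS
m4_max_cores = 40
tflops_per_core = m4_max_tflops / m4_max_cores    # ~0.49 TFLOPS per M4 core

rtx_5090_tflops = 105.0                           # assumed nominal FP32 throughput
cores_needed = rtx_5090_tflops / tflops_per_core

print(f"~{tflops_per_core:.2f} TFLOPS per M4 GPU core")
print(f"~{cores_needed:.0f} M4 cores to match ~{rtx_5090_tflops:.0f} nominal TFLOPS")
# -> roughly 215 cores, the same ballpark as the ~190 quoted above.
```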

Also, don't forget that a lot of software is Nvidia-based, which boosts performance, just as Apple Silicon works great with Blender due to unified memory.
 
The M4 Max is nowhere near the RTX 3090. The RTX 3090 is 35.6 TFLOPS while the M4 Max is ~19-20 TFLOPS.

The M4 Max is closer to 16 TFLOPS.

Of course, peak FP32 throughput is increasingly less useful for comparison. On paper the M4 Max should be two times slower, but it is within 15% of the 3090's score in Wild Life and within 5% on the Blender benchmark suite.
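To put rough numbers on that, using only the figures in this thread (3090 at 35.6 nominal TFLOPS, M4 Max closer to 16, and the 15%/5% benchmark gaps mentioned above):

```python
# Nominal FP32 gap vs. observed benchmark gap, using figures from this thread.

rtx_3090_tflops = 35.6
m4_max_tflops = 16.0

nominal_gap = rtx_3090_tflops / m4_max_tflops   # ~2.2x on paper
observed_gap_wildlife = 1.15                    # "within 15%" -> at most ~1.15x
observed_gap_blender = 1.05                     # "within 5%"  -> at most ~1.05x

print(f"On paper:  3090 is ~{nominal_gap:.1f}x the M4 Max")
print(f"Wild Life: observed gap is at most ~{observed_gap_wildlife:.2f}x")
print(f"Blender:   observed gap is at most ~{observed_gap_blender:.2f}x")
# i.e. in these particular benchmarks the M4 Max delivers roughly twice the
# performance per nominal TFLOP that the 3090 does.
```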

But then, UltraFusion isn't going to help, since doubling the cores does not mean doubling the performance.

You could make the same argument for pretty much any GPU, Nvidia included. Apple’s performance scaling has been on par with or better than the rest of the industry.


Also, don't forget that a lot of software is Nvidia-based, which boosts performance, just as Apple Silicon works great with Blender due to unified memory.

I wouldn’t say that the Blender performance advantage has anything to do with unified memory. The Blender benchmark suite uses fairly small scenes that should fit within the RAM budget of most consumer GPUs. Apple’s secret sauce for Blender is dynamic caching and superscalar execution, which allow them to achieve much better hardware utilization than anyone else.
 
You could make the same argument for pretty much any GPU, Nvidia included. Apple’s performance scaling has been on par with or better than the rest of the industry.
At least Nvidia graphics cards have PCIe slots, which allow them to add and expand, unlike UltraFusion, which is a huge difference.

I wouldn’t say that the Blender performance advantage has anything to do with unified memory. The Blender benchmark suite uses fairly small scenes that should fit within the RAM budget of most consumer GPUs. Apple’s secret sauce for Blender is dynamic caching and superscalar execution, which allow them to achieve much better hardware utilization than anyone else.
Still, so far Blender is the only software where an Apple Silicon chip comes close to the RTX 40/50 series in benchmarks, and that's not proven in realistic scenarios.

Of course, peak FP32 throughput is increasingly less useful for comparison. On paper the M4 Max should be two times slower, but it is within 15% of the 3090's score in Wild Life and within 5% on the Blender benchmark suite.
Can't say the M4 Max is as good as the RTX 3090 in real life, because those tests are too short, and games, which require raw GPU performance, prove differently.
 
At least Nvidia graphics cards have PCIe slots, which allow them to add and expand, unlike UltraFusion, which is a huge difference.

Not quite sure what you are talking about. I’ve never seen an expandable NVIDIA GPU. How would that even work?

Can't say the M4 Max is as good as the RTX 3090 in real life, because those tests are too short, and games, which require raw GPU performance, prove differently.

Where do you get this from? And what does “raw performance” mean?
 
You literally insert a GPU into the motherboard and expand.


Games are a great example for comparison, such as Cyberpunk 2077.
Games are a horrible example because, as some say when the performance isn't what is expected, "they aren't optimized for Apple Silicon."
 
You literally insert a GPU into the motherboard and expand.
PCIe slots allow you to upgrade your system's GPU performance by replacing the GPU card, e.g., going from an RTX 3070 to an RTX 5080. Your post was awkwardly written to imply you could upgrade an existing GPU's performance, not replace it.

Games are a horrible example because, as some say when the performance isn't what is expected, "they aren't optimized for Apple Silicon."
To further add to your point, the amount of optimization will differ depending on the game. At the extreme, there are some games that are heavily optimized for AMD GPUs, whereas the developers hadn't put as much work into the Nvidia side.
 
TSMC's 'new' packaging tech is generally just the same as the packaging tech used for UltraFusion. The pads/connections are smaller, which conceptually means you can get more of them between dies. However, the primary performance impact you're going to get is that the two (or more) dies act like a single monolithic die. It isn't going to bring performance in and of itself. The construction of something bigger than one max-sized die is where the performance comes from (not the connection).

Folks keep waving at this new tech as magical sprinkle power. The horizontal packaging variant simply allows Apple to make 'better' Ultras than they have been doing. If Apple switches to smaller building blocks, chiplets, then the connectivity is just getting back to the performance that the monolithic "Max"-sized chips could already do. It might end up more economical (if the defect savings outweigh the increased packaging costs), but that is more about 'costs' than 'performance'. The more expensive fab process might be more easily paid for that way. But again, the fab process is the core source of the performance enablement, not the connection.
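The "defect savings vs. packaging cost" trade-off can be sketched with a standard Poisson yield model; every number below (defect density, die area, silicon and packaging costs) is a made-up placeholder for illustration, not anything Apple or TSMC has published:

```python
import math

# Illustrative chiplet-vs-monolithic cost sketch using a simple Poisson yield
# model: yield = exp(-D0 * area). All figures are hypothetical placeholders.

D0 = 0.1                 # defect density, defects per cm^2 (hypothetical)
big_die_area = 6.0       # cm^2, one "Max"-sized monolithic die (hypothetical)
cost_per_cm2 = 100.0     # silicon cost per cm^2 (hypothetical units)
packaging_cost = 80.0    # extra cost to bond two chiplets (hypothetical)

def cost_for_good_dies(area: float, n_dies: int, extra: float = 0.0) -> float:
    """Expected cost to obtain n_dies good dies of the given area, plus extras."""
    yield_rate = math.exp(-D0 * area)                 # fraction of dies that are good
    silicon_cost = area * cost_per_cm2 / yield_rate   # you also pay for the bad dies
    return n_dies * silicon_cost + extra

monolithic = cost_for_good_dies(big_die_area, 1)
chiplets = cost_for_good_dies(big_die_area / 2, 2, extra=packaging_cost)

print(f"Monolithic ~{monolithic:.0f} vs two chiplets ~{chiplets:.0f} (arbitrary units)")
# With these made-up inputs the chiplet route comes out cheaper (~890 vs ~1090),
# but whether defect savings beat the added packaging cost depends entirely on
# the real defect density, die sizes, and packaging costs.
```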
Oh? Does TSMC package chips now? I didn’t know.
 