
leman

macrumors Core
Oct 14, 2008
19,076
18,717
If Apple wants to build a large GPU, there is nothing stopping them. The main problem is the low volume, large die size, and therefore high cost of such a product. Then again, I didn't expect the 4090 to be a high-volume product either. I wouldn't be surprised if it mostly stays a marketing gimmick by Nvidia.
 

Xiao_Xi

macrumors 65816
Oct 27, 2021
1,442
871
Better software optimizations and more hardware accelerators. In the case of Blender, Apple can add hardware-based ray tracing and Metal RT support.
 

senttoschool

macrumors 68030
Nov 2, 2017
2,501
5,143
The problem is mostly software optimization. No productivity software has been optimized for Metal. There was never really a reason to.
 

Boil

macrumors 68040
Oct 23, 2018
3,144
2,751
Stargate Command
If Apple wants to build a large GPU, there is nothing stopping them. The main problem is the low volume, large die size, and therefore high cost of such a product. Then again, I didn't expect the 4090 to be a high-volume product either. I wouldn't be surprised if it mostly stays a marketing gimmick by Nvidia.

Maybe the ASi Mac Pro has "mixed SoCs"; two "regular" M2 Max SoCs and two "GPU-specific" M2 SoCs...

M2 Max SoC - 12-core CPU (8P/4E) / 40-core GPU
M2 GPU SoC - 60-core GPU

M2 Extreme SoC - 24-core CPU (16P/8E) / 200-core GPU

Better software optimizations and more hardware accelerators. In the case of Blender, Apple can add hardware-based ray tracing and Metal RT support.

Highly interested to see what a Full Metal Blender can do running on the new ASi Mac Pro...!

The problem is mostly software optimization. No productivity software has been optimized for Metal. There was never really a reason to.

LOL, I just posted about this over here...

If software is developed to take advantage of the way ASi graphics work (Metal/GPU cores/Neural Engine cores/eventual ray-tracing cores/UMA/etc.), much like software is currently tailored to Nvidia hardware, then who knows what kind of performance comparisons we might see...?
 

altaic

macrumors 6502a
Jan 26, 2004
635
418
Better software optimizations and more hardware accelerators. In the case of Blender, Apple can add hardware-based ray tracing and Metal RT support.
MetalRT support has already been added to Blender. It’s an experimental option, and currently only for Apple Silicon. AMD support requires some more work, but it seems like AMD support in general is also a priority. There are a lot of tea leaves to read over at the Blender dev site.

BTW, Blender’s Metal Cycles renderer has seen a 33% speed up (according to the Blender benchmark, presumably with MetalRT disabled) from 3.2.0 to 3.3.0. The Apple/Blender engineers have made some great progress, and are continuing to do so. It’ll be exciting to see what’s ready by October 🙂
 
  • Like
Reactions: Xiao_Xi

senttoschool

macrumors 68030
Nov 2, 2017
2,501
5,143
BTW, Blender’s Metal Cycles renderer has seen a 33% speed up (according to the Blender benchmark, presumably with MetalRT disabled) from 3.2.0 to 3.3.0. The Apple/Blender engineers have made some great progress, and are continuing to do so. I’m excited to see what’s ready by October 🙂
It's a good start: https://www.macrumors.com/2021/10/14/apple-joins-blender-development-fund/

Apple needs to have an army of open source developers contributing to popular open source projects in order to optimize software for Apple Silicon.

Simply making amazing hardware isn't enough. The world has been optimizing for x86 and Nvidia GPUs for decades.
 
  • Like
Reactions: Malus120

galad

macrumors 6502
Apr 22, 2022
415
325
Apple isn't the only one contributing arm64 optimizations to open-source software. Amazon is also contributing a lot to speed things up on their Graviton CPUs.
 

Xiao_Xi

macrumors 65816
Oct 27, 2021
1,442
871
Highly interested to see what a Full Metal Blender can do running on the new ASi Mac Pro...!
Some people on the Blender forums believe that Apple could achieve real-time ray tracing in the Blender viewport.
 
  • Like
Reactions: Boil

leman

macrumors Core
Oct 14, 2008
19,076
18,717
Maybe the ASi Mac Pro has "mixed SoCs"; two "regular" M2 Max SoCs and two "GPU-specific" M2 SoCs...

M2 Max SoC - 12-core CPU (8P/4E) / 40-core GPU
M2 GPU SoC - 60-core GPU

M2 Extreme SoC - 24-core CPU (16P/8E) / 200-core GPU

They will probably need to leverage some form of asymmetric multi-chip technology (instead of the symmetric one they use today in the Ultra). Just stacking Max dies together will be too expensive and won't properly address user needs. But let's see what they come up with. Maybe one part of the solution will be significantly increasing the frequency at the expense of efficiency. They still have a lot of headroom. The Studio will probably easily dissipate 250-300W.
 

senttoschool

macrumors 68030
Nov 2, 2017
2,501
5,143
They will probably need to leverage some form of asymmetric multi-chip technology (instead of the symmetric one they use today in the Ultra). Just stacking Max dies together will be too expensive and won't properly address user needs. But let's see what they come up with. Maybe one part of the solution will be significantly increasing the frequency at the expense of efficiency. They still have a lot of headroom. The Studio will probably easily dissipate 250-300W.
Creating multiple huge custom SoCs for the smallest Mac market makes zero sense for Apple. Part of the point of stacking Max dies together is to reduce R&D cost for Mac Pro-level SoCs.

Unless... of course... Apple decides to start Apple Silicon Cloud service: https://forums.macrumors.com/thread...t-a-40-core-soc-for-mac-pro-now-what.2306486/

Creating a cloud service would expand the market for big Apple Silicon SoCs beyond the Mac Pro.
 

Boil

macrumors 68040
Oct 23, 2018
3,144
2,751
Stargate Command
If Apple could pull off an 8-way UltraFusion...

Future Mn workstation SoC:
  • Eight SoCs total (four CPU+GPU, four GPU-only)
  • 64-core CPU (48P/16E)
  • 480-core GPU
  • 128-core Neural Engine
  • 2TB LPDDR5X SDRAM
  • 4TB/s UMA bandwidth
 
Last edited:

leman

macrumors Core
Oct 14, 2008
19,076
18,717
Creating multiple huge custom SoCs for the smallest Mac market makes zero sense for Apple. Part of the point of stacking Max dies together is to reduce R&D cost for Mac Pro-level SoCs.

Exactly. That’s why I don’t see any other way for them than making smaller chips, e.g. one containing 8x CPU clusters and one containing 16x GPU clusters, and developing some technology that allows them to link those together on a single package.

An 8-way cross-bar switch ... wow ... probably more complicated than the SoC.

Does it have to be a crossbar switch? I have no idea what modern systems use… like what does NVLink or Apple's UltraFusion use?
 

quarkysg

macrumors 65816
Oct 12, 2019
1,220
807
Does it have to be a crossbar switch? I have no idea what modern systems use… like what does NVLink or Apple's UltraFusion use?
I would think so. Linking the SoC dies serially would kill performance or cause corruption, as updates to one SoC die's cache need to be propagated as fast as possible to all other SoC dies.

I would think Apple's UltraFusion is basically a very fast bus that broadcasts one die's memory accesses to the other to keep their caches in sync.
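
For what it's worth, here's a toy sketch of that broadcast idea in Python: every write on one die invalidates the stale copies on all the others, which is why a slow serial link between dies would hurt so much. The protocol details here are pure assumption for illustration; Apple hasn't published how UltraFusion actually maintains coherence.

```python
# Toy write-invalidate snooping model. Not Apple's actual protocol;
# just an illustration of why writes must be broadcast to every die.

class Die:
    def __init__(self, name):
        self.name = name
        self.cache = {}  # address -> cached value

    def read(self, addr, memory):
        # On a miss, fill the cache from shared memory.
        if addr not in self.cache:
            self.cache[addr] = memory[addr]
        return self.cache[addr]

    def snoop_invalidate(self, addr):
        # Another die wrote this address: drop our stale copy.
        self.cache.pop(addr, None)

class Bus:
    """Broadcast bus connecting the dies (the 'UltraFusion' role here)."""
    def __init__(self, dies, memory):
        self.dies = dies
        self.memory = memory

    def write(self, writer, addr, value):
        # Write through to shared memory and the writer's own cache...
        self.memory[addr] = value
        writer.cache[addr] = value
        # ...then broadcast an invalidate to every other die.
        for die in self.dies:
            if die is not writer:
                die.snoop_invalidate(addr)

memory = {0x100: 1}
a, b = Die("A"), Die("B")
bus = Bus([a, b], memory)

assert b.read(0x100, memory) == 1   # B caches the old value
bus.write(a, 0x100, 42)             # A writes; B's stale copy is invalidated
assert b.read(0x100, memory) == 42  # B's next read sees the new value
```

With more dies on the bus, every write still costs one broadcast, which is why bandwidth between the dies matters so much.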
 

theorist9

macrumors 68040
May 28, 2015
3,550
2,658
If Apple wants to build a large GPU, there is nothing stopping them. The main problem is the low volume, large die size, and therefore high cost of such a product. Then again, I didn't expect the 4090 to be a high-volume product either. I wouldn't be surprised if it mostly stays a marketing gimmick by Nvidia.
This made me curious about the GPU area of a hypothetical M2 Extreme (2 x M2 Ultra) vs. the 4090.

I've read the GPU takes up 30 mm^2 on an M2. The M1 Ultra has ~5x as many GPU cores as the M1 (48 vs. 10), so if the same ratio applies to the M2, then an M2 Extreme's GPU should take up ~300 mm^2. By comparison, the 4090 is reportedly 608 mm^2 (with 76.3B transistors). That's huge; the entire M1 Max chip is 432 mm^2 (with 57B transistors).

Both the M2 and 4090 are built on a TSMC enhanced 5nm process.

So Apple could create a MacPro Extreme where the ratio of GPU:CPU cores is double that on their current architecture, which would give ~600 mm^2 for the GPU. I don't know if they'll invest the resources to do this.
 

leman

macrumors Core
Oct 14, 2008
19,076
18,717
I would think so. Linking the SoC dies serially would kill performance or cause corruption as updates on one SoC die's cache needs to be propagated as fast as possible to all other SoC dies.

But others are doing it somehow, right? There are plenty of chips on the market that combine hundreds of CPU or GPU cores.

My naive imagination pictures some sort of network-on-a-chip solution where chips can be connected together like Legos to make a single entity, with additive cache capacity, etc. No idea how feasible something like that is. But Intel is supposedly working on this kind of technology, if I understand it correctly.

I've read the GPU takes up 30 mm^2 on an M2. The M1 Ultra has ~5x as many GPU cores as the M1 (48 vs. 10), so if the same ratio applies to the M2, then an M2 Extreme's GPU should take up ~300 mm^2. By comparison, the 4090 is reportedly 608 mm^2 (with 76.3B transistors). That's huge; the entire M1 Max chip is 432 mm^2 (with 57B transistors).

Exactly. Imagine that Apple makes a separate GPU chiplet that comes with 32 cores and then puts a bunch of those together, wouldn’t that be a scalable solution?
 

theorist9

macrumors 68040
May 28, 2015
3,550
2,658
Exactly. Imagine that Apple makes a separate GPU chiplet that comes with 32 cores and then puts a bunch of those together, wouldn’t that be a scalable solution?
That would be scalable, but I thought the performance of their integrated CPU-GPU architecture relied on having both the CPU and GPU on the same die. Thus I imagined that, if they wanted to double up on the number of GPU cores in the Mac Pro, they would need to construct the "Extreme" chip from four "M2 Max Pro" subunits instead of four M2 Max's, where each M2 Max Pro was an expanded version of the M2 Max that contained double the number of GPU cores.

I.e., if the M2 Max has X CPU cores and Y GPU cores, then the "M2 Max Pro" would have X CPU Cores and 2Y GPU cores.

I estimate that would give them ~120 TFLOPS, as compared with ~80 TFLOPS for a 4090 and 90–100 TFLOPS for a 4090 Ti, i.e., halfway between a single 4090 and dual 4090s in general GPU compute performance (we'll probably need to wait for the M3, which will likely be on 3 nm, to get hardware RT).

But creating this new design just for the Mac Pro seems resource-intensive, so I don't know if they'd do that.
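
As a rough cross-check of that TFLOPS figure, here's the arithmetic spelled out. The 13.6 TFLOPS / 38-core M2 Max numbers are the commonly cited figures, and the rest follows the hypothetical four-subunit "M2 Max Pro" layout described above, so treat this as an estimate, not a spec:

```python
# Rough FP32 throughput estimate for the hypothetical 4x "M2 Max Pro"
# configuration. Per-core throughput is inferred from cited M2 Max figures.
flops_per_core_tflops = 13.6 / 38  # M2 Max: ~13.6 TFLOPS across 38 GPU cores
cores_per_subunit = 2 * 38         # "2Y" GPU cores per M2 Max Pro subunit
subunits = 4

total_tflops = flops_per_core_tflops * cores_per_subunit * subunits
print(f"~{total_tflops:.0f} TFLOPS FP32")  # ~109 TFLOPS
```

That lands around ~109 TFLOPS at M2 Max clocks, so reaching ~120 would also need a modest clock bump, which fits the earlier point about desktop frequency headroom.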
 
Last edited:

leman

macrumors Core
Oct 14, 2008
19,076
18,717
That would be scalable, but I thought the performance of their integrated CPU-GPU architecture relied on having both the CPU and GPU on the same die.

Again, my perspective on this might be naive, but I don't see a fundamental difference between same-die and same-package if the dies can be connected in this way. The topology should be the same in either case. I can imagine that an on-die solution might be more energy efficient and of course have slightly lower latency, but those things will matter less on a desktop.



But creating this new design just for the Mac Pro seems resource-intensive, so I don't know if they'd do that.

One problem with Apple's current design is that you can't just take an individual component; there is a ton of stuff that comes attached to it. People who could use a 128-core GPU likely don't need a 32-core CPU, etc. Apple will need to devise a more flexible, scalable solution if they want to offer comprehensive options in that market in a way that's commercially viable.
 