No they don’t. Why do you people keep up this nonsense?
Sorry, I'm living here in the real world where Macs can't compete in high-end gaming. I'm open to actual evidence (not marketing copy) that their GPUs are competitive on AI, but I haven't seen anything to suggest that.
 
How many Mac Studios would you need to connect that?
Up to a max of 4. Effectively 2TB of RAM, a 320 core GPU, 128 core CPU, 128 core NPU.
It's the shared RAM from using RDMA to create up to 2TB of vRAM that makes it compelling for enterprise local LLMs.
US$40k for that setup.
So not a consumer level feature or rig.
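For a sense of what each machine contributes, here is a minimal sketch that divides the cluster totals quoted above by four. The per-unit numbers it prints are inferred from the post's totals, not taken from a spec sheet, and the dictionary keys are just illustrative names.

```python
# Cluster totals as quoted in the post (4 Mac Studios linked over RDMA).
# Per-unit values below are derived by simple division, not from Apple's specs.
CLUSTER_TOTALS = {
    "ram_tb": 2.0,
    "gpu_cores": 320,
    "cpu_cores": 128,
    "npu_cores": 128,
    "price_usd": 40_000,
}
UNITS = 4  # maximum cluster size mentioned in the post


def per_unit(totals, units):
    """Split each cluster total evenly across the given number of machines."""
    return {name: value / units for name, value in totals.items()}


unit = per_unit(CLUSTER_TOTALS, UNITS)
for name, value in unit.items():
    print(f"{name}: {value:g}")
```

That works out to roughly US$10k per machine, which is the figure to weigh against a single-GPU alternative.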
If you watch the YouTube videos I linked to in my OP, they'll point out how compelling that setup is compared with alternatives from NVIDIA etc. in both price and power consumption.

The reason Apple doesn't update the "Ultra" series of M processors as regularly as the standard, Pro and Max is probably lack of volume.
If a feature like locally hosted LLMs helps enterprise sales of Mac Studio Ultras, increasing that volume, it might enable Apple to justify more regular iterations of the Ultra series.

As I said in my OP it's all about memory and memory bandwidth at this point in time and that might be where the focus is on the M series roadmap.
The next iteration of M Pro/Max/Ultra might allow for more than 4 units in an RDMA cluster or, more likely, much higher bandwidth over RDMA between each unit, which is the real limitation.
 

The next iteration of M Pro/Max/Ultra might allow for more than 4 units in an RDMA cluster or, more likely, much higher bandwidth over RDMA between each unit, which is the real limitation.

I agree that bandwidth is the limitation. Running RDMA over 80Gbps Thunderbolt 5 is, in the context of the overall HPC market, quite underwhelming in terms of bandwidth (not sure how the latency of TB5 compares with a “proper” HPC interconnect). The issue I see in going beyond TB5 is cost. I’d imagine that most Mac Studio purchasers have a single machine so would not be happy to pay an extra $1000+ for a 200Gbps interconnect (for example) that they’re never going to use.

Maybe a solution could be for the next iteration of the Studio to have a slot where a user could buy a higher speed interconnect as a separately priced plug-in module but Apple might consider the potential sales volumes too low for that to be worth investing in the R&D to make it a reality.
 
Sorry, I'm living here in the real world where Macs can't compete in high-end gaming.
But is there any indication that Apple is seeing that as a problem? With the introduction of ARM Macs they discontinued support for dGPUs. They knew what they were choosing, the limits and trade-offs.
 
Maybe a solution could be for the next iteration of the Studio to have a slot where a user could buy a higher speed interconnect as a separately priced plug-in module but Apple might consider the potential sales volumes too low for that to be worth investing in the R&D to make it a reality.
You mean like a Mac Pro if you put a ConnectX in a PCIe slot and it had working drivers?
 
But is there any indication that Apple is seeing that as a problem? With the introduction of ARM Macs they discontinued support for dGPUs. They knew what they were choosing, the limits and trade-offs.

Isn’t this a false dichotomy? I’d say that Apple Silicon Macs are much better at gaming than Intel Macs ever were. The GPUs are certainly a step up from what we had before. In fact, it is the first time in a long while that we can have newer action games running at acceptable frame rates on lower-end Apple computers.

From my perspective, the gaming issue is the ecosystem rather than hardware. Gaming experience is subpar on Mac because it’s treated as a niche market by developers. Again, it’s not any different than what we had in the Intel Mac era.

The main difference was that you could run Windows natively on Intel Macs, giving you better performance on some models. But that’s hardly “Mac gaming”, is it?
 
I agree that bandwidth is the limitation. Running RDMA over 80Gbps Thunderbolt 5 is, in the context of the overall HPC market, quite underwhelming in terms of bandwidth (not sure how the latency of TB5 compares with a “proper” HPC interconnect). The issue I see in going beyond TB5 is cost. I’d imagine that most Mac Studio purchasers have a single machine so would not be happy to pay an extra $1000+ for a 200Gbps interconnect (for example) that they’re never going to use.

Maybe a solution could be for the next iteration of the Studio to have a slot where a user could buy a higher speed interconnect as a separately priced plug-in module but Apple might consider the potential sales volumes too low for that to be worth investing in the R&D to make it a reality.

Another issue is the SoC itself. Apple Silicon has been severely I/O limited so far due to design constraints. Even if they add an optional high-bandwidth interconnect, there are just not enough PCIe lanes to power it (unless you divert them from the TB5 ports). There are some rumors that newer chips might use die stacking, which might change how they design things, so who knows.
 
Isn’t this a false dichotomy? I’d say that Apple Silicon Macs are much better at gaming than Intel Macs ever were.
That is true, but a high-end gaming PC is still a lot faster. How relevant that is, or to how many people that is relevant, is another question. And sure, we all know that the problem with Macs and gaming is not (primarily) the hardware.
 
That is true, but a high-end gaming PC is still a lot faster.

Sure. What I’m trying to point out is that this was also the case in the Intel Mac era. Maybe you had more ways to get around it (like putting a fast GPU in your MP and running Windows), but again that’s a different thing. The bottom line is that Apple never offered or supported enthusiast-level gaming GPUs. Unless you count the very early days, when the fastest GPU in the world consumed less than 50 watts and could fit in a laptop :)
 
If you work the math backwards from what we “have” in the M4 GPU…

M4 Max Current Specs:
• 40 GPU cores
• Total package power: ~90-95W under full load (CPU + GPU)
• GPU portion: roughly 50-60W at full tilt (educated guess based on teardowns/reviews)

The 2nm Efficiency Gain
TSMC claims 2nm delivers same performance at 25-30% less power (or 15% more performance at same power).
So if M4 Max GPU uses ~55W for 40 cores:
• M6 GPU at 2nm: 40 cores would use ~38-40W for same performance
• That frees up 15-17W of thermal/power headroom

How Many More Cores Can We Fit?
If we assume linear scaling (which isn’t perfect but close enough):
• 40 cores = 38W (2nm efficient)
• Each core ≈ 0.95W
• With 15-17W freed up: +16-18 more cores
• Total: 56-58 GPU cores at the same power envelope

But Wait - Chiplet Advantages
With a dedicated GPU chiplet and better WMCM thermal management:
• Better heat dissipation (not sharing die space)
• Could potentially push 5-10W higher on GPU without thermal issues
• That’s another 5-10 cores

Realistic ceiling: 60-65 GPU cores in M6 Max MBP
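
The estimate above can be reproduced with a few lines of Python. This is only a sketch under the post's own assumptions: ~55W for the 40-core GPU, a 25-30% power saving from 2nm, and perfectly linear power-per-core scaling. None of these are measured values, and the function name and its parameters are made up for illustration.

```python
# Back-of-the-envelope check of the core-count estimate.
# All inputs are the post's guesses (55 W GPU, 25-30% saving), not measurements.

def estimate_cores(gpu_power_w, cores, power_saving, bonus_w=0.0):
    """Return (power for the same cores on the new node, W per core, total cores).

    power_saving: fraction of power saved at equal performance (0.25 to 0.30).
    bonus_w: extra power budget, e.g. from better chiplet thermals (assumed).
    """
    new_power = gpu_power_w * (1.0 - power_saving)
    per_core = new_power / cores
    freed = gpu_power_w - new_power + bonus_w
    total = cores + int(freed / per_core)  # whole cores only
    return new_power, per_core, total


for saving in (0.25, 0.30):
    new_power, per_core, total = estimate_cores(55.0, 40, saving)
    print(f"{saving:.0%} saving: {new_power:.1f} W for 40 cores, "
          f"{per_core:.2f} W/core, about {total} cores in the same envelope")
```

With linear scaling the same-envelope result is just 40 / (1 − saving), so the range works out to roughly 53 to 57 cores, which puts the post's 56-58 at the optimistic end. Passing `bonus_w=5.0` or `bonus_w=10.0` models the chiplet headroom and lands between about 58 and 67 cores, bracketing the 60-65 ceiling.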

So, it's still super early, but given we're all speculating...

My take:
They'll use the new process to have slightly bigger (more cache, etc.) CPU/GPU cores and raise clocks by 5%. I don't think the GPU core count will increase substantially. GPU cores on the Max will go from 40 to 48, tops.

Going to a new process, I would not expect drastic design changes. M5 was already pretty significant. The next big redesign will probably be M7.
 
However, it remains to be seen how Apple will approach the ANE and GPU in the LLM era. GPUs are better at running LLMs but Apple needs to make local LLMs work well in an iPhone first and foremost. Which one will they spend more silicon on?

On desktops, it makes sense that it is the GPU since power is less of a concern. On mobile, I'm not convinced that GPUs are the way to go. We could see a divergence where the desktops get bigger and bigger GPUs and the mobile chips get bigger and bigger ANE.

They're doing both. The ANE has continually improved, and they just added NPUs to the GPU cores in M5/A19 to get 4x better compute out of them on those tasks.
 
Another issue is the SoC itself. Apple Silicon has been severely I/O limited so far due to design constraints. Even if they add an optional high-bandwidth interconnect, there are just not enough PCIe lanes to power it (unless you divert them from the TB5 ports). There are some rumors that newer chips might use die stacking, which might change how they design things, so who knows.
So an optional high-bandwidth interconnect would be the province of the Ultra, then? All three generations of the Max Studio have had four Thunderbolt ports while the Ultra Studio has always had six. So that has left two unused.

Let’s imagine the custom M2 Ultra PCC hardware we know Apple is shipping to itself uses all of them. Let’s also imagine Apple is using a networking solution provided by Broadcom for that. So it isn’t hard to imagine two options for the Ultra Studio: one (as now) with six TB5 ports and 10Gb Ethernet, the other with four TB5 ports and (expensive) high-bandwidth networking.
 
Let’s imagine the custom M2 Ultra PCC hardware we know Apple is shipping to itself uses all of them. Let’s also imagine Apple is using a networking solution provided by Broadcom for that. So it isn’t hard to imagine two options for the Ultra Studio: one (as now) with six TB5 ports and 10Gb Ethernet, the other with four TB5 ports and (expensive) high-bandwidth networking.
You need a lot more PCIe lanes for fast networking.
 
So an optional high-bandwidth interconnect would be the province of the Ultra, then? All three generations of the Max Studio have had four Thunderbolt ports while the Ultra Studio has always had six. So that has left two unused.

Let’s imagine the custom M2 Ultra PCC hardware we know Apple is shipping to itself uses all of them. Let’s also imagine Apple is using a networking solution provided by Broadcom for that. So it isn’t hard to imagine two options for the Ultra Studio: one (as now) with six TB5 ports and 10Gb Ethernet, the other with four TB5 ports and (expensive) high-bandwidth networking.

That would only give you 8x PCIe lanes or ~124 Gbps. You'll need more than that to claim high-bandwidth interconnect capability. The problem is that the Apple Silicon has limited die area for implementing I/O, since it needs so much space for memory controllers. We were just discussing this very topic in adjacent threads. What's interesting is that Apple has a patent describing using a dedicated I/O die as a bridge between multiple SoCs. The bridge would be connected using UltraFusion and could provide signal routing and additional I/O capabilities.
 
That would only give you 8x PCIe lanes or ~124 Gbps. You'll need more than that to claim high-bandwidth interconnect capability. The problem is that the Apple Silicon has limited die area for implementing I/O, since it needs so much space for memory controllers. We were just discussing this very topic in adjacent threads. What's interesting is that Apple has a patent describing using a dedicated I/O die as a bridge between multiple SoCs. The bridge would be connected using UltraFusion and could provide signal routing and additional I/O capabilities.
It’s four PCIe 4.0 lanes per Thunderbolt 5 port, right? So if (as I suggested) Apple drops the two extra TB5 ports on the front of the Ultra Studio and redirects those lanes to networking, that gives them a total of 16x to use for it, not 8x.
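
To put numbers on the 8x versus 16x question, here is a rough sketch of PCIe 4.0 bandwidth by lane count. The 16 GT/s lane rate and 128b/130b encoding are the standard PCIe 4.0 figures; the four-lanes-per-port value is the assumption from this post, and the constant and function names are illustrative only.

```python
# Rough PCIe 4.0 bandwidth for a given lane count, one direction,
# before packet/protocol overhead (so real throughput is somewhat lower).

PCIE4_GT_PER_S = 16.0        # raw rate per lane in GT/s (PCIe 4.0)
ENCODING = 128.0 / 130.0     # 128b/130b line-encoding efficiency
LANES_PER_TB5_PORT = 4       # assumption taken from the post, unconfirmed


def pcie4_gbps(lanes):
    """Approximate usable bandwidth in Gbps for the given number of lanes."""
    return lanes * PCIE4_GT_PER_S * ENCODING


# 2 ports: only the two "unused" ports' lanes; 4 ports: plus the two front ports.
for ports in (2, 4):
    lanes = ports * LANES_PER_TB5_PORT
    print(f"{ports} ports -> x{lanes}: ~{pcie4_gbps(lanes):.0f} Gbps")
```

Under those assumptions x8 comes out near 126 Gbps, in line with the ~124 Gbps quoted above, and x16 roughly doubles that to about 252 Gbps.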

This seems like about as far as it can be taken without SoIC — that patent you mentioned sounds like it could still apply to two SoIC connected via UltraFusion (something TSMC explicitly says is possible), so perhaps this is moot.
 
This seems like about as far as it can be taken without SoIC — that patent you mentioned sounds like it could still apply to two SoIC connected via UltraFusion (something TSMC explicitly says is possible), so perhaps this is moot.

This is what I’ve been thinking as well. The patent clearly intends the I/O die to be connected via UltraFusion, and they depict it used as a bridge to build a quad-SoC system. But what prevents them from potentially using UltraFusion to add more I/O? That would be applicable to any Max die.
 
But what prevents them from potentially using UltraFusion to add more I/O? That would be applicable to any Max die.

This is really interesting - on the Max, the UltraFusion bus is sitting there doing literally nothing...


Hook it up to some IO?

Bigger GPU option?
High-speed networking options?

For most, the CPU on the Max has enough cores, but the GPU is still kinda weak (relative to the total system performance, I mean; it's still amazing in terms of performance per watt!)


I did speculate years ago (just after M1) that Apple should build M series daughterboards to just slot into a backplane.

Looks like they may finally be doing something similar, just at the socket level, on the same package. I still think they should offer a box that's just an InfiniBand backplane that you slot Ultras on a card into.
 
This is really interesting - on the Max, the UltraFusion bus is sitting there doing literally nothing...


Hook it up to some IO?

Bigger GPU option?
high speed networking options?

For most, the CPU on the Max has enough cores, but the GPU is still kinda weak (relative to the total system performance, I mean; it's still amazing in terms of performance per watt!)

I did speculate years ago (just after M1) that Apple should build M series daughterboards to just slot into a backplane.

I suppose the big question is: why haven't they done it yet? Is it a limitation of packaging technology, lack of interest, or cost?

Looks like they may finally be doing something similar, just at the socket level, on the same package. I still think they should offer a box that's just an InfiniBand backplane that you slot Ultras on a card into.

You might find this interesting: https://patentscope.wipo.int/search/en/detail.jsf?docId=US469223044&_cid=P21-MJV6N5-78168-1

Of course, that is very unlikely to ever be released as an end-user system.
 