M6 How Big of a Jump Are We Looking At?

maxoakland · Tuesday at 12:18 PM

thenewperson said:
No they don’t. Why do you people keep up this nonsense?

Sorry, I'm living here in the real world where Macs can't compete in high-end gaming. I'm open to actual evidence (not marketing copy) that their GPUs are competitive on AI, but I haven't seen anything to suggest that.

Seoras · Tuesday at 1:23 PM

maxoakland said:
How many Mac Studios would you need to connect that?

Up to a max of 4. Effectively 2TB of RAM, a 320 core GPU, 128 core CPU, 128 core NPU.
It's the shared RAM from using RDMA to create up to 2TB of vRAM that makes it compelling for enterprise local LLMs.
US$40k for that setup.
So not a consumer level feature or rig.
If you watch the YouTube videos I linked to in my OP, they point out how compelling that setup is compared to alternatives from NVIDIA etc. in both price and power consumption.

The reason Apple doesn't update the "Ultra" series of M processors as regularly as the standard, Pro and Max is probably lack of volume.
If a feature like locally hosted LLMs helps enterprise sales of Mac Studio Ultras, increasing that volume, it might enable Apple to justify more regular iterations of the Ultra series.

As I said in my OP it's all about memory and memory bandwidth at this point in time and that might be where the focus is on the M series roadmap.
The next iteration of M Pro/Max/Ultra might allow for more than 4 units in an RDMA cluster or, more likely, much higher bandwidth over RDMA between each unit, which is the real limitation.

JulianL · Wednesday at 1:07 AM

Seoras said:
…
The next iteration of M Pro/Max/Ultra might allow for more than 4 units in an RDMA cluster or, more likely, much higher bandwidth over RDMA between each unit, which is the real limitation.

I agree that bandwidth is the limitation. Running RDMA over 80Gbps Thunderbolt 5 is, in the context of the overall HPC market, quite underwhelming in terms of bandwidth (not sure how the latency of TB5 compares with a “proper” HPC interconnect). The issue I see in going beyond TB5 is cost. I’d imagine that most Mac Studio purchasers have a single machine so would not be happy to pay an extra $1000+ for a 200Gbps interconnect (for example) that they’re never going to use.

Maybe a solution could be for the next iteration of the Studio to have a slot where a user could buy a higher speed interconnect as a separately priced plug-in module but Apple might consider the potential sales volumes too low for that to be worth investing in the R&D to make it a reality.

Basic75 · Wednesday at 1:48 AM

maxoakland said:
Sorry, I'm living here in the real world where Macs can't compete in high-end gaming.

But is there any indication that Apple is seeing that as a problem? With the introduction of ARM Macs they discontinued support for dGPUs. They knew what they were choosing, the limits and trade-offs.

Basic75 · Wednesday at 1:50 AM

JulianL said:
Maybe a solution could be for the next iteration of the Studio to have a slot where a user could buy a higher speed interconnect as a separately priced plug-in module but Apple might consider the potential sales volumes too low for that to be worth investing in the R&D to make it a reality.

You mean like a Mac Pro if you put a ConnectX in a PCIe slot and it had working drivers?

Basic75 · Wednesday at 1:52 AM

maxoakland said:
that their GPUs are competitive on AI but I haven't seen anything to suggest that

Apple mostly caters to consumers who do AI inference rather than training, and that inference uses NPUs, not GPUs.

leman · Wednesday at 2:12 AM

Basic75 said:
But is there any indication that Apple is seeing that as a problem? With the introduction of ARM Macs they discontinued support for dGPUs. They knew what they were choosing, the limits and trade-offs.

Isn’t this a false dichotomy? I’d say that Apple Silicon Macs are much better at gaming than Intel Macs ever were. The GPUs are certainly a step up from what we had before. In fact, it is the first time in a long while that we can have newer action games running at acceptable frame rates on lower-end Apple computers.

From my perspective, the gaming issue is the ecosystem rather than hardware. Gaming experience is subpar on Mac because it’s treated as a niche market by developers. Again, it’s not any different than what we had in the Intel Mac era.

The main difference was that you could run Windows natively on Intel Macs, giving you better performance on some models. But that’s hardly “Mac gaming”, is it?

leman · Wednesday at 3:05 AM

JulianL said:
I agree that bandwidth is the limitation. Running RDMA over 80Gbps Thunderbolt 5 is, in the context of the overall HPC market, quite underwhelming in terms of bandwidth (not sure how the latency of TB5 compares with a “proper” HPC interconnect). The issue I see in going beyond TB5 is cost. I’d imagine that most Mac Studio purchasers have a single machine so would not be happy to pay an extra $1000+ for a 200Gbps interconnect (for example) that they’re never going to use.

Maybe a solution could be for the next iteration of the Studio to have a slot where a user could buy a higher speed interconnect as a separately priced plug-in module but Apple might consider the potential sales volumes too low for that to be worth investing in the R&D to make it a reality.

Another issue is the SoC itself. Apple Silicon has been severely I/O limited so far due to design constraints. Even if they add an optional high-bandwidth interconnect, there are just not enough PCI lanes to power it (unless you divert them from the TB5 ports). There are some rumors that newer chips might use die stacking, which might change how they design things, so who knows.

Basic75 · Wednesday at 3:36 AM

leman said:
Isn’t this a false dichotomy? I’d say that Apple Silicon Macs are much better at gaming than Intel Macs ever were.

That is true, but a high-end gaming PC is still a lot faster. How relevant that is, or to how many people that is relevant, is another question. And sure, we all know that the problem with Macs and gaming is not (primarily) the hardware.

leman · Wednesday at 4:06 AM

Basic75 said:
That is true, but a high-end gaming PC is still a lot faster.

Sure. What I’m trying to point out is that this was also the case in the Intel Mac era. Maybe you had more ways to get around it (like putting a fast GPU in your MP and running Windows), but again that’s a different thing. The bottom line is that Apple never offered or supported enthusiast-level gaming GPUs. Unless you count the very early days, when the fastest GPU in the world consumed less than 50 watts and could fit in a laptop.

throAU · Wednesday at 4:15 AM

BenRacicot said:
If you work the math backwards from what we “have” in the M4 GPU…

M4 Max Current Specs:
• 40 GPU cores
• Total package power: ~90-95W under full load (CPU + GPU)
• GPU portion: roughly 50-60W at full tilt (educated guess based on teardowns/reviews)
The 2nm Efficiency Gain
TSMC claims 2nm delivers same performance at 25-30% less power (or 15% more performance at same power).
So if M4 Max GPU uses ~55W for 40 cores:
• M6 GPU at 2nm: 40 cores would use ~38-40W for same performance
• That frees up 15-17W of thermal/power headroom
How Many More Cores Can We Fit?
If we assume linear scaling (which isn’t perfect but close enough):
• 40 cores = 38W (2nm efficient)
• Each core ≈ 0.95W
• With 15-17W freed up: +16-18 more cores
• Total: 56-58 GPU cores at the same power envelope
But Wait - Chiplet Advantages
With a dedicated GPU chiplet and better WMCM thermal management:
• Better heat dissipation (not sharing die space)
• Could potentially push 5-10W higher on GPU without thermal issues
• That’s another 5-10 cores
Realistic ceiling: 60-65 GPU cores in M6 Max MBP
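
As a rough check on the arithmetic quoted above, here is a minimal Python sketch. It only replays the post's own guesses (40 cores, about 38 W after the 2nm shrink, 15-17 W freed) under the same linear-scaling assumption. The function name `cores_in_budget` is hypothetical, and none of the inputs are measured figures.

```python
# Back-of-envelope replay of the quoted estimate.
# Every input is a guess taken from the post, not a measurement.

def cores_in_budget(base_cores, watts_after_shrink, freed_watts):
    """Estimate how many GPU cores fit once the freed power is spent on
    extra cores, assuming power scales linearly with core count."""
    watts_per_core = watts_after_shrink / base_cores
    extra_cores = round(freed_watts / watts_per_core)
    return base_cores + extra_cores

# 40 cores at ~38 W on 2nm, with 15 W or 17 W of headroom freed up
low = cores_in_budget(40, 38.0, 15.0)
high = cores_in_budget(40, 38.0, 17.0)
print(f"Estimated GPU cores at the same power envelope: {low}-{high}")
```

Real GPUs do not scale this linearly (shared caches, memory bandwidth, and clocks all interfere), so this is only a way to see where the 56-58 figure comes from.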

So, it's still super early, but given we're all speculating...

My take:
They'll use the new process to have slightly bigger (more cache, etc.) CPU/GPU cores and raise clocks by 5%. I don't think the GPU core count will increase substantially. I'd expect GPU cores on the Max to go from 40 to 48, tops.

Going to a new process, I would not expect drastic design changes. M5 was already pretty significant. The next big redesign will probably be M7.

throAU · Wednesday at 4:29 AM

senttoschool said:
However, it remains to be seen how Apple will approach the ANE and GPU in the LLM era. GPUs are better at running LLMs but Apple needs to make local LLMs work well in an iPhone first and foremost. Which one will they spend more silicon on?

On desktops, it makes sense that it is the GPU since power is less of a concern. On mobile, I'm not convinced that GPUs are the way to go. We could see a divergence where the desktops get bigger and bigger GPUs and the mobile chips get bigger and bigger ANE.

They're doing both. The ANE has continually improved, and they just added neural accelerators to the GPU cores in M5/A19 to get 4x better compute out of them on those tasks.

tenthousandthings · Wednesday at 6:12 AM

leman said:
Another issue is the SoC itself. Apple Silicon has been severely I/O limited so far due to design constraints. Even if they add an optional high-bandwidth interconnect, there are just not enough PCI lanes to power it (unless you divert them from the TB5 ports). There are some rumors that newer chips might use die stacking, which might change how they design things, so who knows.

So an optional high-bandwidth interconnect would be the province of the Ultra, then? All three generations of the Max Studio have had four Thunderbolt ports while the Ultra Studio has always had six. So that has left two unused.

Let’s imagine the custom M2 Ultra PCC hardware we know Apple is shipping to itself uses all of them. Let’s also imagine Apple is using a networking solution provided by Broadcom for that. So it isn’t hard to imagine two options for the Ultra Studio: one (as now) with six TB5 ports and 10Gb Ethernet, the other with four TB5 ports and (expensive) high-bandwidth networking.

Basic75 · Wednesday at 7:52 AM

tenthousandthings said:
Let’s imagine the custom M2 Ultra PCC hardware we know Apple is shipping to itself uses all of them. Let’s also imagine Apple is using a networking solution provided by Broadcom for that. So it isn’t hard to imagine two options for the Ultra Studio: one (as now) with six TB5 ports and 10Gb Ethernet, the other with four TB5 ports and (expensive) high-bandwidth networking.

You need a lot more PCIe lanes for fast networking.

leman · Wednesday at 8:04 AM

tenthousandthings said:
So an optional high-bandwidth interconnect would be the province of the Ultra, then? All three generations of the Max Studio have had four Thunderbolt ports while the Ultra Studio has always had six. So that has left two unused.

Let’s imagine the custom M2 Ultra PCC hardware we know Apple is shipping to itself uses all of them. Let’s also imagine Apple is using a networking solution provided by Broadcom for that. So it isn’t hard to imagine two options for the Ultra Studio: one (as now) with six TB5 ports and 10Gb Ethernet, the other with four TB5 ports and (expensive) high-bandwidth networking.

That would only give you 8x PCIe lanes or ~124 Gbps. You'll need more than that to claim high-bandwidth interconnect capability. The problem is that Apple Silicon has limited die area for implementing I/O, since it needs so much space for memory controllers. We were just discussing this very topic in adjacent threads. What's interesting is that Apple has a patent describing the use of a dedicated I/O die as a bridge between multiple SoCs. The bridge would be connected using UltraFusion and could provide signal routing and additional I/O capabilities.

tenthousandthings · Wednesday at 9:34 AM

leman said:
That would only give you 8x PCIe lanes or ~124 Gbps. You'll need more than that to claim high-bandwidth interconnect capability. The problem is that Apple Silicon has limited die area for implementing I/O, since it needs so much space for memory controllers. We were just discussing this very topic in adjacent threads. What's interesting is that Apple has a patent describing the use of a dedicated I/O die as a bridge between multiple SoCs. The bridge would be connected using UltraFusion and could provide signal routing and additional I/O capabilities.

It’s four PCIe 4.0 lanes per Thunderbolt 5 port, right? So if (as I suggested) Apple drops the two extra TB5 ports on the front of the Ultra Studio and redirects those lanes to networking, that gives them a total of 16x to use for it, not 8x.

This seems like about as far as it can be taken without SoIC — that patent you mentioned sounds like it could still apply to two SoIC connected via UltraFusion (something TSMC explicitly says is possible), so perhaps this is moot.
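
For anyone following the lane counting, here is a small Python sketch of the bandwidth arithmetic. It assumes four PCIe 4.0 lanes per Thunderbolt 5 controller (the figure this thread is working from, not a confirmed Apple spec) and uses the standard PCIe 4.0 numbers of 16 GT/s per lane with 128b/130b encoding. The function name `pcie4_gbps` is hypothetical.

```python
# Line-rate estimate for PCIe 4.0 lanes freed up by dropping TB5 ports.
# Assumption from the thread: each TB5 controller is fed by 4 PCIe 4.0 lanes.

PCIE4_GT_PER_LANE = 16.0         # raw PCIe 4.0 rate per lane, in GT/s
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line coding
LANES_PER_TB5_PORT = 4           # thread assumption, not a confirmed spec

def pcie4_gbps(lanes):
    """Usable one-direction bandwidth in Gbps, before protocol overhead."""
    return lanes * PCIE4_GT_PER_LANE * ENCODING_EFFICIENCY

for freed_ports in (2, 4):
    lanes = freed_ports * LANES_PER_TB5_PORT
    print(f"{freed_ports} freed TB5 controllers -> x{lanes} -> "
          f"{pcie4_gbps(lanes):.0f} Gbps")
```

This counts encoding overhead only, so it lands slightly above the ~124 Gbps quoted for 8 lanes; packet and protocol overhead would pull real throughput lower. Under the same assumptions, 16 lanes comes out at roughly twice that.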

leman · Wednesday at 10:11 AM

tenthousandthings said:
This seems like about as far as it can be taken without SoIC — that patent you mentioned sounds like it could still apply to two SoIC connected via UltraFusion (something TSMC explicitly says is possible), so perhaps this is moot.

This is what I’ve been thinking as well. The patent clearly intends the I/O die to be connected via UltraFusion, and they depict it used as a bridge to build a quad-SoC system. But what prevents them from potentially using UltraFusion to add more I/O? That would be applicable to any Max die.

dgdosen · Wednesday at 10:34 AM

Might this all spur the creation of desktop TB/s optical interconnect?

iPhone2019 · Wednesday at 12:22 PM

Will the base M6 model have WMCM packaging? Or is it too expensive?

throAU · Wednesday at 9:34 PM

leman said:
But what prevents them from potentially using UltraFusion to add more I/O? That would be applicable to any Max die.

This is really interesting - on the Max the UltraFusion bus is sitting there doing literally nothing...

Hook it up to some IO?

Bigger GPU option?
High-speed networking options?

For most, the CPU on the Max has enough cores, but the GPU is still kinda weak (relative to the total system performance, I mean; it's still amazing in terms of performance per watt!)

I did speculate years ago (just after M1) that Apple should build M series daughterboards to just slot into a backplane.

Looks like they may finally be doing something similar, just at the socket level, on the same package. I still think they should offer a box that's just an InfiniBand backplane that you slot Ultras on a card into.

leman · Thursday at 12:29 AM

throAU said:
This is really interesting - on the Max the UltraFusion bus is sitting there doing literally nothing...

Hook it up to some IO?

Bigger GPU option?
High-speed networking options?

For most, the CPU on the Max has enough cores, but the GPU is still kinda weak (relative to the total system performance, I mean; it's still amazing in terms of performance per watt!)

I did speculate years ago (just after M1) that Apple should build M series daughterboards to just slot into a backplane.

I suppose the big question is: why haven't they done it yet? Is it a limitation of packaging technology, lack of interest, cost?

throAU said:
Looks like they may finally be doing something similar, just at the socket level, on the same package. I still think they should offer a box that's just an InfiniBand backplane that you slot Ultras on a card into.

You might find this interesting: https://patentscope.wipo.int/search/en/detail.jsf?docId=US469223044&_cid=P21-MJV6N5-78168-1

Of course, that is very unlikely to ever be released as an end-user system.

throAU · Thursday at 1:04 AM

leman said:
You might find this interesting: https://patentscope.wipo.int/search/en/detail.jsf?docId=US469223044&_cid=P21-MJV6N5-78168-1

Of course, that is very unlikely to ever be released as an end-user system.

Yup, no doubt that's what they're planning to run, or are already running, in their own datacenter(s).