Apple could treat the Mac Pro the way the Volkswagen Group treats development of the Bugatti lineup — a loss leader that attracts attention and spurs the development and testing of new, cutting edge technology.

Yes, but economies of scale make that impractical.

That used to be how it worked: things started at the Power Macintosh and trickled down to the LC. But now, most new SoC features get introduced at the A-level (iPhone), M-level (iPad/most Macs), or M Pro-level (higher-end Macs). For example, the M1 Pro introduced better video codecs that eventually trickled down, and were of course also added to the M1 Max and M1 Ultra.

Building a Bugatti factory is several orders of magnitude cheaper than building a factory capable of producing tomorrow's chips. Apple is able to have TSMC (and others) manufacture such impressive chips because Apple has the volume: 200 million iPhones a year, plus tens of millions more iPads, Macs, and Apple Watches, all built on the same basic building blocks.

So you'd have to make it up in volume, and the Mac Pro does not have that volume. It never will again. It's like offering customers the Apple CRT Display. Almost all of them would be baffled as to why they would even want that.

But, as @CWallace says, that could change with heterogeneous process nodes. If portions of the M5 Extreme are manufactured on the same process as the M5 Max, but a fancy new feature is manufactured on a larger, less efficient process, and enough people buy the whizbang feature for Apple to still benefit (such as by getting real-world data on how it's used) and eventually trickle the feature down to other devices, that might be an option.

This sort of work would grant them supremacy in chips. There would be no advantage (except maybe cost) in choosing a machine with an AMD Threadripper over a Mac with the flagship chip — especially with the ARM versions of Windows and Linux that now exist.

There are plenty of market segments Apple has no interest in, such as putting a Mac in a data center. You'll be better served by other vendors there.

It’d be great if Apple could revive Boot Camp, too. I prefer macOS, but certain applications and tasks still call for either Windows or Linux, and I’d prefer to run it all on the same hardware.

Asahi Linux does exist, although I believe they don't have the M4 running yet.

As for Windows, I believe the challenge, drivers obviously aside, lies mostly in the boot process. Windows's hardware abstraction layer (HAL) assumes either BIOS or UEFI (presumably, the Windows NT versions for PowerPC also supported Open Firmware?); most ARM devices offer neither. For Snapdragon, Qualcomm has added a UEFI-lite layer of sorts that's just barely enough to get Windows to boot but, as Linux developers have found, isn't enough to treat those chips as fully UEFI-capable. On Apple silicon, iBoot plays a role similar to Open Firmware on Apple's PowerPC Macs of the 1990s and 2000s.
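To make that firmware distinction a bit more concrete, here is a minimal sketch (my own illustration, not any Apple or Microsoft tooling) of how a booted Linux system can report which firmware interface it came up under: a UEFI boot exposes /sys/firmware/efi, while device-tree platforms (typical for ARM hardware) expose /proc/device-tree, and a machine can show both.

```python
"""Tiny illustration of the firmware-interface differences discussed above.

On a booted Linux system:
  * /sys/firmware/efi exists only if the kernel was started via UEFI services.
  * /proc/device-tree exists on platforms whose hardware is described by a
    device tree (typical for ARM boards); a system can expose both.
Just a sketch for illustration, not part of any vendor's boot chain.
"""
from pathlib import Path

def firmware_hints():
    return {
        "booted_via_uefi": Path("/sys/firmware/efi").exists(),
        "has_device_tree": Path("/proc/device-tree").exists(),
    }

if __name__ == "__main__":
    for key, value in firmware_hints().items():
        print(f"{key}: {value}")
```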

Someone would have to adapt this, and neither Apple nor Microsoft seems to care enough.

What's in it for Apple? A bunch of people who say they'd buy a Mac if they had this additional ability. Sure. But that's not a whole lot of people. Most of them can just buy a second device.

What's in it for Microsoft? More Windows users, and Windows hasn't been a growing business for them in a decade.

So why would either of them do it?
 
Speculation from (I believe) Gurman was that the "M1 Extreme" used two M1 Ultras / four M1 Max dies with dual UltraFusion interposers, but the yields were so miserable that it would have been a $10,000 BTO option.
Most likely “Jade 4C” was a CoWoS design using two Ultras. Nvidia succeeded in doing something similar with Blackwell, two years later. Apple could have also persisted on a similar developmental timeline, but they chose not to build it. $10,000 would be a low estimate for that Mac Pro, at least twice that seems more like it. Unlike Nvidia’s product line, Apple doesn’t sell $500,000 machines that absorb the development cost.
Now, the new TSMC SoIC-mH packaging technology allows Apple to design SoCs with independent compute and graphics sections, which could let it scale both to extremely high core counts while keeping yields high enough that the chips aren't priced out of almost everyone's reach.
Private Cloud Compute may change that equation. Apple has a solid reason (PCC requires Apple silicon) to build the “4C” now, and SoIC (assuming the rumor is accurate) may allow for Max variants that were not possible via SoC. So it looks like it’s possible they could get it down to $10,000, which I think would be the highest base price they might consider for an Extreme Mac Pro, but I’ll guess the target would be more like $8,000.

[Sorry, I made some changes, not realizing I had already posted this comment. I have no idea at what point I originally posted it…]
 
There will not be, which is why M4 Max and M3 Ultra coexist as options.
Probably the M4 was designed before running non-trivial open-weight models locally was a realistic prospect, so the maximum amount of RAM might not have been a priority. If the M5 Ultra raises the maximum RAM beyond 512GB, it might be able to run a non-quant model without having to cluster M3 Ultras. That market already exists.
[Screenshot attachment]
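As a rough check on the RAM question, here is my own back-of-the-envelope arithmetic (not from the thread), assuming a 671B-parameter model like the DeepSeek R1 discussed later on, showing what the raw weights alone occupy at common precisions against a 512GB ceiling:

```python
# Back-of-the-envelope sketch (my own, not from the thread): bytes needed for
# the raw weights of a 671B-parameter model at common precisions, compared
# against a 512GB unified-memory ceiling. KV cache, activations, and the OS
# all need room on top of this, so real headroom is even tighter.
PARAMS = 671e9
BYTES_PER_WEIGHT = {"fp16/bf16": 2.0, "8-bit": 1.0, "4-bit": 0.5}
RAM_LIMIT_GB = 512

for precision, bpw in BYTES_PER_WEIGHT.items():
    weights_gb = PARAMS * bpw / 1e9
    fits = "fits" if weights_gb < RAM_LIMIT_GB else "does not fit"
    print(f"{precision:>10}: ~{weights_gb:,.0f} GB of weights -> {fits} in {RAM_LIMIT_GB} GB")
```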
 
Probably the M4 was designed before running non-trivial open-weight models locally was a realistic prospect, so the maximum amount of RAM might not have been a priority. If the M5 Ultra raises the maximum RAM beyond 512GB, it might be able to run a non-quant model without having to cluster M3 Ultras. That market already exists.
There is a growing market for tons of RAM among those who just want to run these models locally. Funny how that would get people to buy huge-RAM Macs they would never have been willing to buy in the past.

Quick point: the caption in the picture states "full 8-bit", which is not "a non-quant model" as you mentioned, but EXO Labs are probably just clarifying that it is not some Q6/Q8 mix or something. The full models are 16-bit.
 
there will be (an M4 Ultra)

If there is, it will be because TSMC does not have the fab capacity to make an M5 Ultra in 2026.

My guess is that Apple will wait until WWDC 2026 to announce the Mac Studio and Mac Pro so they can announce the M5 Ultra at the same time (and shipping for that configuration may be later in 2026).
 
Quick point: the caption in the picture states "full 8-bit", which is not "a non-quant model" as you mentioned, but EXO Labs are probably just clarifying that it is not some Q6/Q8 mix or something. The full models are 16-bit.
Technically you are right, but bear in mind that the full 671B-parameter DeepSeek was trained mostly in FP8 (see the original paper), with some weights in FP16/FP32. This was considered a breakthrough in training efficiency.
[Screenshot attachment]

The released open weights were in bf16 format (1.34TB), looking at DeepSeek-R1-GGUF on HuggingFace. Quantizing that down to Q8_0 (731GB) probably makes little difference in the benchmarks, but the key point for M5 Ultra is that this might fit in one M5 Ultra with improved maximum RAM, instead of requiring two M3 Ultras. It would be way cheaper too, I would guess: a 512GB M3 Ultra with a 2TB SSD costs $9,900, so two of them is nearly $20,000.
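For anyone curious where those sizes come from, here is a rough sketch of the arithmetic (my own estimate; the real GGUF files keep a few tensors at higher precision, which is why the published ~715-731GB figures run a bit larger than the naive number):

```python
# Rough size estimate (my own sketch) for DeepSeek R1's 671B parameters.
# BF16 stores 2 bytes per weight. llama.cpp's Q8_0 stores blocks of 32
# int8 weights plus one fp16 scale, i.e. 34 bytes per 32 weights
# (8.5 bits/weight). Real GGUF files keep some tensors in higher precision,
# so the published sizes come out slightly larger.
PARAMS = 671e9

bf16_gb = PARAMS * 2 / 1e9          # ~1342 GB, matching the ~1.34TB release
q8_0_gb = PARAMS * (34 / 32) / 1e9  # ~713 GB, in the ballpark of 715-731 GB

print(f"BF16 weights: ~{bf16_gb:,.0f} GB")
print(f"Q8_0 weights: ~{q8_0_gb:,.0f} GB")
print(f"Two 512GB M3 Ultra Studios (2TB SSD each): ${2 * 9_900:,}")
```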
 
If there is, it will be because TSMC does not have the fab capacity to make an M5 Ultra in 2026.

There isn’t really enough data to establish a pattern, but if we just compare Ultra to no-suffix release dates, M1 Ultra launched five quarters after M1, M2 Ultra four quarters, and M3 Ultra five quarters.

If we see M4 Ultra in Q2/26, that’ll be unusually late at eight quarters after M4 — but if we see M5 Ultra that quarter instead, it’ll be unusually early at just two quarters after M5.
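For anyone who wants to sanity-check those gaps, here is a quick sketch using the launch month of the first product shipping each chip (the dates are my own reference points, counting calendar quarters):

```python
# Quick sanity check of the quarter gaps quoted above. Launch dates are the
# first product shipping each chip (my own reference points, year-month).
def quarter(year, month):
    """Absolute quarter index, so differences give calendar-quarter gaps."""
    return year * 4 + (month - 1) // 3

launches = {
    "M1": (2020, 11), "M1 Ultra": (2022, 3),
    "M2": (2022, 6),  "M2 Ultra": (2023, 6),
    "M3": (2023, 10), "M3 Ultra": (2025, 3),
    "M4": (2024, 5),  "M5": (2025, 10),
}

for gen in ("M1", "M2", "M3"):
    gap = quarter(*launches[f"{gen} Ultra"]) - quarter(*launches[gen])
    print(f"{gen} -> {gen} Ultra: {gap} quarters")

# Hypothetical Q2/26 Ultra launch, measured against M4 and M5:
q2_2026 = quarter(2026, 4)
print("M4 -> Q2/26:", q2_2026 - quarter(*launches["M4"]), "quarters")
print("M5 -> Q2/26:", q2_2026 - quarter(*launches["M5"]), "quarters")
```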
 
I wonder if Apple would move the 512+ GB RAM option to the Mac Pro only, to encourage LLM bros to buy these giant towers instead. Or would the appeal of having a small-form-factor box running 671b models be a dealbreaker for a lot of them?
 
There isn’t really enough data to establish a pattern, but if we just compare Ultra to no-suffix release dates, M1 Ultra launched five quarters after M1, M2 Ultra four quarters, and M3 Ultra five quarters.

If we see M4 Ultra in Q2/26, that’ll be unusually late at eight quarters after M4 — but if we see M5 Ultra that quarter instead, it’ll be unusually early at just two quarters after M5.
Agree that there really isn’t enough data. Nevertheless:

~ M1 Ultra launched 5 months after M1 Pro/Max
~ M2 Ultra launched 5 months after M2 Pro/Max
~ M3 Ultra launched 5 months after M4 Pro/Max (one year five months after M3 Pro/Max)

The most interesting part of that is the M4/M3 pairing, which still conforms to the pattern even though M3 Ultra must have existed when M4 Max first launched. So the five-month lag is not about silicon development. My guess is it’s about production volume of the Max. Due to either cost or capacity (or both), Apple can’t produce enough volume to supply both MacBook Pro and Mac Studio at first launch.

So if M5 Pro/Max launches in January 2026, then M5 Ultra would launch in June 2026. However, if the five-month lag is about production volume, then it’s possible the delay to January could help alleviate that. But I think the delay to January is most likely to be about silicon development, specifically the transition to SoIC. If so, then the five-month lag would still apply.
 
I wonder if Apple would move the 512+ GB RAM option to the Mac Pro only, to encourage LLM bros to buy these giant towers instead. Or would the appeal of having a small-form-factor box running 671b models be a dealbreaker for a lot of them?

That seems like a counterproductive move to me. The only benefits the Mac Pro has over the Mac Studio are cooling (marginal) and the ability to plug in non-GPU PCIe cards, which is of no use for current AI workloads.

Now, admittedly, for people doing this kind of work seriously, price likely ranks low on the list of priorities. However, when you consider that a Mac Studio M3 Ultra with 512GB of RAM and an 8TB SSD costs $100 less than a Mac Pro M2 Ultra with 192GB of RAM and an 8TB SSD, the value proposition certainly favors the Studio.
 
I wonder if Apple would move the 512+ GB RAM option to the Mac Pro only, to encourage LLM bros to buy these giant towers instead. Or would the appeal of having a small-form-factor box running 671b models be a dealbreaker for a lot of them?
Nvidia’s approach to this is interesting, with the same two basic form factors for “LLM bros” (running a custom Ubuntu DGX OS). They have the DGX Station (784GB coherent memory) tower and the DGX Spark (128GB coherent memory) box. [Setting aside the third-party versions of these, e.g. Dell, HP, et al.]

Seems to me the Spark is aimed at that “small form factor box” appeal, even though it is sort of a beta product at this point. They are designed to be doubled up, so if the second generation Spark gets 256GB then a 2x configuration would reach 512GB.

I think the idea is 2x DGX Spark = 1x Ultra Mac Studio.
 
I wonder if Apple would move the 512+ GB RAM option to the Mac Pro only, to encourage LLM bros to buy these giant towers instead. Or would the appeal of having a small-form-factor box running 671b models be a dealbreaker for a lot of them?

For now, Mac Studio RAM configurations have been constrained by what’s physically possible with Apple’s layout. They put RAM chips around the SoCs.

So, RAM configurations have increased mostly because density has.

Of course, they could break with that policy the next time density increases, leave the Studio at 512, and offer a 768 GiB Mac Pro.

But I don’t think they will, because I don’t think “we really want Mac Studio buyers to move upmarket to the Mac Pro” is a problem Apple actually has. Rather, they’re unsure there is much of a useful market segment left for the Pro. I think they’re not going to artificially fill that. If anything, they’d rather the Studio can fill in for more and more use cases.
 
For now, Mac Studio RAM configurations have been constrained by what’s physically possible with Apple’s layout. They put RAM chips around the SoCs.

So, RAM configurations have increased mostly because density has.
I guess maybe SoIC stacking could save some space, especially when doubled in the Ultra.
Of course, they could break with that policy the next time density increases, leave the Studio at 512, and offer a 768 GiB Mac Pro.

But I don’t think they will, because I don’t think “we really want Mac Studio buyers to move upmarket to the Mac Pro” is a problem Apple actually has. Rather, they’re unsure there is much of a useful market segment left for the Pro. I think they’re not going to artificially fill that. If anything, they’d rather the Studio can fill in for more and more use cases.
The base price and storage capacity of Nvidia’s DGX Station GB300 with 784GB unified system memory (288GB GPU + 496GB CPU) are still not known.

The marketing is worth a laugh, as Dell manages to squeeze all three of Apple’s advanced tiers into the name for their DGX Station implementation: “Pro Max with GB300 Ultra Superchip”.

I think it remains to be seen if Apple will take the bait. To compete directly with the GB300 Ultra, they would need to do an “M5 Supra” (I strongly dislike “Extreme”) combining two M5 Ultra. But I think they’ve already rejected that, and I don’t think SoIC changes that equation.

That said, there is another way they could cater to this emerging “AI PC” market. One of the principal features of the DGX workstations is Nvidia’s ConnectX networking, which is derived from their server arrays. Apple may have partnered with Broadcom for this kind of thing:

 
I guess maybe SoIC stacking could save some space, especially when doubled in the Ultra.

Wouldn't that be inside the SoC, rather than on the package (where the memory resides)? Or are we talking a chiplet approach where the package contains multiple "chips"?

The marketing is worth a laugh, as Dell manages to squeeze all three of Apple’s advanced tiers into the name for their DGX Station implementation: “Pro Max with GB300 Ultra Superchip”.

Yeah, Dell has gone heavy on using Apple's suffixes, with the exception of "Premium" and "Essential". But their laptop line-up sounds very Apple-like. Almost too generic. (Of course, in classic Dell fashion, it isn't even quite clear how thorough their clean-up was. I'm still seeing "Latitude", "Inspiron" and "XPS" among the options. Are those… purely for old models?)

(I strongly dislike “Extreme”)

It was a little tongue-in-cheek when they did things like Quartz Extreme, and I think that was the intent, too.

I'm actually unsure when they've last used that suffix.

combining two M5 Ultra. But I think they’ve already rejected that, and I don’t think SoIC changes that equation.

That said, there is another way they could cater to this emerging “AI PC” market. One of the principal features of the DGX workstations is Nvidia’s ConnectX networking, which is derived from their server arrays. Apple may have partnered with Broadcom for this kind of thing:


I'm leaning towards that's too far outside their wheelhouse.
 
The released open weights were in bf16 format (1.34TB), looking at DeepSeek-R1-GGUF on HuggingFace. Quantizing that down to Q8_0 (731GB) probably makes little difference in the benchmarks, but the key point for M5 Ultra is that this might fit in one M5 Ultra with improved maximum RAM, instead of requiring two M3 Ultras.
Looking at these (and other) models that run to hundreds of GB, I realized that

(a) it might indeed mean something to a developer to quantize a model to fit on a specific machine, but

(b) more likely, models will just keep being put out with little regard for which Mac can run them. These models can also be run on the far greater compute resources available from online providers via subscription (like Groq, OpenAI, Perplexity, etc.).

Caveat: I do not have a subscription to any of these services, so maybe they don't even allow use of the larger models. New models emerge much faster than the Mac upgrade cycle.

side note: These folks made a non-consumer solution for DeepSeek R1 671b FP16:
 
side note: These folks made a non-consumer solution for DeepSeek R1 671b FP16:
Yes, but see the comments there.
[Screenshot attachment]

servethehome aren't AI experts. Here's what an expert at HuggingFace did.
https://www.notebookcheck.net/Way-t...thout-expensive-GPUs-discovered.956421.0.html
 
Yes, but see the comments there.
servethehome aren't AI experts. Here's what an expert at HuggingFace did.
https://www.notebookcheck.net/Way-t...thout-expensive-GPUs-discovered.956421.0.html
Thanks for sharing the fact that the full DeepSeek R1 is at 8-bit! The newest model info (R1-0528, ~715GB) is here: https://docs.unsloth.ai/models/tuto...-locally#how-to-run-full-r1-0528-on-llama.cpp.

BTW, I never implied STH were experts; I simply showed "a way" that someone had run the model, as I like to see different setups, and I'm glad you showed "a way" with the HF setup.
 
Wouldn't that be inside the SoC, rather than on the package (where the memory resides)? Or are we talking a chiplet approach where the package contains multiple "chips"?
SoC is horizontal 2D integration, while SoIC adds vertical 3D integration. It’s been around for a while, and it has been improving rapidly: the first generation of TSMC-SoIC (2020) was a 10x improvement over earlier 3D-IC, while the next-generation technology, referenced as “SoIC+”, is said to be a 1000x improvement (like everything else, it’s about density). I think memory is part of it, and yes, it is a chiplet approach.
[…] I'm leaning towards that's too far outside their wheelhouse.
Yes, absolutely, but maybe not too far outside Broadcom’s wheelhouse? They are a steering member of the Ultra Ethernet Consortium.

800 Gb/s (Nvidia’s current high-end ConnectX-8 spec for their SuperNIC networking cards, with 1600 Gb/s expected for the next generation ConnectX-9) would be the target. If Nvidia can fit one of those inside the DGX Spark (currently ConnectX-7 400 Gb/s), then Apple could follow the same pattern, with optional 400 GbE in the Mac Studio, and an optional 800 GbE networking card for the Mac Pro.
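To put those link speeds in context against the model sizes discussed earlier in the thread, here is a tiny sketch (my own arithmetic, ignoring protocol overhead) converting raw line rate into the time needed to move the ~715GB of 8-bit DeepSeek weights mentioned above between two boxes:

```python
# Illustrative only: raw line-rate conversions, ignoring protocol overhead.
# The ~715 GB figure for 8-bit DeepSeek R1 weights comes from earlier posts.
MODEL_GB = 715

for gbps in (400, 800, 1600):
    gigabytes_per_s = gbps / 8  # bits per second -> bytes per second
    seconds = MODEL_GB / gigabytes_per_s
    print(f"{gbps} Gb/s = {gigabytes_per_s:.0f} GB/s -> ~{seconds:.0f} s to move {MODEL_GB} GB")
```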
 
I think the biggest issue with the Ultra (and, by extension, a 2X Ultra) is that the performance is not scaling well with the M1-M3 models. Clearly you are not going to see a doubling of performance, but an M3 Ultra is around one-third faster in multi-core than a single M3 Max.

The Ultra is significantly faster than the Max in Geekbench's Metal benchmark (around two-thirds faster), so with the M5 GPU cores adding Neural Accelerators to support AI, an M5 Ultra might end up being a pretty powerful AI machine, especially if Apple can increase the supported RAM pool.
 
I think the biggest issue with the Ultra (and, by extension, a 2X Ultra) is that the performance is not scaling well with the M1-M3 models. Clearly you are not going to see a doubling of performance, but an M3 Ultra is around one-third faster in multi-core than a single M3 Max.

Yes, and I'm getting conflicting data on whether this has been improving.

In Xcode 16, the M1 Ultra was about 39.45% faster than the Max, the M2 Ultra 44.44% faster, and the M3 Ultra 48.15% faster. (I'm getting a less conclusive picture in Xcode 15 and 13-14.) So that seems to be a steady improvement.

Geekbench 6 Multicore, meanwhile, suggests it's been getting worse: 48.41% for the M1 Ultra, 44.52% for the M2 Ultra, and 36.04% for the M3 Ultra.

But then of course, Geekbench 6 is designed to penalize high core counts.
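Another way to read those numbers is as scaling efficiency, i.e. how much of an ideal 2x-of-the-Max doubling each Ultra actually delivers. A quick sketch using only the percentages quoted above:

```python
# Convert the "Ultra is X% faster than Max" figures quoted above into scaling
# efficiency: (1 + pct_faster) / 2, where 1.0 would be a perfect doubling of
# a single Max. Percentages are the ones from the post, not new measurements.
pct_faster = {
    "Xcode 16":       {"M1 Ultra": 0.3945, "M2 Ultra": 0.4444, "M3 Ultra": 0.4815},
    "Geekbench 6 MC": {"M1 Ultra": 0.4841, "M2 Ultra": 0.4452, "M3 Ultra": 0.3604},
}

for bench, chips in pct_faster.items():
    for chip, pct in chips.items():
        efficiency = (1 + pct) / 2
        print(f"{bench:>14} | {chip}: {efficiency:.0%} of an ideal 2x Max")
```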

 
Hopefully, the M5 Ultra can be ordered with more than 512GB of RAM. 512GB isn't quite enough for an open-weights, non-quant model like DeepSeek R1. Then it could be used as standalone for LLM duties without compromise.
While this statement is true, at any time someone can come up with an n-billion-parameter model that falls just outside current machine limits (128GB, 512GB, etc.), and when that happens there will be compromises for that model. Expect LLMs to come and go and get larger, with no synchronization at all with the introduction of new, higher-RAM top-end machines (like the M5 Ultra under speculation).

I think it is more likely that LLM creators will try to fit their models under the standard VRAM limits of video cards that are far more common than the M3 Ultra, like the 8, 12, 16, or 32GB found on the 5060, 5070, and other cards.

I would urge anyone who gets more than 512GB just to run one particular model (the example you gave is a good one, DeepSeek R1 Q4 vs. Q8) to really scrutinize the performance improvement of the full DeepSeek over the Q4. It is possible that model isn't even the best one for a particular task or query. It may be worth paying a cloud provider for one month, or for enough tokens, to test out a slew of different models for your workflow.
 