I believe it’s a CoWoS-L package, with a single Blackwell Ultra GPU paired with a single Grace CPU, so my entry-level pricing is off — Nvidia’s nomenclature is confusing (and it changed recently),

The 'Blackwell' part is not on the Grace Package for the workstation.

"...
Despite sporting a NVIDIA Grace Arm-based CPU and a NVIDIA Blackwell GPU, the platform requires installing another GPU to get video output.
...

NVIDIA DGX Station GB300 Edition Launched Without a GPU

Instead of selling the DGX Station as a NVIDIA-only product, the company has made a standard(ish) motherboard and plans to sell it through OEMs like Dell and HP.
..."

I was a bit off in the post above. I forgot that the datacenter AI compute GPU doesn't have any display engine. So you would need a medium/lightweight GPU to use a 'deskside' personal workstation with a GUI, mainly as a "terminal" to the compute.


As I understand it, the silicon shown there should be called the GB100, not GB300. The GB200 is two Blackwell Ultra paired with one Grace; the GB300 is four Blackwell Ultra paired with two Grace.

We know the GB200 uses HBM (above link has photo and specs), but if the GB100 (like the GB10 in the Spark) uses LPDDR then it's going to come out around what we might expect for an M5 Extreme Mac Pro…

Only the GB10 Spark uses LPDDR5 as a unified pool. The "workstation" product just uses the next-generation server parts structure for the main components of the board. The CPU and GPU are split memory-pool implementations.



P.S. Again, if Apple is trying to build a hyper-focused AI server implementation, they are likely to end up in a similar place as Nvidia, with no video out on the primary logic board. That is pretty unlikely to help the Mac Pro (which is primarily sold as a GUI-driven, single-user workstation).
 
I'm one of those few people who have a 2019 Mac Pro with maxed-out MPX modules and, just recently, a base 2025 Studio M3 Ultra. I was using the Mac Pro for 3D rendering, and whilst I have a PC with a couple of Nvidia cards that is quicker than both of the Macs, I just prefer using macOS.

The fact is, though, this Studio runs rings around the Mac Pro for straight performance. It also generates way less heat and is silent, nothing my PC can claim. Yeah, there are still some speciality audio cards, but the majority of audio interfaces are USB-C; there are only a few Thunderbolt interfaces, and PCIe audio cards are pretty scarce.

Below are my results. Unless they come out with some M4/M5 Ultra/Extreme chip with PCIe slots, I can't see the point of it, as much as I hate to say it.

2019 Mac Pro with 2x 6800XT Duos
Self-built PC with i9-14900KF + 4070 Ti Super + A4000
Mac Studio M3 Ultra base, 28-core CPU / 60-core GPU

Redshift Benchmark (min:sec, lower is better):
Mac Pro: 2:30
Studio: 2:03
PC: 1:37

Cinebench Multicore CPU (higher is better):
Mac Pro: 1169
Studio: 2666
PC: 2016

Rendering an actual scene in Cinema 4D and Redshift. The scene has a lot of translucent objects (think trees, grass, etc.), so it's not the easiest thing to render. Output was at 3000 px × 3000 px.

Mac Studio: 19:12 (RTX ON)
Mac Pro 2019: 41:48
Windows 11 + 4070 Ti Super: 14:58
Windows 11 + 4070 Ti Super + A4000: 10:21
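In ratio terms for that scene, a rough worked comparison from the times above:

```latex
\frac{41{:}48}{19{:}12} = \frac{2508\,\mathrm{s}}{1152\,\mathrm{s}} \approx 2.2\times \ \text{(Studio vs. 2019 Mac Pro)} \qquad
\frac{19{:}12}{10{:}21} = \frac{1152\,\mathrm{s}}{621\,\mathrm{s}} \approx 1.9\times \ \text{(dual-GPU PC vs. Studio)}
```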
Hey there! First post here as I'm finally reconsidering the Apple ecosystem after ditching it 8 years ago. Alright... Blender/Zbrush/3D Coat/Substance Suite are all my bread and butter and I'm quite familiar with all the Windows/PC compatible hardware on the retail and prosumer side of things. Anyhow, all that said... Mac Studio Pro and the potential of an expandable 2025 Mac Pro (assuming that's a thing) are starting to appeal to me with how the unified memory works between the CPU and GPU. I'm all for sacrificing render speed for shared memory and I easily breach 128+gb on the CPU and 48+ on the GPU...

Soooo with that in mind, do these software packages distribute memory well within the newer Apple ecosystem? I think C4D should be close enough and I figure you have a solid idea here. I am looking at the 512gb systems for context. Thanks in advance.
 
Hey there! First post here as I'm finally reconsidering the Apple ecosystem after ditching it 8 years ago. Alright... Blender/Zbrush/3D Coat/Substance Suite are all my bread and butter and I'm quite familiar with all the Windows/PC compatible hardware on the retail and prosumer side of things. Anyhow, all that said... Mac Studio Pro and the potential of an expandable 2025 Mac Pro (assuming that's a thing) are starting to appeal to me with how the unified memory works between the CPU and GPU. I'm all for sacrificing render speed for shared memory and I easily breach 128+gb on the CPU and 48+ on the GPU...

Soooo with that in mind, do these software packages distribute memory well within the newer Apple ecosystem? I think C4D should be close enough and I figure you have a solid idea here. I am looking at the 512gb systems for context. Thanks in advance.
Hey and welcome.

I just got the base M3 Ultra with 96 GB. With what I'd think of as a regular scene (no simulation or anything), memory usage when rendering in Redshift would get up to 86 GB or so. Most of the time C4D would release the memory, but sometimes not. Either way, I haven't had any issues at all with the machine. I actually prefer using Arnold over Redshift, and memory is much less of an issue there. I think you'll be just fine.
 
The 'Blackwell' part is not on the Grace Package for the workstation.

"...
Despite sporting a NVIDIA Grace Arm-based CPU and a NVIDIA Blackwell GPU, the platform requires installing another GPU to get video output.
...

NVIDIA DGX Station GB300 Edition Launched Without a GPU

Instead of selling the DGX Station as a NVIDIA-only product, the company has made a standard(ish) motherboard and plans to sell it through OEMs like Dell and HP.
..."

I was a bit off in the post above. I forgot that the datacenter AI compute GPU doesn't have any display engine. So you would need a medium/lightweight GPU to use a 'deskside' personal workstation with a GUI, mainly as a "terminal" to the compute.

Only the GB10 Spark uses LPDDR5 as a unified pool. The "workstation" product just uses the next-generation server parts structure for the main components of the board. The CPU and GPU are split memory-pool implementations.

P.S. Again, if Apple is trying to build a hyper-focused AI server implementation, they are likely to end up in a similar place as Nvidia, with no video out on the primary logic board. That is pretty unlikely to help the Mac Pro (which is primarily sold as a GUI-driven, single-user workstation).
Thanks. This has to be one of the subtlest marketing/nomenclature distinctions I've ever seen.

The one-word difference between [1] the "GB300 Grace Blackwell Ultra Superchip" used as a building block for GB300 NVL72 rack-scale servers and [2] the "GB300 Grace Blackwell Ultra Desktop Superchip" used for the DGX Station is quite large: the server has quadruple the GPU and double the CPU of the workstation.

Nvidia refers to the GB300 memory as (up to) "784GB of large coherent memory" (288GB HBM3e + 496GB LPDDR5X), so I assume the purpose-built DGX OS must see it as something like Apple's unified memory architecture. Is that a safe assumption?

With regard to the Mac Pro, I think I'd expect Apple to blend the two. That's the beauty of Ultra = 2x Max. Package an Ultra with a building block created for the AI servers and voila, best of both worlds!

What are the odds? I don't know, but not super high. Still, if you watch that Nvidia keynote, where he introduces Vera Rubin and its second generation, Rubin Ultra, it feels like Apple needs to make some kind of statement, hardware-wise.
 
Thanks. This has to be one of the subtlest marketing/nomenclature distinctions I've ever seen.

The one-word difference between [1] the "GB300 Grace Blackwell Ultra Superchip" used as a building block for GB300 NVL72 rack-scale servers and [2] the "GB300 Grace Blackwell Ultra Desktop Superchip" used for the DGX Station is quite large: the server has quadruple the GPU and double the CPU of the workstation.

First, "Superchip" shouldn't be thought of as a 'chip' or a 'chip package'. Pragmatically, it is really a logic board (printed circuit board; PCB). One thing Nvidia is trying to do is claim they are not in the systems-maker business; that they are still just a major component vendor that isn't trying to compete with the major system sellers (HPE, Dell, Supermicro, etc.).

The Nvidia C2C-Link is essentially a link via a PCB between two 'chip' packages (C2C = chip-to-chip).

Some terminology would tag it as a "multi-chip module", but "Superchip" sounds snazzier.

This NVL72 "Superchip" is closer in size to a main logic board.


[Image: Cordelia GB300 compute board]


The assertion that this is some kind of "chip" falls apart when there are screws holding down subcomponents, networking sockets, cylindrical capacitors, etc.; it is really a "board" or at best a "module". And it is huge (and also consumes power past normal wall-socket circuit capacity, at least under USA electrical codes).

The "GB300" prefix is really just about there being a "Grace" package in the mix along with some "Blackwell" silicon. A guide from back in the GB200 era (like that was so long ago :) ).


The problem is that this logic board is really too big for a personal workstation, once you add the normal stuff that goes on a workstation (e.g., local storage, a couple of general PCIe slots, regular workstation I/O sockets). The above is designed to put two 'nodes' on the same board (where a node is one Grace + two Blackwells). It is a building block aimed at a different scale. The "GB300" is really only about the basic node structure there, not the whole board.

The workstation doesn't have two nodes. It also trims back the Blackwell parts, as both a cost and a space saving. However, it has enough in common with the larger system that local development work should scale straightforwardly to a cluster. These 'dev' boxes don't have to sit in a datacenter with high-end HVAC.



Nvidia refers to the GB300 memory as (up to) "784GB of large coherent memory" (288GB HBM3e + 496GB LPDDR5X), so I assume the purpose-built DGX OS must see it as something like Apple's unified memory architecture. Is that a safe assumption?

Coherent is more suggestive of having a 'flat' address space. (Technically, coherent means that changes made to memory are seen by everyone who might have a copy. But for everyone to know it is shared, they all need a common label; hence common addresses.) That doesn't necessarily mean "uniform". Apple's 'Unified' has a substantive 'uniform' assumption built into it.
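For what it's worth, the distinction shows up directly in CUDA's managed-memory API: one pointer is valid from both the Grace CPU and the GPU (coherent, one address space), but you can still steer which physical pool the pages live in, so access is anything but uniform. A minimal sketch using standard CUDA runtime calls (the size and device IDs are just illustrative):

```cuda
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const size_t bytes = 1ull << 30;   // 1 GiB, illustrative
    float *buf = nullptr;

    // One allocation, one address range, visible to both CPU and GPU (coherent).
    cudaMallocManaged(&buf, bytes);

    // Placement is not uniform, though: prefer the CPU-attached pool (LPDDR on Grace)...
    cudaMemAdvise(buf, bytes, cudaMemAdviseSetPreferredLocation, cudaCpuDeviceId);

    // ...or prefer GPU-attached HBM instead; same pointer, very different bandwidth.
    // cudaMemAdvise(buf, bytes, cudaMemAdviseSetPreferredLocation, /*device=*/0);

    buf[0] = 42.0f;                    // CPU writes through the shared pointer
    printf("%f\n", buf[0]);            // and reads it back; a GPU kernel could touch it too

    cudaFree(buf);
    return 0;
}
```

Apple's unified memory skips the "prefer a location" step entirely because there is only one pool.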



With regard to the Mac Pro, I think I'd expect Apple to blend the two. That's the beauty of Ultra = 2x Max. Package an Ultra with a building block created for the AI servers and voila, best of both worlds!

Not much "chiplet" beauty there. Once you have coupled two 'Max' dies to each other, you have basically used up all of the high-speed, low-latency connections (at least if competing with C2C-Link-type performance). You might be able to take one Max and couple it to something else that wasn't laptop-optimized, but that is it.

Nvidia's data center GPUs have both C2C-Link and NVLink connectors, so they are purpose-built to "scale up" on the same logic board (e.g., the two nodes on the NVL72 board can talk to each other in addition to other GPUs on other boards in the same cabinet).


What are the odds? I don't know, but not super high. Still, if you watch that Nvidia keynote, where he introduces Vera Rubin and its second generation, Rubin Ultra, it feels like Apple needs to make some kind of statement, hardware-wise.

Apple's AI solution doesn't have to outperform Nvidia's in raw compute. It just has to be cheaper (more affordable within the power-consumption parameters Apple wants to constrain it to). Apple isn't going to sell the hardware to anyone else, so what meaningful 'statement' could they possibly make? The point is to not write billion-dollar checks to Google** or Nvidia (or OpenAI or Microsoft, etc.). Apple only needs a 'statement' if it is trying to make other folks change their Nvidia data center buying habits. There is little indication Apple is chasing that at all.


It is extremely likely Apple is going to be looking for a solution that gets better perf/watt than Nvidia's does (e.g., C2C-Link costs more in power than UltraFusion does).

If Apple 'sells' anything, it might be the AI cloud service, but that isn't a 'hardware' sale. So far Apple is saying it is all free... so not much 'statement' making there either. (And if it's 'free', then it's all the more likely that the power consumption bill will matter at least as much as benchmark bragging rights.)




** Some reports are that Apple is using Google Cloud services and TPUs to do substantive parts of the training. Apple's AI server chips will likely focus on 'inference' for Apple's service cloud; if they are reasonably decent at training too, that would likely be a candidate to bring back in from outsourcing as well (if cheaper).
 
Many people don’t quite seem to get that Apple is not and will never be a merchant silicon vendor. The Mac Pro was designed to accommodate merchant silicon vendors like AMD, Intel, and Nvidia.

The question, then, is what product will the Mac Pro become, now that the old product no longer exists?


I don’t think “forced” is going to turn out to be an accurate description of what’s going to happen. Yes, there will be both an Ultra Mac Pro and an Ultra Studio, but the heavyweight Mac Pro will be something else.

Look at Nvidia’s DGX Spark (formerly Project DIGITS) and DGX Station. The Spark, starting at $3,000, runs parallel to the Mac Studio. The Station, probably starting at something like $75,000 with GB200 inside and $150,000 with GB300 inside, is something else.

With the introduction of SoIC bringing a different, more flexible architectural approach to the M5 Pro/Max and, by extension, the M5 Ultra, an M5 Extreme Mac Pro could see configurations starting at $15,000 (2x M5 Ultra, parallel to GB200, which is 2x Blackwell) or even $30,000 (4x M5 Ultra, parallel to the GB300, which is 4x Blackwell Ultra), marketed like the DGX Station, as AI server hardware in a workstation.

The potential reminds me of the Pro Display XDR. Far less expensive than the best professional reference displays, but triple the price of the best (including their own) standard displays.

And, yes, I know it’s fantasy, but I find all of the wallowing in the past here to be a bit shortsighted…
A product like this would mean a stronger push by Apple into the enterprise and I just don't see it happening at this point when it comes to selling dedicated hardware units.

With the introduction of the Studio, I think it's better to just kill the Mac Pro when it no longer makes sense for Apple to produce it. Considering what happened after 2013, I think there had to be an internal struggle at Apple over what to do with that product all those years ago. Hell, it took them a little over three years to admit the design failure of the trash can and start on a redesign, given the demand that still existed in 2017.

Not sure that demand is there anymore in 2025 TBH
 
Could TSMC's SoIC + COUPE (silicon photonics integrated into the SoIC stack) provide Apple with a (vanishingly) low-latency way to let multiple Mac Studio-sized machines coherently connect and scale performance?
+) Gen 1 : 200GB/s per transceiver : In 2025
+) Gen 2 : 800GB/s per transceiver : By EOY 2026
+) Gen 3 : 1.6TB/s per transceiver

https://images.anandtech.com/doci/21373/TSMC-3D-Optical-Engine.png

If a 4-fibre Gen 2 connector could allow 4 machines to all connect point-to-point with bandwidth approaching the current M3 Ultra's, it would allow users to buy up to 4x Studios and scale performance toward 4x Ultras (point to point).
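A rough sanity check on that, using the per-transceiver figures above and assuming roughly 2.5 TB/s for UltraFusion on the current Ultra (Apple's stated ballpark, my assumption for the comparison point):

```latex
\binom{4}{2} = 6 \ \text{links for a full mesh of 4 machines, i.e. 3 transceivers per machine} \\
3 \times 800\,\mathrm{GB/s}\ (\text{Gen 2}) = 2.4\,\mathrm{TB/s}\ \text{per machine} \approx 2.5\,\mathrm{TB/s}\ \text{(UltraFusion, M3 Ultra)}
```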

It could give Apple a steady stream of sales even of non-refreshed chips for users wanting workstation level performance (media encoders / GPU / NPU / CPU / memory) without having to throw away their initial machine.

It could also be a stepping stone towards a 2028 Data Centre solution with a proprietary Apple fabric and switch.

Hidra + Baltra + Broadcom + UALink = ?

Even if Apple's server initiative is mostly about their own requirements (+ Mac Pro), the ingredients should ring a bell for Nvidia. The Next Platform's argument implied Nvidia was ignoring cost-effective performance gains because it could (and could sell more units):

Easy money does weird things to companies and their roadmaps. - The Next Platform

UALink (+ Fabric) is the next bandwidth and latency barrier being removed by an industry-wide standardisation effort for package-to-package communication. Apple may not be interested in Nvidia's market, but the cumulative effect of the new Standards is that they're opening the door for others into Nvidia's space.

The IP built on the new die-to-die Standards is expected to start rolling out this year. UALink is a little less clear (2026 for a fast implementation), which fits with The Information's leak on Apple's Broadcom 'arrangement'. Apple being in UALink clues us in to the fact that they are working on scale-up for their server Tier A/Si, which may also equal a more traditional (expandable) Mac Pro with Accelerators, within the A/Si / unified memory paradigm.

Co-Packaged Optics (CPO) as the Fabric for connecting Accelerators is likely why Apple is using Broadcom. PCIe Gen 7 is also supposed to bring capacity to UALink. Maybe that's all a standalone product would require; though, with Apple selling the Mac Pro as a rack option, CPO / UALink-enabled Accelerators seem possible.

The 2x Max (Ultra) Tier is a fit for the Studio's constraints (or form factor). Apple has more or less confirmed this Tier will continue with the cadence announcement, but adding CPO to a package where (for the most part) it would probably go unused seems unlikely, especially if they have a server Tier on the way. Beyond whatever Apple sees as suitable in terms of demand etc., I'd assume the cadence is a way for them to balance their engineering resources around an additional A/Si Tier. The plan, perhaps, is to weave resources between the Ultra and server Tiers with staggered or alternating generations for the two.

As always, this is pure speculation on scant, but (hypothetically) consistent details:

• Apple is in the UALink Consortium.

• Two codenames remain (Hidra+Baltra) which fits the Host+Accelerator pair described in UALink’s scale-up model.

• Plus the “arrangement” with Broadcom, as Gruber described it, from The Information leak:
Apple is developing its first server chip specially designed for artificial intelligence [. . .] Apple is working with Broadcom on the chip’s networking technology, which is crucial for AI processing [. . .] If Apple succeeds with the AI chip — internally code-named Baltra and expected to be ready for mass production by 2026 – it would mark a significant milestone [. . .] - Daring Fireball
https://daringfireball.net/2024/12/information_aside_double_ultra_scrapped

[Image: UALink scale-up diagram]


Hidra+Baltra works in this context and based on Anand Shimpi’s outline of Apple’s approach to AI:
[. . .] when you’re going to execute these machine learning models, performing these inference-driven models, if the operations that you’re executing are supported by the neural engine, if they fit nicely on that engine, it’s the most efficient way to execute them. The reality is, the entire chip is optimized for machine learning, right? So a lot of models you will see executed on the CPU, the GPU, and the neural engine, and we have frameworks in place that kind of make that possible. The goal is always to execute it in the highest performance, most efficient place possible on the chip. - A. Shimpi

. . . it's possible they could build both the Host and Accelerator using the same core CPU+GPU+NPU chip, just configured into two SiP types for either function by using different (disaggregated) I/O chips. A basic mock-up (based on Eliyan) would be something akin to this . . .

[Image: Hidra+Baltra mock-up]


The other assumption in the image above is that the memory controllers would move onto another die, i.e., the base die of a custom HBM stack or similar. This, combined with I/O etc., would reclaim more than 100 mm² of the Max's ~520 mm² die area. If they grow this hypothetical (consolidated) server chip to ~650-700 mm², then they'd be approaching the core count of the dream-dream Extreme in a 2x configuration.
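Rough arithmetic on that, taking the numbers above at face value (all figures approximate):

```latex
520\,\mathrm{mm^2} - 100\,\mathrm{mm^2} \approx 420\,\mathrm{mm^2}\ \text{of compute area per Max-class die} \\
\frac{650\text{--}700\,\mathrm{mm^2}}{420\,\mathrm{mm^2}} \approx 1.5\text{--}1.7\times\ \text{the cores of a Max per server die} \\
2 \times (1.5\text{--}1.7) \approx 3\text{--}3.3\times\ \text{Max, approaching the 4}\times\text{-Max "Extreme"}
```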

Whatever Apple has in progress, it is presumably the "more to come" for the M-family of chips that Johny Srouji teased after WWDC 2023. The first iteration of Apple's server-class silicon would fit that bill, and it fits with Srouji's need to say something at that time, i.e., in the wake of the M2 Ultra / stopgap Mac Pro.


Anyway, there's not a lot to find on UALink yet, other than that it is a Fabric-agnostic (within spec) memory-operations protocol for the rest of the industry that . . .

Broadcom is likely the big winner as it is positioned to be the connectivity provider to non-NVIDIA systems, whether that is for scale-up or scale-out. - STH

https://www.servethehome.com/ualink...-backed-by-amd-intel-broadcom-cisco-and-more/

https://www.forbes.com/sites/moorin...g-into-the-ultra-accelerator-link-consortium/

https://www.broadcom.com/company/news/product-releases/61946

https://www.broadcom.com/info/optics/cpo

If you find anything interesting, post it.
 
If they can't go Extreme, it probably isn't worth buying anyway. Now, give me an M5 Extreme, not interconnected but all in one package, with 1TB of unified memory, and it's an instant buy for me. Just bought the M3 Ultra Mac Studio with 512GB of RAM. Really wish that's what I had instead.
Hey, what are you using the Studio for? I'm debating the 512 GB of RAM but not so sure how it will perform with ML and VFX... and whether it's better to look at RTXs. Thanks.
 