It is just as much a software thing as a hardware one. The Afterburner card sits behind an Apple library service that handles ProRes decompression. If the card is not present, the work is dispatched one way (software decode); if the card is present, it is routed through the Afterburner hardware. No major changes are needed in the application that calls it.
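A minimal sketch of that "library hides the accelerator" pattern. All names here are hypothetical stand-ins, not Apple's actual API; the point is only that the caller sees one entry point and the library picks the backend:

```python
# Hypothetical sketch: the application calls one function; the library
# decides whether to use the accelerator or the software fallback.

def afterburner_present() -> bool:
    """Stand-in for a real hardware probe; hardwired to False here."""
    return False

def decode_prores_software(frame: bytes) -> bytes:
    # Fallback path: decode on the CPU (placeholder transform).
    return frame[::-1]

def decode_prores_afterburner(frame: bytes) -> bytes:
    # Accelerated path: hand the frame to the card (placeholder transform).
    return frame[::-1]

def decode_prores(frame: bytes) -> bytes:
    """The only entry point the application ever sees."""
    if afterburner_present():
        return decode_prores_afterburner(frame)
    return decode_prores_software(frame)

print(decode_prores(b"abc"))  # same call whether or not the card is installed
```

The application code is identical either way; only the library's internal routing changes.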
A very similar thing is done with Apple Private Cloud Compute (PCC).
" ... We paired this hardware with a new operating system: a hardened subset of the foundations of iOS and macOS tailored to support Large Language Model (LLM) inference workloads while presenting an extremely narrow attack surface. This allows us to take advantage of iOS security technologies such as
Code Signing and
sandboxing. ..."
"... Secure and private AI processing in the cloud poses a formidable new challenge. To support advanced features of Apple Intelligence with larger foundation models, we created Private Cloud Compute (PCC), a groundbreaking cloud intelligence system designed specifically for private AI processing ..."
security.apple.com
If this module were on a PCI-e card, it wouldn't have to be limited to just the Mac Pro. (A Mac Studio with a Thunderbolt connection to an external PCI-e slot would just have a slower link.) Apple PCC goes out over the internet to an Apple data center and back; the round trip to a locally PCI-e-attached card should be far faster.
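Some rough back-of-envelope arithmetic on that round-trip claim. Every number below is a loose assumption for illustration, not a measurement:

```python
# Very rough latency comparison, all figures assumed ballparks (not measured).
internet_rtt_ms = 40.0      # assumed round trip to a distant data center
pcie_rtt_ms = 0.01          # assumed round trip over a local PCI-e link (~10 us)
tb_enclosure_rtt_ms = 0.05  # assumed round trip through a TB-attached PCI-e slot

print(f"internet vs local PCI-e: ~{internet_rtt_ms / pcie_rtt_ms:.0f}x slower")
print(f"internet vs TB enclosure: ~{internet_rtt_ms / tb_enclosure_rtt_ms:.0f}x slower")
```

Even if the assumed numbers are off by an order of magnitude, the local card wins the latency race by a wide margin; the bandwidth and cost tradeoffs are a separate question.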
The large catch here is that the 'functions' would largely be limited to what Apple libraries already cover (e.g., decoding ProRes with Apple code, as opposed to some highly proprietary clone of that functionality).
There is a substantive limitation here: how much work is really tolerant of "a few moments" of delay? The longer the latency to and from the accelerator, the more pronounced the juggling act. Moving ProRes decode onto the same die allows the M2-M4 to far exceed what one Afterburner card can do, and at a much lower cost, so the price/performance is radically different.
If Apple put an M4 Pro/Max on a PCI-e card with this 'thin' PCC OS, then all they would really need to add is a 'virtual Ethernet' over PCI-e, so the pieces of software can connect over the same kind of link they already use to communicate. (Some of PCC's security layers would be overkill when both ends are local to the same machine.)
Apple has a narrow API to distribute a workload over multiple Macs. This WWDC 2022 session used four Mac Studios inter-networked via Thunderbolt (using three ports on each, all of them can be connected with point-to-point links):
https://developer.apple.com/videos/play/wwdc2022/10063/
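In spirit, distributing work over such a point-to-point mesh looks like the sketch below. The node addresses and job names are made up for illustration, and Apple's actual API works differently; this only shows the basic fan-out idea:

```python
from itertools import cycle

# Hypothetical mesh of four nodes, each reachable at its own address.
nodes = ["node-a.local", "node-b.local", "node-c.local", "node-d.local"]

def assign_jobs(jobs, nodes):
    """Round-robin each job onto the next node in the mesh."""
    ring = cycle(nodes)
    return {job: next(ring) for job in jobs}

jobs = [f"render-tile-{i}" for i in range(8)]
plan = assign_jobs(jobs, nodes)
print(plan["render-tile-0"])  # -> node-a.local
```

Whether the four nodes are Mac Studios on Thunderbolt cables or Mac-on-a-card modules on PCI-e "virtual Ethernet," this dispatch layer wouldn't care; only the addresses change.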
Three Mn Max Mac-on-a-card modules in a Mac Pro could make all those point-to-point connections with no TB cables. (The cards may need some extra power, so perhaps some AUX power cables.) It also doesn't have to be a module that only fits into a Mac Pro: if it were a full-fledged Mac on a card, there are folks with PCs who might want to plop one of these nodes into their system (a much broader scope).