
daftna

macrumors newbie
Original poster
There is a new wave of people setting up computers locally (both personal and business) for AI inference.

It is striking to me that right before this happens, Apple discontinues the Mac Pro which was perfectly suited for this. We go decades with the Mac Pro having sluggish sales and then the year that it will actually start selling like hotcakes, it's gone!

I guess Nvidia captures the market share with RTX Spark, which they announced a couple of days ago.

For the uninitiated, running a local LLM requires enough VRAM to fit the whole model in memory, and models get huge very quickly. Because of unified memory, you can fit much larger models on a Mac than on a traditional machine with a separate video card (the video card has its own RAM, and video cards with a lot of RAM are very, very expensive).
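
As a rough illustration of the sizing point above, here is a minimal Python sketch that estimates how much memory a model's weights need and whether that fits in a given pool. The function names, the 70-billion-parameter example, and the 20% allowance for runtime overhead (context cache, buffers) are assumptions for illustration, not figures from the post; the 96GB, 128GB, and 512GB pools are simply capacities mentioned elsewhere in this thread.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in GB (1 GB = 10**9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


def fits(params_billions: float, bits_per_param: int, memory_gb: float,
         overhead_fraction: float = 0.2) -> bool:
    """True if weights plus an ASSUMED overhead allowance fit in memory_gb."""
    needed = weight_memory_gb(params_billions, bits_per_param) * (1 + overhead_fraction)
    return needed <= memory_gb


if __name__ == "__main__":
    # Hypothetical 70B-parameter model at 16-bit and 4-bit precision.
    for bits in (16, 4):
        size = weight_memory_gb(70, bits)
        for pool in (96, 128, 512):
            verdict = "fits" if fits(70, bits, pool) else "does not fit"
            print(f"70B @ {bits}-bit: {size:.0f} GB of weights, {verdict} in {pool} GB")
```

Under these assumptions the 16-bit version only fits in the 512GB pool, while the 4-bit version fits in all three, which is why both memory capacity and quantization level matter so much for local inference.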
 
I started to type a thread on the new Spark workstations, but fell asleep. Yeah, I'm right there with you. The 7,1 was all this back in 2019 if you had the pockets. Crazy to me how they axed it. A studio cannot be your halo product, Apple - it just can't - it's laughable.

And now Nvidia + Windows covering laptops, desktops, and actual workstations... they're going to continue eating everyone's lunch. AMD better keep introducing new features to legacy hardware.
 
Very perplexing. I've also noticed some price spikes in the used market for the higher-RAM Mac Studios. I would have expected Apple to have released some kind of rack server by now.
 
There's a big enough population of us that felt the pain when Xserve, the RAID, server app, Time Machine Capsule and all the networking products were discontinued. If they were to introduce a quiet 1U server with unlimited everything, it would sell like hotcakes.
 
Reduce your expectations to what can be done on-device in an iPad, and you'll always have a realistic pessimism. That's the future of the Mac; an iPad with bigger / more screens and input devices to run single-window iPad apps.

The AI models that will run on-device will be compacted versions of things trained off device, and app developers will need to sign up to an extra level of royalty with Apple to access "Apple Models". For "security reasons", independently trained models are going to go away.
 
There's a big enough population of us that felt the pain when Xserve, the RAID, server app, Time Machine and all the networking products were discontinued. If they were to introduce a quiet 1U server with unlimited everything, it would sell like hotcakes.

Yup, a turnkey APFS / iSCSI NAS, that can be an iCloud cache / alternative (Remember Transporter Sync?), but you know, subscriptions and services are Timmy's legacy.
 
It is striking to me that right before this happens, Apple discontinues the Mac Pro which was perfectly suited for this.
...but the M3 Ultra Mac Studio is also perfectly suited to local LLM work, and costs $2000 less (which also makes it viable for clustering). All the Mac Pro offered over the Studio was PCIe slots, which are irrelevant to Unified RAM-based LLM use.


Apple's stated reason for the current shortage & (probably temporary) discontinuation of some higher-RAM Studio configurations was unexpected demand due to LLM use (probably due to M4 and M3 Ultra Studio stocks drying up before the M5 models were due to launch). That shortage is a problem, but a new M3 Ultra Mac Pro would only have made that problem worse.
 
@theluggage that was a great video. Thunderbolt has a lot of untapped potential, and so does this studio clustering. But it's no substitute for a real Mac Pro.

This cluster isn't running anything that we couldn't in 2019, and it still costs $40k. Nobody wants this on their desk when Mac Pro is possible and exists. The cables aren't secure. There's an extra layer of management with the cluster vs. a monolithic box. There's paltry FP64 performance and no way to add more. I'd also like to see how gracefully it handles a full compute load while gaming at 4k. My old box does it fine.

The clustering is a great feature to have, just like target disk mode. But for today, I would hardly say this solution is perfectly suited. Glad it's an option folks can rig up though - it'll get better I hope. Some part of me clings to hope that they actually have Xgrid 2.0 ready to drop on us. 🤣 - I only got to set that up a few times back in the day. For CRISPR I think it was...
 
Reduce your expectations to what can be done on-device in an iPad, and you'll always have a realistic pessimism. That's the future of the Mac; an iPad with bigger / more screens and input devices to run single-window iPad apps.

The AI models that will run on-device will be compacted versions of things trained off device, and app developers will need to sign up to an extra level of royalty with Apple to access "Apple Models". For "security reasons", independently trained models are going to go away.
Damn, that's a depressing outlook. Gonna install that USB LED and test today.
 
Thunderbolt has a lot of untapped potential, and so does this studio clustering. But it's no substitute for a real Mac Pro.
As the video shows - even a single Mac Studio is a decent machine for local LLM because of the advantage of having a relatively large amount of Unified RAM directly shared between CPU/GPU/NPU vs. limited VRAM (which has to be loaded via main RAM) on a dGPU card.

Nobody wants this on their desk when Mac Pro is possible and exists.
An Apple Silicon Mac Pro with a competitive level of bandwidth+lanes for PCIe-based GPUs or NPUs isn't possible (without Apple designing a new SoC die just for the Mac Pro) and doesn't exist... and, even if it did, it would be reliant on the same NVIDIA or AMD dGPUs that can be used in any generic Xeon/Ryzen box and lack the unified RAM advantage which lets an Mx processor with integrated GPU punch above its weight.

The Mac Pro may look like it has plenty of PCIe slots, but the M2/M3 Ultra only has 32 lanes of PCIe4, of which only 22 are available for the PCIe slots (with various constraints on how they can be allocated to slots). Current Xeons and Threadrippers have 128+ lanes of PCIe5. AFAIK many LLM tools run on Linux just as well as on macOS, and CPU power consumption is irrelevant on a personal workstation stuffed with NVIDIA space heaters, so if you want a GPU-based LLM platform a Xeon or Threadripper is probably the tool for the job.
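
To put the lane counts above into rough numbers, here is a small Python sketch. It assumes the published PCIe signalling rates (16 GT/s per lane for PCIe 4.0, 32 GT/s for PCIe 5.0, both with 128b/130b encoding) and ignores protocol overhead, so treat the results as theoretical per-direction ceilings rather than measured throughput. The function names are made up for illustration.

```python
# Raw signalling rate per lane in GT/s, keyed by PCIe generation.
GT_PER_S = {4: 16.0, 5: 32.0}
# 128b/130b line encoding: 128 payload bits for every 130 bits on the wire.
ENCODING_EFFICIENCY = 128 / 130


def lane_gb_per_s(gen: int) -> float:
    """Theoretical one-direction bandwidth of a single lane, in GB/s."""
    return GT_PER_S[gen] * ENCODING_EFFICIENCY / 8


def aggregate_gb_per_s(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth across `lanes` lanes, in GB/s."""
    return lanes * lane_gb_per_s(gen)


if __name__ == "__main__":
    mac_pro = aggregate_gb_per_s(4, 22)       # 22 PCIe4 lanes for the slots
    workstation = aggregate_gb_per_s(5, 128)  # 128 PCIe5 lanes
    print(f"22 lanes of PCIe4:  {mac_pro:.1f} GB/s")
    print(f"128 lanes of PCIe5: {workstation:.1f} GB/s")
    print(f"ratio: {workstation / mac_pro:.1f}x")
```

On these assumptions the slot bandwidth works out to roughly 43 GB/s versus roughly 504 GB/s, a gap of more than 11x before any slot-allocation constraints are considered.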

Want a Mac Pro cluster? - you'd have to use Thunderbolt, same as the Studio.
 
...but the M3 Ultra Mac Studio is also perfectly suited to local LLM work, and costs $2000 less (which also makes it viable for clustering). All the Mac Pro offered over the Studio was PCIe slots, which are irrelevant to Unified RAM-based LLM use.


I am not quite sure that last statement reconciles with the video you posted after it. Lot of potential benefits of unified memory (something SGI pioneered 30 years ago) but once you get to clusters you are outside of unified memory.

Apple's TB5 and RDMA support is great to see. However, TB5 is limited as an interconnect, both in bandwidth (80Gbps) and in scaling. In the case of the Studio, there's a limit of 4-5 direct connections and no option for switches/etc. A system with PCIe slots would provide the option to access standard interconnects (e.g. Infiniband) that allow switches, more ports per system, and/or higher bandwidth per port.
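
For a sense of scale on the interconnect point, a quick back-of-envelope Python sketch: how long it takes to move a given amount of data across a link at its nominal rate. The 80Gbps figure is the TB5 rate quoted above; the 400Gbps link and the 100GB payload are purely hypothetical values for comparison, and real throughput would be lower than nominal once protocol overhead is included.

```python
def transfer_seconds(payload_gb: float, link_gbps: float) -> float:
    """Seconds to move payload_gb gigabytes over a link of link_gbps gigabits/s.

    Uses the nominal link rate only; protocol overhead is ignored.
    """
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    return payload_gb * 8 / link_gbps


if __name__ == "__main__":
    payload_gb = 100  # hypothetical amount of data exchanged between nodes
    for name, gbps in (("TB5 (80Gbps)", 80), ("hypothetical 400Gbps link", 400)):
        print(f"{name}: {transfer_seconds(payload_gb, gbps):.1f} s for {payload_gb} GB")
```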

These are of course higher-end solutions than the Studio's segment, which is very cost-effective in its range. Studio clusters using TB5/RDMA are also an easy way to get to 2-4 system solutions. But that's where they top out, and I'm not sure how they would compare with the best that could be put together for ~$40K using PC clusters (where one could put multiple GPUs in the same box and/or cluster more than 4 boxes with higher-end interconnects).

Then there's the elephant in the room that one can't even buy the systems featured in that video anymore. For at least the next few months, 96GB is the most RAM you can order in a Studio, and whatever you order is what you'll have for the life of the computer. Market prices for RAM are insane right now, but even those with the much-maligned 7-year-old Mac Pro can stuff 1.5TB of RAM and/or 4x AMD GPUs in them if economically justified for their work. There simply isn't an option for anything close on the Mac right now.
 
There's a big enough population of us that felt the pain when Xserve, the RAID, server app, Time Machine and all the networking products were discontinued.
...you mean all the things that can now be done on a generic Linux box - or even a dedicated plug-and-play NAS box - running Samba and ZFS, for a fraction of the price of an Xserve?

Times change. When the Xserve launched, it was competing against Windows Server, Netware and commercial Unix with sky-high (often "per seat") licensing - and Macs were reliant on various proprietary Apple protocols. The market was killed by Linux, turnkey NAS products with friendly web-based config screens and a move towards open networking standards (including the opening-up of SMB/CIFS).

Apple tend to get out of a market when they no longer have a unique product they can charge a premium for.
 
I too was surprised and disappointed that the MP was discontinued. It seems that Apple considers it cost ineffective to provide a more-powerful-than-Studio jack of all trades computer. The Studio is hella powerful, solidly meeting the needs of most desktop users.

The high end is currently dominated by AI needs, and AI is very much nascent. If AI demands were to solidify on local distributed processing as an ongoing major methodology choice, and if Studio tech did not remove enough heat for Apple to compete, Apple could introduce some future high-performing computer specifically designed for the computing demands of that future time, with or without the MP nomenclature.
 
Lot of potential benefits of unified memory (something SGI pioneered 30 years ago) but once you get to clusters you are outside of unified memory.

...but you'd have the same problem with multiple GPUs - the VRAM would be local to each GPU, and smaller than on an Apple Silicon node.

The video also pointed out that even a single Mac Studio was quite effective.

A system with PCIe slots would provide the option to access standard interconnects (e.g. Infiniband) that allow switches, more ports per system, and/or higher bandwidth per port.
...except current Apple Silicon SoCs have very limited PCIe bandwidth to support such cards.

A custom Apple Silicon die, exclusively for PCIe-based systems, would be needed to improve that which would make the result insanely expensive without a mass consumer market.

Then there's the elephant in the room that one can't even buy the systems featured in that video anymore. For at least the next few months, 96GB is most RAM you can order in a Studio and whatever you order is what you'll have for the life of the computer.
Sure, but that's down to limited production capacity for processors and maybe RAM shortages (both thanks to the AI bubble) which would also affect any hypothetical new Mac Pro (see thread title) - and which is affecting the entire industry. Apple have even blamed unexpected demand for high-RAM Studios and Minis because people want to use them for AI.
 
It seems that Apple considers it cost ineffective to provide a more-powerful-than-Studio jack of all trades computer.
Except the 2023 Mac Pro was effectively just a Mac Studio with some last-generation PCIe slots (...and barely enough PCIe lanes to fully exploit them) which couldn't be used for GPUs.

What is ineffective is trying to get Apple Silicon to compete with Xeon/Threadripper/Epyc when it comes to driving multiple NVIDIA/AMD dGPUs in a big box of slots.
 
...but you'd have the same problem with multiple GPUs - the VRAM would be local to each GPU, and smaller than on an Apple Silicon node.

Agree but my point is in the other direction: unified memory has a lot of advantage but only in the single system segment. Once you go cluster, it's not an advantage. And then whatever tradeoffs that made sense at the low-end hinder the high-end without compensating upside.

The video also pointed out that even a single Mac Studio was quite effective.

Yes, agreed. As I mentioned, it is very cost-effective in the single-system segment.

...except current Apple Silicon SoCs have very limited PCIe bandwidth to support such cards.

A custom Apple Silicon die, exclusively for PCIe-based systems, would be needed to improve that which would make the result insanely expensive without a mass consumer market.

Agree current Apple Silicon is definitely designed around different assumptions and trade-offs but I think that is the point. Apple went down a path that made sense at the low-end but is now a liability at the high-end. Which may or may not be a strategic liability in the long-run.

Sure, but that's down to limited production capacity for processors and maybe RAM shortages (both thanks to the AI bubble) which would also affect any hypothetical new Mac Pro (see thread title) - and which is affecting the entire industry.

Except with slotted RAM, I can a) buy a computer now with what's available and expand it post-bubble without throwing the whole computer away or b) go 3rd party.

Apple have even blamed unexpected demand for high-RAM Studios and Minis because people want to use them for AI.

Yes they had great potential for AI but now their design decisions have become a liability while Apple is the bottleneck to running large AI models on their computers due to their inherent sole-source design. If one could put 3rd-party RAM in a system then one can decide if it's worth it. Now buyers have two options: 96GB of unified RAM or switch platforms.
 
Reduce your expectations to what can be done on-device in an iPad, and you'll always have a realistic pessimism. That's the future of the Mac; an iPad with bigger / more screens and input devices to run single-window iPad apps.

The AI models that will run on-device will be compacted versions of things trained off device, and app developers will need to sign up to an extra level of royalty with Apple to access "Apple Models". For "security reasons", independently trained models are going to go away.
This seems completely unrealistic given everything Apple has been saying and doing over the last few decades. The iPad is closed so the Mac can stay open.
 
@theluggage that was a great video. Thunderbolt has a lot of untapped potential, and so does this studio clustering. But it's no substitute for a real Mac Pro.

This cluster isn't running anything that we couldn't in 2019, and it still costs $40k. Nobody wants this on their desk when Mac Pro is possible and exists. The cables aren't secure. There's an extra layer of management with the cluster vs. a monolithic box. There's paltry FP64 performance and no way to add more. I'd also like to see how gracefully it handles a full compute load while gaming at 4k. My old box does it fine.

The clustering is a great feature to have, just like target disk mode. But for today, I would hardly say this solution is perfectly suited. Glad it's an option folks can rig up though - it'll get better I hope. Some part of me clings to hope that they actually have Xgrid 2.0 ready to drop on us. 🤣 - I only got to set that up a few times back in the day. For CRISPR I think it was...
How do you define "full compute load" if you can game at 4K just fine while it's running? Seems like a not-so-full load then. Also, I'm quite sure that cluster can run things we couldn't in 2019, because stuffing a box with the same amount of video RAM back then was much more expensive or even simply impossible (the maximum VRAM officially supported by Apple in the 2019 Mac Pro was 128GB, divided across the GPUs), whereas the M3 Ultra provides up to 512GB of unified memory. That means you can run models that the 2019 Mac Pro simply couldn't.
 
Apple went down a path that made sense at the low-end but is now a liability at the high-end. [....] Which may or may not be a strategic liability in the long-run.
Apple abandoned the "high-end" in 2012/2013 when they let the original Mac Pro wither on the vine and replaced it with the Trashcan (...which, I'm pretty sure, they then planned to replace with the iMac Pro in 2017).

Every Mac Pro since 2012 has been a "one and done" - the Trashcan, the iMac Pro, the 2019 MP (which priced out many potential buyers), the 2023 MP (same case - totally different product). I don't think Apple are particularly "strategic" about the high end.

The Mac Studio is clearly where Apple wanted to go with the Trashcan - and a much more credible product - and has been updated twice which hasn't happened to a Mac Pro since about 2012....

Meanwhile, Apple Silicon has proven to be a major strategic advantage at the low/mid-range all the way up to the Studio. I think the "high end" is something they're happy to sacrifice - they don't need to have a foot in every pie.
 
Except with slotted RAM, I can a) buy a computer now with what's available and expand it post-bubble without throwing the whole computer away or b) go 3rd party.
Except the same supply problems have hit SSDs and GPUs (at least the sort you'd want for AI work) and you'd still need enough RAM to get the job done faster than whatever you have at the moment, or where's the point?

Reality is it's a bad time to upgrade to a new computer, full stop.
...and in the Apple world, even if the M4 or M3 Ultra Studios available today did have slotted RAM, they'd still be last year's M4 and M3 tech.
 
I agree with the idea, but Apple needed to do more than just keep a refreshed/reheated “updated” version of the M2 Pro. I think they should have and could have done more with the platform to capture this part of the market.

Imagine a Pro with adequate PCIe lanes to run blade-style M-Ultras in each PCIe slot.

Or go back further in time and resurrect the idea of the PCI RAM disk, allowing the Pro to utilize terabytes of high-speed swap for model and context buffering.

That’s the type of hardware that would’ve differentiated the Pro platform from the Studio platform.
 
The Mac Pro is not the most suitable machine for AI. As others said, the Studio (especially clustered) does well for this. The Mac Pro was never a well-positioned machine in the first place. The first generation of it (replacing the Power Mac G5) was somewhat decent. Then it morphed into a reincarnation of the G4 Cube. And then the real "fun" design decisions happened.

The Post-Trashcan Intel Mac Pro was nothing more than a low-end Kenworth in a market that was asking for a Ford F150. And then with the M-Series it became a Ford Focus with a utility trailer attached to the back. Apple mishandled the "workstation" market starting in 2013 and never really got it right afterward. The Mac Pro has not been "the most suitable" machine for anything in at least ten years.
 