
senttoschool

macrumors 68030
Original poster
Nov 2, 2017
2,573
5,338
I've long theorized that Apple could justify the R&D cost of an "Extreme" SoC by also deploying it in the server. In 2024, the world has changed a lot: LLMs and GenAI have taken over and are starving for compute. Bloomberg is reporting that Apple plans to use its own Apple Silicon in the cloud for AI inference.

Apple’s plan to use its own chips and process AI tasks in the cloud was hatched about three years ago, but the company accelerated the timeline after the AI craze — fueled by OpenAI’s ChatGPT and Google’s Gemini — forced it to move more quickly.

Bloomberg is reporting the use of the M2/M4 Ultra thus far.

Here are my thoughts:
  • Apple needs to use the M2/M4 Ultra because its dedicated AI inference chips aren't ready.
  • Using a full Ultra/Extreme chip for inference isn't scalable; Apple has billions of users. But because of how fast LLMs/GenAI are developing, Apple has no choice but to use what it has now.
  • Apple is probably planning to break its Neural Engine out into its own chip and scale it up for server use long term (see the Core ML sketch after this list for how the Neural Engine is targeted today). All the big tech companies have in-house server NPUs deployed: Amazon has Inferentia, Microsoft just announced Maia 100, and Google has had TPUs for many years.
  • There might still be use cases where Apple needs the full SoC to emulate a local Mac for some sort of cloud service. I think Apple might customize these SoCs for servers as well: for example, fewer CPU cores, more GPU cores, and no display controllers.
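
For context, here's a minimal Core ML sketch (the model path is just a placeholder, not anything from Apple's server stack) showing how developers steer inference toward the Neural Engine today; a dedicated server NPU would presumably expose a similar, if narrower, target:

```swift
import Foundation
import CoreML

// Minimal sketch with a placeholder model path: ask Core ML to prefer
// the Neural Engine, falling back to the CPU when an op isn't supported.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine

let modelURL = URL(fileURLWithPath: "/path/to/SomeModel.mlmodelc") // placeholder
if let model = try? MLModel(contentsOf: modelURL, configuration: config) {
    print(model.modelDescription)
}
```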
 
Last edited:

senttoschool

macrumors 68030
Original poster
Nov 2, 2017
2,573
5,338
Funnily enough, Bloomberg says Apple started planning this three years ago. I wrote my post about how Apple would need to deploy Apple Silicon in the cloud to justify an "Extreme" SoC three years ago as well.

Here's what I wrote:
You're Tim Cook, sitting in your nice office, looking at how much money you just spent to make this giant SoC for a relatively small market. In fact, you have to do this every year or two to keep the Mac Pro relevant. How do you recoup some of that money?
Maybe Tim Cook was really sitting in his office, being disgusted at the R&D budget for the Extreme chip and said, why don't we use it in the server as well? 😂

My original hypothesis was that Apple could rent out its high-powered SoCs via the cloud. For example: you're sitting in a coffee shop on an M4 Air, and with one three-finger swipe on the trackpad your desktop moves to an M4 Extreme running in the cloud. Or certain local applications, such as Xcode or FCP, could get cloud acceleration, with third-party developer access as well. Obviously, I could not have predicted the LLM/GenAI craze.
 
Last edited:

senttoschool

macrumors 68030
Original poster
Nov 2, 2017
2,573
5,338
You lost me as soon as you cited Bloomberg as your source 😁
Well, it was actually Jeff Pu who reported this first. That said, I've always found Mark Gurman reputable.

It doesn't take a genius to predict that Apple is planning some sort of in-house AI inference chip. Amazon has Inferentia. Microsoft just announced Maia 100. Google has had TPUs for many years. Meta has MTIA NPUs. Apple has...?
 

leman

macrumors Core
Oct 14, 2008
19,316
19,329
I don't think the M2 Ultra makes much sense as an ML inference server. Neither the performance nor the scalability is there. I have similar feelings about the M4 Ultra, unless the GPU comes with some ML-focused changes we aren't aware of yet.

A custom chip that is all CPU clusters with an increased amount of AMX resources? Now that would be formidable indeed.
 

altaic

macrumors 6502a
Jan 26, 2004
654
432
The purpose of using M2 Ultras or whatever is to tune and validate the models that run on the various devices. Small numerical errors (due to floating-point implementations or other sources of indeterminacy) compound, so you have to make sure a given model works properly in a given environment. Apple isn't using M2 Ultras in lieu of GH200s or what have you.
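
A toy Swift sketch of the kind of drift I mean (illustrative only): just changing the order of a 32-bit reduction changes the result, and that's before any kernel or hardware differences enter the picture.

```swift
// Illustrative only: the same 100,000 Float terms summed in two orders.
let values: [Float] = (1...100_000).map { 1.0 / Float($0) }

let forward  = values.reduce(0, +)
let backward = values.reversed().reduce(0, +)

print(forward, backward)    // typically differ in the last bits
print(forward == backward)  // usually false
```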
 
Last edited:

MRMSFC

macrumors 6502
Jul 6, 2023
342
352
My personal hope is that the consumer sees this chip as well and it isn’t just relegated to iCloud servers.

I'm curious to see what Apple would do with potentially mammoth CPU, NPU, and GPU clusters not constrained by being on a single SoC.
 

oneMadRssn

macrumors 603
Sep 8, 2011
5,999
14,065
This would be a very smart move, honestly. Nvidia right now has a pseudo-monopoly on AI/ML-optimized cores for servers. Nvidia GPUs are by far the most expensive part of a datacenter right now. Nobody else offers an off-the-shelf processing core and API as comprehensive and powerful. If Apple is planning to roll out AI/ML features that require server-side processing (which is likely), it would be very smart to leverage its in-house chips and in-house APIs to save money and differentiate itself from competitors that have no choice but to spend on Nvidia.

As a partially irrelevant aside, I miss the days of Apple offering a server OS. I wish I could spin up my own personal iCloud server for syncing photos and files and performing iOS backups. It's not about the cost of iCloud; it's about ownership and control for me.
 
  • Like
Reactions: TarkinDale

leman

macrumors Core
Oct 14, 2008
19,316
19,329
Now that Apple has SME, it is possible that they will offer a compelling way forward for ML on the CPU. It all depends on the performance. The new AMX units would need to be very fast to even come close to Nvidia.
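
As a rough sketch of what "ML on the CPU" looks like from user space today: a single-precision GEMM through Accelerate, which on Apple Silicon is widely reported to be routed to the matrix (AMX) units rather than the scalar pipeline. The sizes here are arbitrary and this proves nothing about server-class performance.

```swift
import Accelerate

// Arbitrary sizes; not a benchmark. Accelerate's BLAS on Apple Silicon is
// widely reported to use the AMX/matrix units for calls like this.
let n = 512
let a = [Float](repeating: 1.0, count: n * n)
let b = [Float](repeating: 2.0, count: n * n)
var c = [Float](repeating: 0.0, count: n * n)

cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
            Int32(n), Int32(n), Int32(n),
            1.0, a, Int32(n),
            b, Int32(n),
            0.0, &c, Int32(n))

print(c[0])  // 512 * (1.0 * 2.0) = 1024.0
```

As I understand it, SME would make the same class of hardware reachable through standard Arm instructions rather than only through Apple's libraries.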
 

diamond.g

macrumors G4
Mar 20, 2007
11,169
2,482
OBX
It's okay, the Apple ASi AI Server is not an end-user product; it is a device which runs the Apple iCloud AI service...
Even so, the models it's running would be hurt by the bifurcation of system RAM and accelerator (GPU) RAM, right? That seems to hurt Nvidia (from what I'm reading from folks comparing large models on an M1 vs. a 4090). Not sure why it wouldn't hurt Apple as well.
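
Rough numbers for why the memory split matters, as a back-of-the-envelope Swift sketch (the 70B parameter count and bytes-per-weight figures are illustrative assumptions, not measurements):

```swift
// Back-of-the-envelope sketch with illustrative numbers, not a measurement.
func weightFootprintGiB(parameters: Double, bytesPerWeight: Double) -> Double {
    parameters * bytesPerWeight / 1_073_741_824  // bytes -> GiB
}

let params70B = 70e9
print(weightFootprintGiB(parameters: params70B, bytesPerWeight: 2.0))  // ~130 GiB at FP16
print(weightFootprintGiB(parameters: params70B, bytesPerWeight: 0.5))  // ~33 GiB at 4-bit

// A 24 GB discrete card has to spill FP16 weights over PCIe to system RAM,
// while a 192 GB unified-memory Ultra keeps everything in one pool, which is
// the effect people see comparing Apple Silicon against a 4090 on large models.
```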
 
  • Like
Reactions: bcortens

Boil

macrumors 68040
Oct 23, 2018
3,289
2,911
Stargate Command
Even so, the models it's running would be hurt by the bifurcation of system RAM and accelerator (GPU) RAM, right? That seems to hurt Nvidia (from what I'm reading from folks comparing large models on an M1 vs. a 4090). Not sure why it wouldn't hurt Apple as well.

Dunno, not a systems engineer or anything, just a dude on the internet that likes to speculate on Apple stuff... ;^p
 