Apple Silicon as server chips speculation

senttoschool · May 9, 2024

I've long theorized Apple could justify the R&D cost of an "Extreme" SoC by also deploying it in the server. In 2024, the world has changed a lot. LLMs and GenAI have taken over and are starving for compute. Bloomberg is reporting that Apple plans to use their Apple Silicon in the cloud to do AI inference.

Apple’s plan to use its own chips and process AI tasks in the cloud was hatched about three years ago, but the company accelerated the timeline after the AI craze — fueled by OpenAI’s ChatGPT and Google’s Gemini — forced it to move more quickly.

Bloomberg is reporting the usage of M2/M4 Ultra thus far.

Here are my thoughts:

Apple needs to use the M2/M4 Ultra because their dedicated AI inference chips aren't ready.
Using a full Ultra/Extreme chip to do inference can't be scalable. Apple has billions of users. But because of how fast LLMs/GenAI are developing, Apple has no choice but to use what they have now.
Apple is probably planning to break its Neural Engine into its own chip and scale it for server use long term. All big tech have their own in-house server NPUs deployed. Amazon has Inferentia. Microsoft just announced Maia 100. Google has had TPU for many years.
There might still be use cases where Apple needs the full SoC to emulate a local Mac for some sort of cloud service. I think Apple might customize these SoCs for servers as well. For example, fewer CPU cores, more GPU cores, no display controllers.

senttoschool · May 9, 2024

Funny enough, Bloomberg said Apple started planning this 3 years ago. I wrote my post about how Apple will need to deploy Apple Silicon in the cloud to justify an "Extreme" SoC 3 years ago as well.

Here's what I wrote:

You're Tim Cook, sitting in his nice office, looking at how much money you just spent to make this giant SoC for a relatively small market. In fact, you have to do this every year or every two years to keep the Mac Pro relevant. How do you recuperate some of this money spent?

Maybe Tim Cook was really sitting in his office, being disgusted at the R&D budget for the Extreme chip and said, why don't we use it in the server as well? 😂

My original hypothesis was that Apple could rent out their high powered SoCs via the cloud. For example, you're sitting in a coffee shop on an M4 Air, with one 3 finger swipe on the touchpad, your desktop moves to an M4 Extreme SoC running in the cloud. Or certain local applications will get cloud acceleration such as xCode or FCP with 3rd party developer access as well. Obviously, I could not predict the LLM/GenAI craze.

Chuckeee · May 9, 2024

senttoschool said:
Bloomberg is reporting

senttoschool said:
Bloomberg said

You lost me as soon as you sited Bloomberg as your source 😁

senttoschool · May 9, 2024

Chuckeee said:
You lost me as soon as you sited Bloomberg as your source 😁

Well, it was actually Jeff Pu who reported this first. That said, I've always found Mark Gurman reputable.

It doesn't take a genius to predict that Apple is planning some sort of in-house AI inference chip. Amazon has Inferentia. Microsoft just announced Maia 100. Google has had TPU for many years. Meta has MITIA NPUs. Apple has...?

leman · May 10, 2024

I don't think that M2 Ultra makes much sense as an ML inference server. Nether performance or scalability are there. I have similar feelings about M4 Ultra, unless the GPU comes with some ML-focused changes we are not aware of yet.

A custom chip that is all CPU clusters with increased amount of AMX resources? Now that would be formidable indeed.

altaic · May 10, 2024

The purpose of using M2 Ultras or whatever is to tune and validate models that run on the various devices. Small mathematical errors (due to floating point implementations, or otherwise indeterminacy) compound, so you have to make sure a given model works properly with a given environment. Apple isn’t using M2 Ultras in lieu of GH200s or what have you.

MRMSFC · May 10, 2024

My personal hope is that the consumer sees this chip as well and it isn’t just relegated to iCloud servers.

I’m curious to see what Apple would do with potentially mammoth cpu, npu, and gpu clusters not constrained by being on an SoC

oneMadRssn · May 10, 2024

This would be a very smart move honestly. nVidia right now has a pseudo monopoly on AI/ML optimized cores for servers. The nVidia GPUs are the most expensive part of a datacenter right now, by far. Nobody else offers an off-the-shelf processing core and API as comprehensive and powerful. If Apple is planning to roll out some AI/ML features that will require server-side processing (which is likely), they would be very smart to leverage their in-house chips and in-house APIs to save money and differentiate themselves from the competition that has no choice but to spend on nVidia.

As a partially irrelevant aside, I miss the days of Apple offering a server OS. I wish I could spin-up my own personal iCloud server for syncing photos, files, and performing iOS backups. It's not about the cost of iCloud, it's about ownership and control for me.

leman · May 10, 2024

Now that Apple has SME, it is possible that they will offer a compelling way forward for ML on the CPU. It all depends on the performance. The new AMX units would need to be very fast to even come close to Nvidia.

Boil · May 10, 2024

Apple ASi AI Server

Rackmount Chassis
M4 Ultra SoC
ASi GPU/NPU Blades
???

diamond.g · May 13, 2024

Boil said:
Apple ASi AI Server

Rackmount Chassis

M4 Ultra SoC

ASi GPU/NPU Blades

???

Wouldn't that break the unified memory model Apple has going for them?

Boil · May 13, 2024

diamond.g said:
Wouldn't that break the unified memory model Apple has going for them?

It's okay, the Apple ASi AI Server is not an end-user product, it is a device which runs the Apple iCloud AI service...

diamond.g · May 13, 2024

Boil said:
It's okay, the Apple ASi AI Server is not an end-user product, it is a device which runs the Apple iCloud AI service...

Even so, the models that it is running would be hurt by the bifrucation of system ram and acclerator(GPU) ram right? Seems like that hurts nvidia (from what I am reading of folks comparing large models on M1 vs 4090). Not sure why it wouldn't hurt Apple as well.

Boil · May 13, 2024

diamond.g said:
Even so, the models that it is running would be hurt by the bifrucation of system ram and acclerator(GPU) ram right? Seems like that hurts nvidia (from what I am reading of folks comparing large models on M1 vs 4090). Not sure why it wouldn't hurt Apple as well.

Dunno, not a systems engineer or anything, just a dude on the internet that likes to speculate on Apple stuff... ;^p

Search

Search

Apple Silicon as server chips speculation

senttoschool

macrumors 68030

senttoschool

macrumors 68030

Chuckeee

macrumors 68040

senttoschool

macrumors 68030

leman

macrumors Core

altaic

macrumors 6502a

MRMSFC

macrumors 6502

oneMadRssn

macrumors 603

leman

macrumors Core

Boil

macrumors 68040

diamond.g

macrumors G4

Boil

macrumors 68040

diamond.g

macrumors G4

Boil

macrumors 68040

Our Staff