Claude, OpenAI, Gemini and a few open source models are amazing.
Amazing is subjective. But even if I were to accept that adjective, are they amazing enough to actually drive revenue that justifies their costs (i.e., turn a profit)? Did consumers mass-upgrade devices to access them? No to both. What does Apple sell?
Apple is planning on-device inference and their own server inference. It's not just outsourcing the models to OpenAI.
Yes, you are repeating what I just said:
You're conflating multiple levels of Apple Intelligence. Again, the primary purpose is to sell devices using on-device AI; the server-side models are merely there for when on-device is too small, and they outsource to third-party models when necessary.
It's three levels, but you are saying Apple has to squash that down to two and effectively remove OpenAI in order not to be a failure, or, even more extreme, that the on-device model doesn't matter unless Apple has the best server-side model running on its own servers, making only one level important. I fundamentally disagree. There are two other levels, the actual Apple levels, for a reason.
They plan to run inference on their own big models on their servers. And if their own models suck compared to OpenAI's, no one will care about Apple Intelligence. And guess what OpenAI uses? All HBM-based chips.
Again, what is Apple's revenue model? What is OpenAI's?
Think of it this way: OpenAI or other outside services are the last level of Apple Intelligence. Apple uses them when it has to, but rapidly sinking billions upon billions of dollars into developing massive bespoke Apple LLMs to compete with the likes of Claude or ChatGPT or Gemini etc. isn't, so far, what Apple has indicated it wants to bring to the table. As you yourself said, they aren't building their own massive training center to build said model, are they? And that makes sense: doing so doesn't leverage their advantages and would expose them to legal and economic risks while undercutting their privacy claims around Apple user data.
Relying on others to do that and outsourcing (which still undercuts their privacy claims, but gives them some distance, i.e. plausible deniability) lets them access those models while they focus on what they most want to succeed at: on-device. Yes, they also develop their own bespoke medium-sized models for their own servers to soak up requests that they know cannot yet be done on-device but don't require something like OpenAI's cutting-edge models. This edge-device and server middle ground may be less sexy right now but may ultimately be more impactful. Whether or not that pans out is of course up in the air, but, so far, that appears to be Apple's public strategy here,* and it has merit.
*I have to add all these caveats (appears, public) because obviously I am neither sitting in Apple's boardroom nor even in an engineering cubicle. This is what makes sense to me from the outside based on what we can see Apple doing, their own statements, and the general trends in Apple's business strategy over the last couple of decades.
While I agree power efficiency is critical for servers, I don't believe it's as critical as it is for mobile applications. I.e., I think operations/watt for laptops, tablets, and phones needs to be higher than operations/watt for servers.
Further, I suspect that's one of the reasons (but not the only reason) Apple has stayed with LPDDR since, from what I've read, HBM is more power-hungry. Quoting from a Jan 2024 article by Anton Shilov (https://www.embedded.com/high-bandwidth-memory-hbm-options-for-demanding-compute/):
"While unbeatable in terms of performance, HBM is expensive and power hungry for many applications, so there are developers that opt to use LPDDR5X for their bandwidth-demanding applications as this type of memory offers them the right balance of price, performance, and power consumption."
Of course, Apple could put HBM in a server chip while staying with LPDDR in its other chips, but, at least thus far, to save time and development costs, Apple has maintained design consistency across all its M-series chips (e.g., they share the same CPU and GPU cores).
HBM may be more power-hungry overall than LPDDR, but as the article states, it isn't necessarily so as a function of bandwidth. Still, its bandwidth and power don't really make sense for most of the products Apple competes in: 8GB of HBM simply wouldn't make sense on a phone even if its power draw were the same as or lower than LPDDR's, and it still doesn't make sense even for the Max or Ultra. It would be far too expensive, the bandwidth (I'm not sure what that would be, since bandwidth is a function of stack size and stack count) would likely be overkill, and I'm not even sure what the latency would be relative to LPDDR. But the expense alone would make even Apple's LPDDR packaging look like small potatoes.
With the exception of some Navi products that AMD has never attempted again, no one uses relatively small amounts of HBM in a consumer product, and arguably AMD's lack of trying since further suggests it just isn't a workable idea, certainly not a mass-market one. HBM's best use is when it's deployed en masse on high-end data center chips. That may raise the absolute power cost, but it probably delivers bandwidth at better power rates. In other words, if LPDDR could deliver the same bandwidth as HBM3/e (and it possibly could if you just kept adding memory buses; I'm not sure what the limit is, if there is one), its power cost would likely be as high if not higher. The advantage of LPDDR is that, for a wide range of capacities, you can still get really high bandwidth at low power for cheaper than HBM, which lets you build mass-market devices with lots of bandwidth and low power.
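To put rough numbers on that bandwidth comparison, here's a back-of-envelope sketch. The bus widths and per-pin transfer rates are ballpark figures I'm assuming from public LPDDR5X and HBM3/3e specs and typical Apple-style bus widths, not confirmed numbers for any particular product:

```python
# Back-of-envelope peak bandwidth: GB/s ~= (bus width in bits / 8) * per-pin transfer rate in GT/s.
# All figures below are assumed ballpark numbers, not vendor-confirmed specs for any shipping chip.

def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_gtps: float) -> float:
    """Theoretical peak bandwidth in GB/s for a given bus width and per-pin transfer rate."""
    return bus_width_bits / 8 * transfer_rate_gtps

configs = {
    "LPDDR5X, 128-bit bus (base M-class)":  (128, 8.533),
    "LPDDR5X, 512-bit bus (Max-class)":     (512, 8.533),
    "LPDDR5X, 1024-bit bus (Ultra-class)":  (1024, 8.533),
    "HBM3, one 1024-bit stack":             (1024, 6.4),
    "HBM3e, one 1024-bit stack":            (1024, 9.8),
    "HBM3e, six stacks (data-center GPU)":  (6 * 1024, 9.8),
}

for name, (width, rate) in configs.items():
    print(f"{name:40s} ~{peak_bandwidth_gbs(width, rate):6.0f} GB/s")
```

Under those assumed figures, a very wide LPDDR bus lands in the same range as one or two HBM stacks, while a data-center part carrying many stacks sits in a different league entirely, which is roughly the trade-off described above.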