Claude, OpenAI, Gemini and a few open source models are amazing.
Amazing is subjective. But even if I were to accept that adjective, are they amazing enough to actually drive revenue that justifies their costs (i.e., turn a profit)? Did consumers mass-upgrade devices to access them? No to both. What does Apple sell?
Apple is planning on-device inference and their own server inference. It's not just outsourcing the models to OpenAI.
Yes, you are repeating what I just said:
You're conflating multiple levels of Apple Intelligence. Again, the primary purpose is to sell devices using on-device AI; the server-side models are merely there for when on-device is too small, and they outsource to third-party models when necessary.
It's three levels, but you are saying Apple has to squash that down to two and effectively remove OpenAI in order not to be a failure, or, even more extreme, that the on-device model doesn't matter unless Apple has the best server-side model running on its own servers, making only one level important. I fundamentally disagree. There are two other levels, the actual Apple levels, for a reason.
They plan to run inference on their own big models on their servers. And if their own models suck compared to OpenAI's, no one will care about Apple Intelligence. And guess what OpenAI uses? All HBM-based chips.
Again, what is Apple's revenue model? What is OpenAI's?
Think of it this way: OpenAI or other outside services are the last level of Apple Intelligence. Apple uses them when it has to, but rapidly sinking billions upon billions of dollars into developing massive bespoke Apple LLMs to compete with the likes of Claude or ChatGPT or Gemini etc. isn't, so far, what Apple has indicated it wants to bring to the table. As you yourself said, they aren't building their own massive training center to build said model, are they? And that makes sense: doing so doesn't leverage their advantages and would expose them to legal and economic risks while undercutting their privacy claims around Apple user data.
Relying on others to do that and outsourcing (which still undercuts their privacy claims, but gives them some distance, i.e. plausible deniability) lets them access those models while they focus on what they most want to succeed at: on-device. Yes, they also develop their own bespoke medium-sized models for their own servers to soak up requests that they know cannot yet be done on-device but don't require something like OpenAI's cutting-edge models. This edge-device and server middle ground may be less sexy right now but may ultimately be more impactful. Whether or not that pans out is of course up in the air, but, so far, that appears to be Apple's public strategy here,* and it has merit.
*I have to add all these caveats (appears, public) because obviously I am neither sitting in Apple's boardroom nor even in an engineering cubicle. This is what makes sense to me from the outside based on what we can see Apple doing, their own statements, and the general trends in Apple's business strategy over the last couple of decades.
While I agree power efficiency is critical for servers, I don't believe it's as critical as it is for mobile applications. I.e., I think operations/watt for laptops, tablets, and phones needs to be higher than operations/watt for servers.
Further, I suspect that's one of the reasons (but not the only reason) Apple has stayed with LPDDR since, from what I've read, HBM is more power-hungry. Quoting from a Jan 2024 article by Anton Shilov (https://www.embedded.com/high-bandwidth-memory-hbm-options-for-demanding-compute/):
"While unbeatable in terms of performance, HBM is expensive and power hungry for many applications, so there are developers that opt to use LPDDR5X for their bandwidth-demanding applications as this type of memory offers them the right balance of price, performance, and power consumption."
Of course, Apple could put HBM in a server chip while staying with LPDDR in its other chips, but, at least thus far, to save time and development costs, Apple has maintained design consistency across all its M-series chips (e.g., they share the same CPU and GPU cores).
HBM may be more power-hungry overall than LPDDR, but as the article states, it isn't necessarily so as a function of bandwidth. Still, its bandwidth and power don't really make sense for most of the products Apple competes in: 8GB of HBM simply wouldn't make sense on a phone even if its power draw were the same as or lower than LPDDR's, and it still doesn't make sense even for the Max or Ultra. It would be far too expensive, the bandwidth (I'm not sure what that would be, since bandwidth is a function of stack size and stack count) would likely be overkill, and I'm not even sure what the latency would be relative to LPDDR. But the expense alone would make even Apple's LPDDR packaging look like small potatoes.
With the exception of some Navi products that AMD has never attempted again, no one uses relatively small amounts of HBM in a consumer product, and arguably AMD's lack of trying since further suggests it just isn't a workable idea, certainly not a mass-market one. HBM's best use is when it's deployed en masse on high-end data center chips. That may raise the absolute power cost, but it probably delivers bandwidth at better power rates. In other words, if LPDDR could deliver the same bandwidth as HBM3/e (and it possibly could if you just kept adding memory buses; I'm not sure what the limit is, if there is one), its power cost would likely be as high if not higher. The advantage of LPDDR is that, for a wide range of capacities, you can still get really high bandwidth at low power for cheaper than HBM, which lets you build mass-market devices with lots of bandwidth and low power.
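To put rough numbers on that bandwidth comparison, here's a back-of-envelope sketch. The bus widths and per-pin transfer rates are ballpark figures I'm assuming from public LPDDR5X and HBM3/3e specs and typical Apple-style bus widths, not confirmed numbers for any particular product:

```python
# Back-of-envelope peak bandwidth: GB/s ~= (bus width in bits / 8) * per-pin transfer rate in GT/s.
# All figures below are assumed ballpark numbers, not vendor-confirmed specs for any shipping chip.

def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_gtps: float) -> float:
    """Theoretical peak bandwidth in GB/s for a given bus width and per-pin transfer rate."""
    return bus_width_bits / 8 * transfer_rate_gtps

configs = {
    "LPDDR5X, 128-bit bus (base M-class)":  (128, 8.533),
    "LPDDR5X, 512-bit bus (Max-class)":     (512, 8.533),
    "LPDDR5X, 1024-bit bus (Ultra-class)":  (1024, 8.533),
    "HBM3, one 1024-bit stack":             (1024, 6.4),
    "HBM3e, one 1024-bit stack":            (1024, 9.8),
    "HBM3e, six stacks (data-center GPU)":  (6 * 1024, 9.8),
}

for name, (width, rate) in configs.items():
    print(f"{name:40s} ~{peak_bandwidth_gbs(width, rate):6.0f} GB/s")
```

Under those assumed figures, a very wide LPDDR bus lands in the same range as one or two HBM stacks, while a data-center part carrying many stacks sits in a different league entirely, which is roughly the trade-off described above.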