Nvidia leading in server chips? Not really. For AI/ML training cards that plug into a server run by a server CPU, yes, Nvidia is dominant (though even that field has multiple players). But for the main processor in the server, Nvidia is both relatively late to the game and clearly not the only player. Ampere, Amazon, Microsoft, Google, etc. are all in it. Arm has a very viable server core in the Neoverse family: Nvidia is using it, Amazon, Microsoft, and other hyperscalers are using it, and Ampere Computing was/is using it (while transitioning to a custom core for future generations, which may or may not work out for them if all of their major customers just keep buying Arm's version).
In AI/ML inference, Nvidia absolutely does not have an exclusive hold on the market. Inference and training do not have to be done on the same hardware.
Mobileye does AI/ML inference for automated car safety features (millions of cars with no Nvidia inside).
If minimal latency is the priority, several inference workloads run solely on the CPU where possible (copying the data out to the Nvidia card and back costs time). That is one reason Intel has thrown heavily skewed AVX-512 and "DL Boost" at the Xeon SP processors, trying to backstop some of their competitive losses in the server space.
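To make that concrete, here is a minimal sketch of pinning inference to the CPU with ONNX Runtime. The model file and input tensor name are placeholders, and whether the CPU provider actually hits AVX-512 VNNI / DL Boost kernels depends on the build and the silicon.

```python
# Hedged sketch: "model.onnx" and the input name "input" are placeholders,
# not a real model from anything mentioned above.
import numpy as np
import onnxruntime as ort

# Pin the session to the CPU execution provider: no PCIe round trip to a
# discrete accelerator, which is the whole point for latency-sensitive work.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {"input": batch})
print(outputs[0].shape)
```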
Similarly:
https://www.tomshardware.com/tech-i...-supply-dollar752-million-in-ai-chips-instead
Similarly,
" ... probably setting the stage for what we are calling the AmpereOne-3 chip, which is our name for it and which is etched in 3 nanometer (3N to be precise) processes from TSMC. We think this will be using a modified A2+ core. Wittich confirmed to us that a future AmpereOne chip was in fact using the 3N process and was at TSMC right now being etched as it moves towards its launch. And then he told us that this future chip would have 256 cores. He did not get into chiplet architectures, but did say that Ampere Computing was using the UCI-Express in-socket variant of PCI-Express as its chiplet interconnect for future designs. ...
...
There are a lot of possibilities, but we know one thing: Ampere Computing is hell bent on capturing as much AI inference on its CPUs as is technically feasible.
... "
(Source: "How many cores is enough for server CPUs?", www.nextplatform.com)
With a UCIe interface, if Apple wanted to put some of their NPUs on a chiplet (also with a UCIe interface) and package it together for custom inference, Apple wouldn't have to build a whole server chip; just some narrow custom mods to software to invoke the accelerator and offload perhaps more customized inference workloads.
( Arm is also using UCIe with Neoverse. )
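Purely as an illustration of how those "narrow custom mods to software" could look, here is a hedged sketch. The "npu_chiplet" module is a made-up name standing in for whatever thin runtime a vendor would ship for a UCIe-attached NPU; it is not a real library.

```python
import numpy as np

# Hypothetical runtime for an in-package, UCIe-attached NPU; this module name
# is invented for illustration and does not exist.
try:
    import npu_chiplet
    HAVE_NPU = True
except ImportError:
    HAVE_NPU = False

def dense_layer(weights: np.ndarray, activations: np.ndarray) -> np.ndarray:
    """Offload a matmul-heavy step to the NPU chiplet if present, otherwise
    run it on the host CPU cores. Either way the data stays in-package,
    unlike a trip across PCIe to a discrete card."""
    if HAVE_NPU:
        return npu_chiplet.matmul(weights, activations)
    return weights @ activations

print(dense_layer(np.ones((4, 8), dtype=np.float32),
                  np.ones((8, 2), dtype=np.float32)))
```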
By the second half of 2025, Ampere Computing could be on their second-generation N3 Arm server chip aimed at inference.
( I'm a bit skeptical they will keep that yearly cadence. )
Finally, on the inference front, Google is rolling out Gemini Nano. Apple is doing tons of AI inference in the Vision Pro. The whole Apple lineup is reported to be doing more local inference in the next versions of iOS/iPadOS/macOS. That is hundreds of millions of devices where there is zero Nvidia in sight. Nvidia having some kind of unilateral monopoly hold on AI/ML inference is a complete farce. The AI/ML inference market is far, far bigger than the 'largest memory footprint possible' LLM models.
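As a rough illustration of that kind of local, Nvidia-free inference path, here is a hedged sketch using coremltools to convert a small PyTorch model so Core ML can schedule it across the CPU, GPU, and Neural Engine on an Apple device. The toy model and file name are made up, and exactly which compute units get used is up to Core ML's scheduler.

```python
import torch
import coremltools as ct

# Toy stand-in model; any traced PyTorch module converts the same way.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
).eval()

example = torch.rand(1, 128)
traced = torch.jit.trace(model, example)

mlmodel = ct.convert(
    traced,
    convert_to="mlprogram",
    inputs=[ct.TensorType(name="features", shape=example.shape)],
    # Let Core ML schedule across CPU, GPU, and the Neural Engine on-device.
    compute_units=ct.ComputeUnit.ALL,
)
mlmodel.save("toy_classifier.mlpackage")
```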