Everything currently available on the consumer market is very underpowered. Nvidia is set to launch its first dedicated consumer AI accelerator this year, the Nvidia Spark, with a maximum AI computing power of 1 petaflop...
Everything else on the consumer market is, for now, just a gimmick. Note also that Nvidia and AMD are putting new graphics card launches on hold for two years. Apple is also standing still in this regard, unable to exceed the performance of the RX 6900 XT... But in two or perhaps three years' time, we'll see a new generation of AI-capable graphics cards at a price accessible to almost everyone 😃
The RTX Spark is a consumer DGX Spark, and Apple's current lineup already beats it at a lot of workloads, especially because of how the CPU cores perform. I personally don't have much use for FP4, which is where that 'petaflop' comes from. Some of the work I do needs FP16, FP32, or even FP64, which I have to fall back to the CPU for, but thanks to Apple's memory bandwidth that is workable for certain things.
For CUDA-optimized workloads at very low precision it will be okay, but I doubt the Spark will even match the M4 Max's memory bandwidth. The charts I read have it coming in at around 300GB/s, which is not ideal.
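To put the bandwidth point in rough numbers, here is a back-of-the-envelope sketch. It assumes single-stream token generation is memory-bound, so every weight gets read once per token and the ceiling is simply bandwidth divided by model size. The 300GB/s figure is the one from the charts mentioned above; the 546GB/s figure for the top M4 Max configuration and the 70B model at 4 bits per weight are my own assumptions for illustration, and real throughput will land below these ceilings.

```python
def decode_tokens_per_second(bandwidth_gb_s: float,
                             params_billion: float,
                             bytes_per_param: float) -> float:
    """Rough upper bound on single-stream decode speed.

    Assumes generation is memory-bound: all weights are streamed from
    memory once per generated token. Ignores KV cache traffic, compute
    limits, and software overhead, so real numbers will be lower.
    """
    model_size_gb = params_billion * bytes_per_param
    return bandwidth_gb_s / model_size_gb


if __name__ == "__main__":
    # Hypothetical workload: a 70B-parameter model at 4 bits (0.5 bytes) per weight.
    params_billion = 70.0
    bytes_per_param = 0.5

    machines = {
        "Spark (~300 GB/s, from the charts)": 300.0,
        "M4 Max (546 GB/s, assumed top config)": 546.0,
    }
    for name, bandwidth in machines.items():
        ceiling = decode_tokens_per_second(bandwidth, params_billion, bytes_per_param)
        print(f"{name}: at most ~{ceiling:.1f} tokens/s")
```

The ratio between the two ceilings is just the ratio of the bandwidths, which is why the FLOPS headline matters much less than the GB/s number for this kind of workload.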
Apple Silicon is not going to replace a server rack of Blackwells anytime soon, but at the high consumer end it is very competitive for AI work that goes beyond 'let me run an openclaw machine or experiment with the CUDA toolchain'. If anything, I think the DGX Spark is a weirdly positioned machine. Outside of the LinkedIn influencer posts about 'I just got it and set up an agent!', which anyone with an M4 Mac Mini can do, I haven't seen anything groundbreaking from anyone who has one.
I'd take one for free, or at a hefty subsidy, or if I were locked into the CUDA ecosystem, but blissfully I am not, and I have a 5090 that I do CUDA work on when I need to power through something specific. BF16 is also a lot more useful than (NV)FP4.
And for apples-to-apples comparison's sake: the 5090 does 3.35 PFLOPS at FP4 with sparsity, and FP4 with sparsity is also where Nvidia gets the 'one petaflop supercomputer!' claim for the DGX Spark and RTX Spark in its somewhat misleading marketing. Yes, you're getting a unified-memory machine that can run larger models if you shell out for it (the DGX Spark has gone up in price to $4799), but I'd be hard-pressed to choose that strange ecosystem over Apple's at current prices, especially since the M5 architecture improved matrix math so much, which helps TTFT a lot and was a bottleneck on previous generations.
The caveat, of course, is that Apple's prices for new models are going to go up quite a bit, so who knows where things will land then.
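For anyone who wants to sanity-check the 'petaflop' headline, here is a small sketch of the arithmetic. It assumes the sparse figures rely on 2:4 structured sparsity, which Nvidia counts as a 2x speedup over dense math; that 2x factor and the helper name are my assumptions, while the 1 PFLOP and 3.35 PFLOPS inputs are the figures quoted in this thread.

```python
def dense_from_sparse(sparse_pflops: float, sparsity_speedup: float = 2.0) -> float:
    """Convert a 'with sparsity' marketing figure to a dense estimate.

    Assumes the sparse number is the dense number multiplied by
    sparsity_speedup (2x for 2:4 structured sparsity). Most models are
    not pruned to that pattern, so the dense figure is the more
    realistic ceiling.
    """
    return sparse_pflops / sparsity_speedup


if __name__ == "__main__":
    # FP4-with-sparsity figures as quoted in the thread.
    quoted = {
        "Spark": 1.0,
        "RTX 5090": 3.35,
    }
    for name, sparse in quoted.items():
        print(f"{name}: {sparse:.2f} PFLOPS sparse FP4 -> "
              f"~{dense_from_sparse(sparse):.3f} PFLOPS dense FP4")

    ratio = quoted["RTX 5090"] / quoted["Spark"]
    print(f"5090 vs Spark on the same metric: ~{ratio:.2f}x")
```

This only normalizes the sparsity multiplier. It says nothing about FP16, FP32, or FP64 throughput, which is what actually matters for the kind of work described above.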