Apple Silicon deep learning performance

thenewperson · Jun 24, 2026

bombardier10 said:
Everything currently available on the consumer market is very underpowered. Nvidia is set to launch the first dedicated AI accelerators this year. This will be the Nvidia Spark for the consumer market. The maximum AI computing power will be 1 Petaflop...
Everything else on the consumer market is, for now, just a gimmick. Note also that Nvidia and ATi are putting new graphics card launches on hold for two years. Apple is also standing still in this regard, unable to exceed the performance of the RX 6900XT... But in two or perhaps three years’ time, we’ll see a new generation of AI-capable graphics cards at a price accessible to almost everyone

Ok

novagamer · Jun 24, 2026

bombardier10 said:
Everything currently available on the consumer market is very underpowered. Nvidia is set to launch the first dedicated AI accelerators this year. This will be the Nvidia Spark for the consumer market. The maximum AI computing power will be 1 Petaflop...
Everything else on the consumer market is, for now, just a gimmick. Note also that Nvidia and ATi are putting new graphics card launches on hold for two years. Apple is also standing still in this regard, unable to exceed the performance of the RX 6900XT... But in two or perhaps three years’ time, we’ll see a new generation of AI-capable graphics cards at a price accessible to almost everyone 😃

The RTX Spark is a consumer DGX Spark and Apple's current lineup already beats it at a lot of workloads especially because of how the CPU cores perform. I personally don't have a ton of use for FP4 which is where that 'petaflop' comes from, some of the work I do needs FP16, FP32, or even FP64 which I have to fall back to the CPU for but thanks to Apple's memory bandwidth it is workable for certain things.

For CUDA optimized workloads at very low precision it will be okay but I doubt they will even match the M4 Max memory bandwidth with the Spark, the charts I read have it coming in at around 300GB/s which is not ideal.

Apple Silicon is not going to replace a server rack of Blackwells anytime soon, but at the high consumer end it is very competitive for AI work that goes beyond 'let me run an openclaw machine or experiment with the CUDA toolchain'. If anything, I think the DGX Spark is a weirdly positioned machine. Outside of the LinkedIn influencer posts about 'I just got it and set up an agent!', which anyone with an M4 Mac Mini can do, I haven't seen anything groundbreaking from anyone who has one.

I'd take one for free or at a hefty subsidy or if I was locked into the CUDA ecosystem but blissfully I am not, and I have a 5090 that I do CUDA work on when I do need to power through something specific. BF16 is also a lot more useful than (NV)FP4.

And for apples-to-apples comparison's sake: the 5090 does 3.35 PFLOPS at FP4 with sparsity, which is the same kind of figure Nvidia uses for the 'one petaflop supercomputer!' claim for the DGX Spark and RTX Spark in their somewhat misleading marketing. Yes, you're getting a unified memory machine that can run larger models if you shell out for it (the DGX Spark went up in price to $4799 now), but I'd be hard-pressed to choose that strange ecosystem over Apple's at current prices, especially since the M5 architecture improved the matrix math so much, which helps TTFT a lot and was a bottleneck on previous generations.
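To put the TTFT point in rough numbers, here is a minimal back-of-envelope sketch. It assumes the common approximation of about 2 FLOPs per parameter per prompt token for prefill and ignores the attention term, so it understates long prompts. The model size, prompt length, and the two sustained-throughput values are hypothetical placeholders, not measurements of any chip.

```python
def prefill_seconds(params: float, prompt_tokens: int, sustained_flops: float) -> float:
    """Rough time to first token.

    Assumption: prefill costs about 2 FLOPs per parameter per prompt token
    (one multiply and one add per weight); attention cost is ignored.
    """
    return 2.0 * params * prompt_tokens / sustained_flops


if __name__ == "__main__":
    params = 8e9          # hypothetical 8B-parameter model
    prompt_tokens = 4096  # hypothetical prompt length

    # Hypothetical sustained matmul throughputs, in FLOP/s (not vendor figures).
    for label, flops in [("slower matmul", 20e12), ("faster matmul", 80e12)]:
        seconds = prefill_seconds(params, prompt_tokens, flops)
        print(f"{label}: about {seconds:.2f} s to first token")
```

Prefill scales with usable matmul throughput, which is why a matrix-math improvement shows up in TTFT rather than in tokens per second during generation.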

Caveat being of course, Apple's current prices for new models are going to go up quite a bit, so who knows where things will land then.

leman · Jun 24, 2026

bombardier10 said:
Everything currently available on the consumer market is very underpowered. Nvidia is set to launch the first dedicated AI accelerators this year. This will be the Nvidia Spark for the consumer market. The maximum AI computing power will be 1 Petaflop...
Everything else on the consumer market is, for now, just a gimmick. Note also that Nvidia and ATi are putting new graphics card launches on hold for two years. Apple is also standing still in this regard, unable to exceed the performance of the RX 6900XT... But in two or perhaps three years’ time, we’ll see a new generation of AI-capable graphics cards at a price accessible to almost everyone 😃

Spark has been available for some time now, it's essentially an RTX 5070. The 1000 TFLOPS quoted is for peak throughput at FP4 with sparsity; you won't get anywhere close to these figures in practice. The M5 Max should be quite competitive with the RTX 5070 for most machine learning applications unless we are looking at models that are highly optimized for Nvidia's tensor cores.

name99 · Jun 24, 2026

bombardier10 said:
Everything currently available on the consumer market is very underpowered. Nvidia is set to launch the first dedicated AI accelerators this year. This will be the Nvidia Spark for the consumer market. The maximum AI computing power will be 1 Petaflop...
Everything else on the consumer market is, for now, just a gimmick. Note also that Nvidia and ATi are putting new graphics card launches on hold for two years. Apple is also standing still in this regard, unable to exceed the performance of the RX 6900XT... But in two or perhaps three years’ time, we’ll see a new generation of AI-capable graphics cards at a price accessible to almost everyone 😃

Bandwidth of the DGX Spark?...
Yeah, I thought so...
Amateurs talk FLOPs, professionals talk bandwidth.

impulse462 · Jun 25, 2026

name99 said:
Bandwidth of the DGX Spark?...
Yeah, I thought so...
Amateurs talk FLOPs, professionals talk bandwidth.

Professionals for what?

The true answer is to roofline your code to see whether your algorithm is compute or memory bound and go from there
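A minimal sketch of what that check looks like, assuming the basic roofline model (attainable throughput is the lower of peak compute and bandwidth times arithmetic intensity). The peak and bandwidth values are the Spark figures quoted in this thread (1000 TFLOPS marketing peak, 273 GB/s); the two arithmetic intensities are assumed illustrative values, not profiled kernels.

```python
def attainable_flops(peak_flops: float, bandwidth: float, intensity: float) -> float:
    """Basic roofline: min(peak compute, bandwidth * FLOPs-per-byte)."""
    return min(peak_flops, bandwidth * intensity)


def classify(peak_flops: float, bandwidth: float, intensity: float) -> str:
    """Compare a kernel's intensity with the ridge point (peak / bandwidth)."""
    ridge = peak_flops / bandwidth
    return "memory-bound" if intensity < ridge else "compute-bound"


if __name__ == "__main__":
    PEAK = 1000e12  # FLOP/s, the FP4-with-sparsity peak quoted in the thread
    BW = 273e9      # bytes/s, the bandwidth quoted in the thread

    print(f"ridge point: {PEAK / BW:.0f} FLOP/byte")

    # Assumed intensities for illustration only.
    kernels = [("low-intensity kernel", 4.0), ("high-intensity kernel", 5000.0)]
    for name, intensity in kernels:
        tflops = attainable_flops(PEAK, BW, intensity) / 1e12
        print(f"{name}: {classify(PEAK, BW, intensity)}, ceiling {tflops:.2f} TFLOPS")
```

In practice you would measure the intensity of your own kernels with a profiler instead of assuming it; the point is only that the ceiling depends on both numbers.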

leman · Jun 25, 2026

impulse462 said:
Professionals for what?

The true answer is to roofline your code to see whether your algorithm is compute or memory bound and go from there

LLMs are pretty much always memory bound. For example, an 8B parameter model requires at least 4-5GB even with high weight compression.
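The arithmetic behind that, as a rough sketch: at batch size 1 every generated token has to stream essentially all the weights from memory once, so bandwidth divided by weight size gives an upper bound on tokens per second. This ignores the KV cache, activations, and any overhead, and the 273 GB/s value is the Spark bandwidth figure quoted in this thread, used here only as an example input.

```python
def weight_bytes(params: float, bits_per_weight: float) -> float:
    """Size of the weights alone, in bytes."""
    return params * bits_per_weight / 8.0


def max_decode_tokens_per_s(bandwidth: float, params: float, bits_per_weight: float) -> float:
    """Upper bound for batch-1 decode: one full pass over the weights per token.

    Assumption: KV cache and activation traffic are ignored, so real
    throughput will be lower than this bound.
    """
    return bandwidth / weight_bytes(params, bits_per_weight)


if __name__ == "__main__":
    params = 8e9       # 8B-parameter model
    bandwidth = 273e9  # bytes/s, example figure from the thread

    for bits in (4, 5, 16):
        size_gb = weight_bytes(params, bits) / 1e9
        bound = max_decode_tokens_per_s(bandwidth, params, bits)
        print(f"{bits}-bit weights: {size_gb:.1f} GB, at most {bound:.1f} tokens/s")
```

With 4 to 5 bits per weight this reproduces the 4-5GB figure, and the bound scales linearly with bandwidth regardless of how many FLOPS are available.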

impulse462 · Jun 25, 2026

leman said:
LLMs are pretty much always memory bound. For example, an 8B parameter model requires at least 4-5GB even with high weight compression.

Sure. I was talking more in general, but then I saw the title of the thread.

bombardier10 · Jun 25, 2026

Try to understand the differences. Nvidia Spark achieves AI performance of 1,000 TFlops, whilst the fastest Mac computer manages 70 TFlops. There’s no comparison here. This isn’t an RX5070, let alone an M5 Max. In practice, projects that would normally take hours to generate will be completed in minutes 😃

OptimusGrime · Jun 25, 2026

I too enjoy posts which avoid the substantive points raised, and instead focus on repeating the same statements as a means of avoidance.

Satisfying.

novagamer · Jun 25, 2026

bombardier10 said:
Try to understand the differences. Nvidia Spark achieves AI performance of 1,000 TFlops, whilst the fastest Mac computer manages 70 TFlops. There’s no comparison here. This isn’t an RX5070, let alone an M5 Max. In practice, projects that would normally take hours to generate will be completed in minutes 😃

Do you understand what FP4 is? Because if not, you should probably go do a few months of reading if this is an area that interests you. Genuine suggestion.

Multiple posts cleanly refuted your assertion here with facts. If you're trolling us, that's frowned upon here especially in this area of the forum.

leman · Jun 25, 2026

bombardier10 said:
Try to understand the differences.

Before you can point out differences to others, it helps to understand the fundamentals yourself.

bombardier10 said:
This isn’t an RX5070, let alone an M5 Max.

This is literally an RTX 5070

table 6, https://images.nvidia.com/aem-dam/S...ell/nvidia-rtx-blackwell-gpu-architecture.pdf

bombardier10 said:
In practice, projects that would normally take hours to generate will be completed in minutes 😃

You are a bit out of your depth here, aren’t you?

bombardier10 · Jun 26, 2026

I’m more interested in specific applications. Does anyone know how long it takes to upscale SD video to 4K in Topaz AI on an M5 Max Mac?
I’m talking about clips that are an hour long or longer.

MrGunny94 · Jun 26, 2026

Has anyone been using LM Studio or anything similar for local inference? Wanna leverage some of it with my 48GB. Mainly used Claude Code and Codex so far...

theorist9 · Jun 27, 2026

Since we're discussing Spark, it's useful to hear from actual Spark users.

Here's a thread on the NVIDIA Developer Forum. There's a broad consensus that its 273 GB/s memory bandwidth is a significant limitation.

Spark II - Let's discuss what we would like to see in the next Spark version

Personally, I would love to see in v2:
* Rubin platform.
* Multiple options of up to 1TB of RAM, allowing plenty space to run the largest of models locally.
* Better heat management.
* Full memory bandwidth, maybe 25% of Rubin platform at least? 273 GB/s is just painful.
* NemoClaw baked in.

forums.developer.nvidia.com

Xiao_Xi · Jun 30, 2026

In case anyone is interested in a 300-page document about Apple Neural Engine.

Apple Neural Engine: Architecture, Programming, and Performance

The Apple Neural Engine (ANE) is the fixed-function matrix accelerator that has shipped in Apple systems-on-chip since the A11-class iPhone and iPad chips and the M1-class Mac chips, exposed to applications only through the Core ML model framework. This guide reports a reverse-engineered account...

arxiv.org

It should be noted that some in the HN publication believe it was AI-written.

theorist9 · Jun 30, 2026

Xiao_Xi said:
In case anyone is interested in a 300-page document about Apple Neural Engine....

It should be noted that some in the HN publication believe it was AI-written.

Is that ironic or appropriate? 😀
