That is actually rather impressive. They managed to fit a 5070 equivalent in that SoC. I wonder what the CPU performance is.
That is a wild comparison since it is like half a GB100, which is a "different" spin on Blackwell compared to what is in the GB200 line.
 
But the problem is it can't be used by the GPU. I have no appetite to spend money if the RAM can't be used for my GPU tasks.
Given that the Threadripper is not an APU, why would you want to use system memory in the first place?

All things being equal, DDR5 is slower in bandwidth than GDDR7, so for tasks leveraging CUDA processing you're better off relying on VRAM. DDR5 runs around 4.8 to 8.8 Gbps per pin, whereas GDDR7 is in the neighborhood of 32–36 Gbps per pin.
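As a rough back-of-the-envelope (the configurations below, dual-channel DDR5-6400 and a 256-bit GDDR7 card at 32 Gbps, are just illustrative assumptions, not any specific product), effective bandwidth is roughly the per-pin rate times the bus width:

# Sketch: effective memory bandwidth from per-pin data rate and bus width.
# The figures are illustrative assumptions, not official product specs.
def effective_bw_gbs(gbps_per_pin: float, bus_width_bits: int) -> float:
    return gbps_per_pin * bus_width_bits / 8  # GB/s

print(effective_bw_gbs(6.4, 128))   # dual-channel DDR5-6400: ~102 GB/s
print(effective_bw_gbs(32.0, 256))  # 256-bit GDDR7 at 32 Gbps: ~1024 GB/s

The per-pin gap gets multiplied by the much wider bus on a graphics card, which is why VRAM wins so decisively for CUDA work.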
 
That is actually rather impressive. They managed to fit a 5070 equivalent in that SoC. I wonder what the CPU performance is.
The CPU is my concern. But it is an impressive GPU with unified memory.
Given that the Threadripper is not an APU, why would you want to use system memory in the first place?

All things being equal, DDR5 is slower in bandwidth than GDDR7, so for tasks leveraging CUDA processing you're better off relying on VRAM. DDR5 runs around 4.8 to 8.8 Gbps per pin, whereas GDDR7 is in the neighborhood of 32–36 Gbps per pin.
I need unified memory; I consistently use system memory too. I know LLMs use the GPU exclusively, but in the case of large datasets it's not uncommon to use system memory for I/O and other compute.
 
I need unified memory; I consistently use system memory too. I know LLMs use the GPU exclusively, but in the case of large datasets it's not uncommon to use system memory for I/O and other compute.
Ok, so help me connect the dots, I don't use LLMs, so I'm speaking from a position of ignorance.

For the sake of argument, let's say the Threadripper uses system memory for both GPU and CPU computations, running at DDR5 speeds. Wouldn't that be slower than a Threadripper using GDDR7 for LLM processing and system RAM for non-LLM tasks (I'm assuming the GPU is a 5090)?

Apple silicon has memory on the SoC, so it's an apples/oranges comparison. The memory bandwidth on the M3 Ultra is 819 GB/s, and up to 546 GB/s for the M4 Max. This is where Apple's architecture shines, so if you want unified memory or high memory bandwidth, Apple Silicon is really your only option.

One of the biggest gripes about Intel processors with iGPUs for years was that they relied on system memory that was significantly slower than VRAM. What you're proposing is a return to that slow, inefficient approach.

Another option is to find a desktop with a Ryzen AI Max+ 395. While it uses unified memory, the max amount for the GPU cores is 96GB; it's a close approximation of how Apple sets up its chips.
 
Given that the Threadripper is not an APU, why would you want to use system memory in the first place?

All things being equal, DDR5 is slower in bandwidth than GDDR7, so for tasks leveraging CUDA processing you're better off relying on VRAM. DDR5 runs around 4.8 to 8.8 Gbps per pin, whereas GDDR7 is in the neighborhood of 32–36 Gbps per pin.

Does make you think though: what if AMD put a few GPU dies on Threadripper, or even better, EPYC? While DDR5 is slower than GDDR7, it's probably a lot closer if you have 8-16 memory channels of it.
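Quick math on that, assuming DDR5-6400: one 64-bit channel is about 51.2 GB/s, so 8 channels is roughly 410 GB/s and 16 channels roughly 820 GB/s, which is in the same ballpark as an M3 Ultra but still well short of a 512-bit GDDR7 card.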
 
Another option is to find a desktop with a Ryzen AI Max+ 395 While it uses unified memory, the max amount for the gpu cores is 96GB, its a close approximation of how apple has its chips
I may be wrong, but the Ryzen AI Max+ 395 supports up to 4 LPDDR5X channels, which totals 128 bits of memory bus. This is the same as a base M4. The Pro doubles the base SoC's bandwidth and the Max doubles the Pro's, and if I'm not wrong both the M4 and the Ryzen use LPDDR5X, though. The Ryzen may support the highest memory clocks.

Ryzen's GPU clock is close to 3GHz, so it has an edge over the M4/Pro/Max GPU cores, but I'm not sure whether it will end up bandwidth-starved.
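For a rough sense of scale, treating these as assumed rather than confirmed figures: LPDDR5X-8000 works out to about 128 GB/s on a 128-bit bus and about 256 GB/s on a 256-bit bus, while Apple quotes roughly 120 GB/s for the base M4, 273 GB/s for the Pro and 546 GB/s for the Max. So wherever the Ryzen's bus width actually lands, the Max-class parts still hold a sizeable bandwidth lead.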
 
For the sake of argument, let's say the Threadripper uses system memory for both GPU and CPU computations, running at DDR5 speeds. Wouldn't that be slower than a Threadripper using GDDR7 for LLM processing and system RAM for non-LLM tasks (I'm assuming the GPU is a 5090)?

DDR5 is slower than GDDR7 per pin, but one also needs to look at the interface width and the memory clock. For example, higher-end Threadripper platforms feature 8 DDR5 channels (512 bits in total), which is around 400GB/s.

In general, yes, GDDR used in gaming GPUs will have a higher bandwidth because it runs at very high frequency. The power consumption is something else too. For Spark, Nvidia uses a 256-bit wide memory interface running at high clock (I suppose they can ship it since it's a low volume product).
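To put rough numbers on it (assuming the commonly cited specs are accurate): a 256-bit LPDDR5X interface at ~8.5 Gbps per pin works out to roughly 273 GB/s, an 8-channel DDR5-6400 Threadripper setup to roughly 410 GB/s, and a 512-bit GDDR7 card at 28 Gbps per pin to roughly 1.8 TB/s.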

Apple silicon has memory on the SoC, so it's an apples/oranges comparison. The memory bandwidth on the M3 Ultra is 819 GB/s, and up to 546 GB/s for the M4 Max. This is where Apple's architecture shines, so if you want unified memory or high memory bandwidth, Apple Silicon is really your only option.

RAM is not on the SoC, but it's close to the SoC. Design-wise, it's a similar solution to what GPUs use, only Apple uses a mobile RAM standard instead of GDDR. Apple's RAM is relatively fast because they use very wide (expensive!) RAM interfaces and high-performance RAM modules. But it has nothing to do with using an SoC. That's the same fundamental tech that every laptop has been using for over a decade (albeit fine-tuned and optimized to the extreme).
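Concretely, assuming the widely reported interface widths: the M4 Max pairs a 512-bit LPDDR5X interface with ~8.5 Gbps per pin, which is where the ~546 GB/s figure comes from, and the M3 Ultra is effectively two 512-bit LPDDR5 interfaces at 6.4 Gbps, hence ~819 GB/s. It's the width of the interface, not any exotic memory type, doing the heavy lifting.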

One of the biggest gripes about Intel processors with iGPUs for years was that they relied on system memory that was significantly slower than VRAM. What you're proposing is a return to that slow, inefficient approach.

System RAM does not have to be slow. Intel iGPUs were slow because they optimized for cost, not performance. Wide memory interfaces are very expensive. For example, Nvidia has been cutting down the memory bus for many years now — only the ultra-high-end desktop 5090 RTX ships with a 512 bit interface. Apple is the only current vendor willing to throw $$$ at the problem, and they can afford it because of their high product price.
 
In your use case, memory capacity is more important than bandwidth or GPU speed? It looks like all the PC SoCs are limited to 128GB, which doesn't seem like enough.
I can go higher with AMD EPYC. I am OK with 128 GB; I just need unified memory. Hopefully.
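For a rough sense of why 128 GB can be workable, here's a quick sketch (the parameter counts and precisions are illustrative assumptions, not a claim about any particular model):

# Rough weight-only footprint for loading a model; real usage also needs
# KV cache, activations and framework overhead. Sizes are illustrative.
def weights_gb(params_billion: float, bits_per_param: int) -> float:
    return params_billion * bits_per_param / 8  # billions of params * bytes each

for params in (8, 70, 120):
    for bits in (16, 8, 4):
        print(f"{params}B @ {bits}-bit ≈ {weights_gb(params, bits):.0f} GB")

A 70B model at 16-bit (~140 GB) won't fit in 128 GB, but at 8-bit or 4-bit it does, with room left over for the rest of the pipeline.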
Ok, so help me connect the dots, I don't use LLMs, so I'm speaking from a position of ignorance.

For the sake of argument, let's say the Threadripper uses system memory for both GPU and CPU computations, running at DDR5 speeds. Wouldn't that be slower than a Threadripper using GDDR7 for LLM processing and system RAM for non-LLM tasks (I'm assuming the GPU is a 5090)?

Apple silicon has memory on the SoC, so it's an apples/oranges comparison. The memory bandwidth on the M3 Ultra is 819 GB/s, and up to 546 GB/s for the M4 Max. This is where Apple's architecture shines, so if you want unified memory or high memory bandwidth, Apple Silicon is really your only option.

One of the biggest gripes about Intel processors with iGPUs for years was that they relied on system memory that was significantly slower than VRAM. What you're proposing is a return to that slow, inefficient approach.

Another option is to find a desktop with a Ryzen AI Max+ 395. While it uses unified memory, the max amount for the GPU cores is 96GB; it's a close approximation of how Apple sets up its chips.
I didn't propose anything; not sure where you got that idea from my post. VRAM is faster, but in a real workload the GPU may only account for 40-50% of the time; your I/O and CPU play a part too. I do wish pipelines were 100% GPU and everything fit in VRAM. All the benchmarks and specs are fun and games until you start running pipelines and realize the GPU is only part of the puzzle. Unified memory shines there, but Apple needs to beef up the GPU.
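To put a number on that point, a tiny Amdahl-style sketch (the GPU share and speedup factors are made-up illustrations, not measurements):

# If the GPU is only ~50% of pipeline wall time, even an infinitely faster
# GPU can at best double end-to-end throughput. Fractions are illustrative.
def pipeline_speedup(gpu_fraction: float, gpu_speedup: float) -> float:
    return 1.0 / ((1.0 - gpu_fraction) + gpu_fraction / gpu_speedup)

print(pipeline_speedup(0.5, 3.0))   # 3x faster GPU -> ~1.5x overall
print(pipeline_speedup(0.5, 1e9))   # "infinitely" faster GPU -> ~2x overall
print(pipeline_speedup(0.9, 3.0))   # if the pipeline were 90% GPU -> 2.5x

That's why keeping the CPU-side stages fed out of the same memory pool can matter more than raw GPU benchmark numbers for this kind of pipeline.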
 
Apple needs to move to a chiplet design for their high-end desktop/workstation models...

  • MX Extreme (2nm)
  • 64 CPU cores (48P/16E)
  • 1,024 GPU cores
  • 256 Neural Engine cores
  • 1.92TB LPDDR6 ECC RAM
  • 2.16TB/s UMA bandwidth
 
Apple needs to move to a chiplet design for their high-end desktop/workstation models...

  • MX Extreme (2nm)
  • 64 CPU cores (48P/16E)
  • 1,024 GPU cores
  • 256 Neural Engine cores
  • 1.92TB LPDDR6 ECC RAM
  • 2.16TB/s UMA bandwidth
Why only 1,024 GPU cores? Only playing mobile games?
 
I am curious too. I am planning to replace my workstation next year.

I started with the premise (last year) that I would soon enjoy an understanding of just what AI inference, learning and modeling was all about.

Of course, I purchased some ***** from Teh Bay, and was burned (multiple times).

In the Process, I secured some semi-serious hardware. My latest hope is that I can secure a decent GPU that facilitates the intended learning, and doesn't help me regret my latest purchases...

IDRGAS whether the DGX Spark Duo I reserved will hold a crown.

What I most desire is a platform where I can experience, learn and understand.

Is this too much to ask?
 
My EPYC 4565P has GPU dies (or something akin to such).

I don't need a GPU to drive a display from my H13SAE-MF ... 🤷‍♂️
Ah, I meant a significant quantity of them. I think the GPU cores on EPYC and Ryzen are in the IO die at the moment? On Threadripper/EPYC they could maybe replace some of the many CPU dies with GPU dies and build something with a heap of unified memory for AI.
 
It seems that a lot of folks in this thread are purchasing a GB10 workstation. Do we have so many ML professionals working with large models here?
 
Do we have so many ML professionals working with large models here?
That's what I'm wondering.

I was (am) tempted by the GB10, but truth be told, I really don't have a use case for it. It would largely be sitting there after the initial glow of buying something new wore off.
 
That's what I'm wondering.

I was (am) tempted by the GB10, but truth be told, I really don't have a use case for it. It would largely be sitting there after the initial glow of buying something new wore off.

I see more of them in r/macstudio than here.

Some people are playing around with them, I guess to enhance job marketability; some do it for work and some for research.
 
That's what I'm wondering.

I was (am) tempted by the GB10, but truth be told, I really don't have a use case for it. It would largely be sitting there after the initial glow of buying something new wore off.

That's quite an expensive toy. I don't expect it to be of much use as a general-purpose computer.
 
That's quite an expensive toy. I don't expect it to be of much use as a general-purpose computer.
Agreed, it's not something I'm willing to spend money on, but it does make me all the more curious about those mentioning their plans to get one.
 
I'd like a faster GPU as well, but let's be realistic in these comparisons. One is a CPU with an integrated GPU, the other is just a GPU. Of course a dedicated GPU is going to outperform the CPU's integrated graphics. If you're going to try and do a fair comparison, then compare apples to apples: compare the M4 to any AMD or Intel CPU with integrated graphics.
 