That is actually rather impressive. They managed to fit a 5070 equivalent in that SoC. I wonder what the CPU performance is.
That is a wild comparison since it is like half a GB100, which is a "different" spin on Blackwell compared to what is in the GB200 line.
 
But the problem is it can't be used by the GPU. I have no appetite to spend money if the RAM can't be used for my GPU tasks.
Given that the Threadripper is not an APU, why would you want to use system memory in the first place?

All things being equal, DDR5 is slower in bandwidth than GDDR7, so for tasks leveraging CUDA processing you're better off relying on VRAM. DDR5 runs around 4.8 to 8.8 Gbps per pin, whereas GDDR7 is in the neighborhood of 32–36 Gbps per pin.
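As a rough back-of-the-envelope (the configurations below, dual-channel DDR5-6400 and a 256-bit GDDR7 card at 32 Gbps, are just illustrative assumptions, not any specific product), effective bandwidth is roughly the per-pin rate times the bus width:

# Sketch: effective memory bandwidth from per-pin data rate and bus width.
# The figures are illustrative assumptions, not official product specs.
def effective_bw_gbs(gbps_per_pin: float, bus_width_bits: int) -> float:
    return gbps_per_pin * bus_width_bits / 8  # GB/s

print(effective_bw_gbs(6.4, 128))   # dual-channel DDR5-6400: ~102 GB/s
print(effective_bw_gbs(32.0, 256))  # 256-bit GDDR7 at 32 Gbps: ~1024 GB/s

The per-pin gap gets multiplied by the much wider bus on a graphics card, which is why VRAM wins so decisively for CUDA work.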
 
That is actually rather impressive. They managed to fit a 5070 equivalent in that SoC. I wonder what the CPU performance is.
The CPU is my concern. But it is an impressive GPU with unified memory.
Given that the Threadripper is not an APU, why would you want to use system memory in the first place?

All things being equal, DDR5 is slower in bandwidth than GDDR7, so for tasks leveraging CUDA processing you're better off relying on VRAM. DDR5 runs around 4.8 to 8.8 Gbps per pin, whereas GDDR7 is in the neighborhood of 32–36 Gbps per pin.
I need unified memory; I consistently use system memory too. I know LLMs use the GPU exclusively, but in the case of large datasets it's not uncommon to use system memory for I/O and other compute.
 
I need unified memory; I consistently use system memory too. I know LLMs use the GPU exclusively, but in the case of large datasets it's not uncommon to use system memory for I/O and other compute.
Ok, so help me connect the dots, I don't use LLMs, so I'm speaking from a position of ignorance.

For the sake of argument, let's say the Threadripper uses system memory for both GPU and CPU computations, running at DDR5 speeds. Wouldn't that be slower than a Threadripper using GDDR7 for LLM processing and system RAM for non-LLM tasks (I'm assuming the GPU is a 5090)?

Apple silicon has memory on the SoC, so it's an apples/oranges comparison. The memory bandwidth on the M3 Ultra is 819 GB/s, and up to 546 GB/s for the M4 Max. This is where Apple's architecture shines, so if you want unified memory or high memory bandwidth, Apple Silicon is really your only option.

One of the biggest gripes about Intel processors with iGPUs for years was that they relied on system memory that was significantly slower than VRAM. What you're proposing is a return to that slow, inefficient approach.

Another option is to find a desktop with a Ryzen AI Max+ 395. While it uses unified memory, the max amount for the GPU cores is 96GB; it's a close approximation of how Apple sets up its chips.
 
Given that the Threadripper is not an APU, why would you want to use system memory in the first place?

All things being equal, DDR5 is slower in bandwidth than GDDR7, so for tasks leveraging CUDA processing you're better off relying on VRAM. DDR5 runs around 4.8 to 8.8 Gbps per pin, whereas GDDR7 is in the neighborhood of 32–36 Gbps per pin.

Does make you think though: what if AMD put a few GPU dies on Threadripper, or even better, EPYC? While DDR5 is slower than GDDR7, it's probably a lot closer if you have 8-16 memory channels of it.
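Quick math on that, assuming DDR5-6400: one 64-bit channel is about 51.2 GB/s, so 8 channels is roughly 410 GB/s and 16 channels roughly 820 GB/s, which is in the same ballpark as an M3 Ultra but still well short of a 512-bit GDDR7 card.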
 
Another option is to find a desktop with a Ryzen AI Max+ 395 While it uses unified memory, the max amount for the gpu cores is 96GB, its a close approximation of how apple has its chips
I may be wrong, but the Ryzen AI Max+ 395 supports up to 4 LPDDR5X channels, which totals 128 bits of memory bus. This is the same as a base M4. The Pro doubles the base SoC's bandwidth and the Max doubles the Pro's, and if I'm not wrong both the M4 and the Ryzen use LPDDR5X, though. The Ryzen may support the highest memory clocks.

Ryzen's GPU clock is close to 3GHz, so it has an edge over the M4/Pro/Max GPU cores, but I'm not sure whether it will end up bandwidth-starved.
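For a rough sense of scale, treating these as assumed rather than confirmed figures: LPDDR5X-8000 works out to about 128 GB/s on a 128-bit bus and about 256 GB/s on a 256-bit bus, while Apple quotes roughly 120 GB/s for the base M4, 273 GB/s for the Pro and 546 GB/s for the Max. So wherever the Ryzen's bus width actually lands, the Max-class parts still hold a sizeable bandwidth lead.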
 
For the sake of argument, let's say the Threadripper uses system memory for both GPU and CPU computations, running at DDR5 speeds. Wouldn't that be slower than a Threadripper using GDDR7 for LLM processing and system RAM for non-LLM tasks (I'm assuming the GPU is a 5090)?

DDR5 is slower than GDDR7 per pin, but one also needs to look at the interface width and the memory clock. For example, higher-end Threadripper platforms feature 8 DDR5 channels (512 bits in total), which is around 400GB/s.

In general, yes, GDDR used in gaming GPUs will have a higher bandwidth because it runs at very high frequency. The power consumption is something else too. For Spark, Nvidia uses a 256-bit wide memory interface running at high clock (I suppose they can ship it since it's a low volume product).
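To put rough numbers on it (assuming the commonly cited specs are accurate): a 256-bit LPDDR5X interface at ~8.5 Gbps per pin works out to roughly 273 GB/s, an 8-channel DDR5-6400 Threadripper setup to roughly 410 GB/s, and a 512-bit GDDR7 card at 28 Gbps per pin to roughly 1.8 TB/s.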

Apple silicon has memory on the SoC, so it's an apples/oranges comparison. The memory bandwidth on the M3 Ultra is 819 GB/s, and up to 546 GB/s for the M4 Max. This is where Apple's architecture shines, so if you want unified memory or high memory bandwidth, Apple Silicon is really your only option.

RAM is not on the SoC, but it's close to the SoC. Design-wise, it's a similar solution to what GPUs use, only Apple uses a mobile RAM standard instead of GDDR. Apple's RAM is relatively fast because they use very wide (expensive!) RAM interfaces and high-performance RAM modules. But it has nothing to do with using an SoC. That's the same fundamental tech that every laptop has been using for over a decade (albeit fine-tuned and optimized to the extreme).
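Concretely, assuming the widely reported interface widths: the M4 Max pairs a 512-bit LPDDR5X interface with ~8.5 Gbps per pin, which is where the ~546 GB/s figure comes from, and the M3 Ultra is effectively two 512-bit LPDDR5 interfaces at 6.4 Gbps, hence ~819 GB/s. It's the width of the interface, not any exotic memory type, doing the heavy lifting.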

One of the biggest gripes about Intel processors with iGPUs for years was that they relied on system memory that was significantly slower than VRAM. What you're proposing is a return to that slow, inefficient approach.

System RAM does not have to be slow. Intel iGPUs were slow because they optimized for cost, not performance. Wide memory interfaces are very expensive. For example, Nvidia has been cutting down the memory bus for many years now — only the ultra-high-end desktop 5090 RTX ships with a 512 bit interface. Apple is the only current vendor willing to throw $$$ at the problem, and they can afford it because of their high product price.
 
In your use case, memory capacity is more important than bandwidth or GPU speed? It looks like all the PC SoCs are limited to 128GB, which doesn't seem like enough.
I can go higher with AMD EPYC. I am OK with 128 GB; I just need unified memory. Hopefully.
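For a rough sense of why 128 GB can be workable, here's a quick sketch (the parameter counts and precisions are illustrative assumptions, not a claim about any particular model):

# Rough weight-only footprint for loading a model; real usage also needs
# KV cache, activations and framework overhead. Sizes are illustrative.
def weights_gb(params_billion: float, bits_per_param: int) -> float:
    return params_billion * bits_per_param / 8  # billions of params * bytes each

for params in (8, 70, 120):
    for bits in (16, 8, 4):
        print(f"{params}B @ {bits}-bit ≈ {weights_gb(params, bits):.0f} GB")

A 70B model at 16-bit (~140 GB) won't fit in 128 GB, but at 8-bit or 4-bit it does, with room left over for the rest of the pipeline.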
Ok, so help me connect the dots, I don't use LLMs, so I'm speaking from a position of ignorance.

For the sake of argument, let's say the Threadripper uses system memory for both GPU and CPU computations, running at DDR5 speeds. Wouldn't that be slower than a Threadripper using GDDR7 for LLM processing and system RAM for non-LLM tasks (I'm assuming the GPU is a 5090)?

Apple silicon has memory on the SoC, so it's an apples/oranges comparison. The memory bandwidth on the M3 Ultra is 819 GB/s, and up to 546 GB/s for the M4 Max. This is where Apple's architecture shines, so if you want unified memory or high memory bandwidth, Apple Silicon is really your only option.

One of the biggest gripes about Intel processors with iGPUs for years was that they relied on system memory that was significantly slower than VRAM. What you're proposing is a return to that slow, inefficient approach.

Another option is to find a desktop with a Ryzen AI Max+ 395. While it uses unified memory, the max amount for the GPU cores is 96GB; it's a close approximation of how Apple sets up its chips.
I didn't propose anything; not sure where you got that idea from my post. VRAM is faster, but in a real workload the GPU may only account for 40-50% of the time; your I/O and CPU play a part too. I do wish pipelines were 100% GPU and everything fit in VRAM. All the benchmarks and specs are fun and games until you start running pipelines and realize the GPU is only part of the puzzle. Unified memory shines there, but Apple needs to beef up the GPU.
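To put a number on that point, a tiny Amdahl-style sketch (the GPU share and speedup factors are made-up illustrations, not measurements):

# If the GPU is only ~50% of pipeline wall time, even an infinitely faster
# GPU can at best double end-to-end throughput. Fractions are illustrative.
def pipeline_speedup(gpu_fraction: float, gpu_speedup: float) -> float:
    return 1.0 / ((1.0 - gpu_fraction) + gpu_fraction / gpu_speedup)

print(pipeline_speedup(0.5, 3.0))   # 3x faster GPU -> ~1.5x overall
print(pipeline_speedup(0.5, 1e9))   # "infinitely" faster GPU -> ~2x overall
print(pipeline_speedup(0.9, 3.0))   # if the pipeline were 90% GPU -> 2.5x

That's why keeping the CPU-side stages fed out of the same memory pool can matter more than raw GPU benchmark numbers for this kind of pipeline.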
 
Apple needs to move to a chiplet design for their high-end desktop/workstation models...

  • MX Extreme (2nm)
  • 64 CPU cores (48P/16E)
  • 1,024 GPU cores
  • 256 Neural Engine cores
  • 1.92TB LPDDR6 ECC RAM
  • 2.16TB/s UMA bandwidth
 
Apple needs to move to a chiplet design for their high-end desktop/workstation models...

  • MX Extreme (2nm)
  • 64 CPU cores (48P/16E)
  • 1,024 GPU cores
  • 256 Neural Engine cores
  • 1.92TB LPDDR6 ECC RAM
  • 2.16TB/s UMA bandwidth
Why only 1,024 GPU cores? Only playing mobile games?
 
I am curious too. I am planning to replace my workstation next year.

I started with the premise (last year) that I would soon enjoy an understanding of just what AI inference, learning and modeling was all about.

Of course, I purchased some ***** from Teh Bay, and was burned (multiple times).

In the Process, I secured some semi-serious hardware. My latest hope is that I can secure a decent GPU that facilitates the intended learning, and doesn't help me regret my latest purchases...

IDRGAS whether the DGX Spark Duo I reserved will hold a crown.

What I most desire is a platform where I can experience, learn and understand.

Is this too much to ask?
 
My EPYC 4565P has GPU dies (or something akin to such).

I don't need a GPU to drive a display from my H13SAE-MF ... 🤷‍♂️
Ah, I meant a significant quantity of them. I think the GPU cores on EPYC and Ryzen are in the IO die at the moment? On Threadripper/EPYC they could maybe replace some of the many CPU dies with GPU dies and build something with a heap of unified memory for AI.
 
It seems that a lot of folks in this thread are purchasing a GB10 workstation. Do we have so many ML professionals working with large models here?
 
Do we have so many ML professionals working with large models here?
That's what I'm wondering.

I was (am) tempted by the GB10, but truth be told, I really don't have a use case for it. It would largely be sitting there after the initial glow of buying something new wore off.
 
That's what I'm wondering.

I was (am) tempted by the GB10, but truth be told, I really don't have a use case for it. It would largely be sitting there after the initial glow of buying something new wore off.

I see more of them in r/macstudio than here.

Some people are playing around with them, I guess to enhance job marketability; some do it for work and some for research.
 
That's what I'm wondering.

I was (am) tempted by the GB10, but truth be told, I really don't have a use case for it. It would largely be sitting there after the initial glow of buying something new wore off.

That's quite an expensive toy. I don't expect it to be of much use as a general-purpose computer.
 
That's quite an expensive toy. I don't expect it to be of much use as a general-purpose computer.
Agreed, it's not something I'm willing to spend money on, but it does make me all the more curious about those mentioning their plans to get one.
 
I'd like a faster GPU as well, but let's be realistic in these comparisons. One is a CPU with an integrated GPU, the other is just a GPU. Of course a dedicated GPU is going to outperform the CPU's integrated graphics. If you're going to try and do a fair comparison, then compare apples to apples: compare the M4 to any AMD or Intel CPU with integrated graphics.
 