
Today, the M1 Ultra can have up to 128GB of video memory. This is unheard of and only possible because of unified memory architecture.

If there is an M2 "Extreme", then it could possibly have up to 384GB of video RAM (96GB * 4). This is 304GB more than Nvidia's upcoming H100 enterprise GPU.

Given this incredible advantage, what sorts of applications can take advantage of this? Are there any professionals who are using all 128GB of the Studio today? Are there applications that are bottlenecked by the amount of video memory found in discrete GPUs?
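Spelling that arithmetic out as a quick Python sketch (the 96 GB-per-die figure and the 80 GB H100 figure are the assumptions from this post, not confirmed specs):

# Back-of-envelope check of the figures above; 96 GB per die and an 80 GB
# H100 are assumptions from this post, not confirmed specifications.
per_die_memory_gb = 96
dies_in_hypothetical_m2_extreme = 4
h100_memory_gb = 80

m2_extreme_memory_gb = per_die_memory_gb * dies_in_hypothetical_m2_extreme
print(m2_extreme_memory_gb)                   # 384
print(m2_extreme_memory_gb - h100_memory_gb)  # 304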
 
I mean, the UMA would be gold if there were any good CFD software for the Mac that uses both the CPU and GPU.
 
3D rendering. I vaguely remember reading in the 3D rendering applications thread that the M1 Ultra is faster than the Nvidia RTX 3090 in rendering the Moana Island scene because the scene is so big that it doesn't fit on the Nvidia GPU.

I'm sure someone else can fill in with more details and even find the benchmark results.
 
3D rendering. I vaguely remember reading in the 3D rendering applications thread that the M1 Ultra is faster than the Nvidia RTX 3090 in rendering the Moana Island scene because the scene is so big that it doesn't fit on the Nvidia GPU.

I'm sure someone else can fill in with more details and even find the benchmark results.
It was in the Redshift forum, but as far as I remember he also said it looked odd.

I am not sure if he ever got it to render correctly, or what the render time was.
 
Potentially AI image generation.
12 GB of VRAM maxes out at about 1200x1200.
The M1 version of Stable Diffusion uses about 25-30 GB of RAM for 1000x1000, so I'm not sure how that works.
It probably depends on the model and inputs used, because I've seen it go to 60 GB+.

Most people will use it as 'normal' RAM.
The visual effects industry eats it up like it's nothing, either for caching timeline playback or rendering.
 
For amateur personal use, I have been able to hit around 22GB of VRAM on my 16" M1 Max MBP.

So I'm pretty sure someone can easily hit 128GB+ of VRAM if they do some serious work with these machines.
 
For amateur personal use, I have been able to hit around 22GB of VRAM on my 16" M1 Max MBP.

So I'm pretty sure someone can easily hit 128GB+ of VRAM if they do some serious work with these machines.
It probably also depends on your scene optimization. I had one scene that needed 18 GB and made it fit in 12 GB by reducing textures and some geometry. Nobody will notice if you reduce a background texture from 8K or 4K to 1K, or even 512 in some cases.

I will say a lot of VRAM makes it easier, as you can be less careful and get away with badly optimized scenes.
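To put rough numbers on why downscaling textures frees so much memory, here's a quick sketch assuming uncompressed 8-bit RGBA (real renderers often compress or mip-map, so treat these as upper bounds):

# Memory cost of an uncompressed 8-bit RGBA texture at various resolutions.
# Assumes no compression or mip-mapping, so these are upper bounds.
def texture_mb(side_px, channels=4, bytes_per_channel=1):
    return side_px * side_px * channels * bytes_per_channel / (1024 ** 2)

for side in (8192, 4096, 1024, 512):
    print(f"{side}x{side}: {texture_mb(side):.0f} MB")
# 8192x8192: 256 MB, 4096x4096: 64 MB, 1024x1024: 4 MB, 512x512: 1 MB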
 
Potentially AI image generation.
12 GB of VRAM maxes out at about 1200x1200.
The M1 version of Stable Diffusion uses about 25-30 GB of RAM for 1000x1000, so I'm not sure how that works.
It probably depends on the model and inputs used, because I've seen it go to 60 GB+.

Have you actually tried > 512x512 on Apple Silicon? People on 128GB M1 Ultra and 64GB M1 Max are reporting hitting a limit of 512x512.

https://news.ycombinator.com/item?id=32678664#32681077
 
Have you actually tried > 512x512 on Apple Silicon? People on 128GB M1 Ultra and 64GB M1 Max are reporting hitting a limit of 512x512.

https://news.ycombinator.com/item?id=32678664#32681077
At 1024x1024 I get noise, but it works at 960x960, using about 38 GB of RAM on a 128GB Ultra.

This was the output for the prompt 'apple tree':
[attached image]
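For anyone who wants to try reproducing this, here's a minimal sketch using Hugging Face's diffusers with PyTorch's MPS backend (the model ID and the 960x960 resolution are just examples; memory use will vary with the model and settings):

# Minimal Stable Diffusion run on Apple Silicon via PyTorch's MPS backend.
# Requires the diffusers and torch packages; the model ID is only an example.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe = pipe.to("mps")

# Try progressively larger outputs to see where memory or output quality gives out.
image = pipe("apple tree", height=960, width=960).images[0]
image.save("apple_tree_960.png")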
 
Today, the M1 Ultra can have up to 128GB of video memory. This is unheard of and only possible because of unified memory architecture.

If there is an M2 "Extreme", then it could possibly have up to 384GB of video RAM (96GB * 4). This is 304GB more than Nvidia's upcoming H100 enterprise GPU.

Given this incredible advantage, what sorts of applications can take advantage of this? Are there any professionals who are using all 128GB of the Studio today? Are there applications that are bottlenecked by the amount of video memory found in discrete GPUs?
That 128 GB is shared between the CPU and GPU. I don't know the details of how UMA works, but wouldn't that mean the effective amount available to the GPU could be significantly less than that? [Well, with UMA, I suppose technically all of it is available to the GPU, so perhaps I should have said "available for GPU tasks".]

The NVIDIA RTX A6000 has 48 GB VRAM, so dual A6000's would have 96 GB, but I've learned that dual workstation GPU's can't share their VRAM. So it wouldn't be a single large pool of VRAM like Apple offers.

OTOH, the A100 is available with up to 80 GB VRAM and, according to this, Google Cloud offers VM instances that combine eight A100's for a total of 640 GB VRAM. But I don't know if this can behave as a single, shared pool of VRAM, or if it's instead "just" 8 x 80 GB individual buckets of VRAM.


[Yes, they say multiple 80 GB A100's is not fully supported, but you can also get 640 GB VRAM with 16 x 40 GB A100's, and that does appear to be fully supported.]
 
The NVIDIA RTX A6000 has 48 GB VRAM, so a workstation with dual A6000's would have 96 GB. Hence VRAM in the ~100 GB range isn't "unheard of".
If the total assets required to render the scene are more than 48GB, it wouldn't be able to render it, right? The card's VRAM could not hold everything it needs to work on.
 
If the total assets required to render the scene are more than 48GB, it wouldn't be able to render it, right? The card's VRAM could not hold everything it needs to work on.
Yeah, you're right. I was assuming dual GPU's could share their VRAM, but that's not the case. I'll edit my post. Thanks for the correction.
 
If the total assets required to render the scene are more than 48GB, it wouldn't be able to render it, right? The card's VRAM could not hold everything it needs to work on.
Some renderers can use out-of-core memory.
Redshift has the capability of "out of core" rendering, which means that if a GPU runs out of memory, it will use the system's memory instead.

It will take longer, but it will eventually render the scene.
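"Out of core" just means streaming the data through a smaller memory budget instead of holding it all at once. A toy illustration of the general idea in Python with numpy (this is not how Redshift actually implements it):

# Toy illustration of the out-of-core pattern: work through a dataset larger
# than a pretend "VRAM" budget by streaming it in chunks from disk.
# This is NOT Redshift's implementation, only the general idea.
import numpy as np

data = np.memmap("big_scene.bin", dtype=np.float32, mode="w+", shape=(10_000_000,))
budget = 1_000_000  # pretend only this many elements fit in "VRAM" at once

total = 0.0
for start in range(0, data.shape[0], budget):
    chunk = np.asarray(data[start:start + budget])  # "upload" one chunk
    total += float(chunk.sum())                     # do the work on that chunk
print(total)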
 
Some renderers can use out-of-core memory.
Technically possible, but the PCIe bottleneck will kill any dGPU speed advantage, I would think. Probably still better than doing it via the CPU.
 
Technically possible, but the PCIe bottleneck will kill any dGPU speed advantage, I would think. Probably still better than doing it via the CPU.
If you'd like to take a look at the edit I just added to my post about Google Cloud's virtual machine instances, I'd be interested to hear your thoughts.
 
the A100 is available with up to 80 GB VRAM and, according to this, Google Cloud offers VM instances that combine eight A100's for a total of 640 GB VRAM. But I don't know if this can behave as a single, shared pool of VRAM, or if it's instead "just" 8 x 80 GB individual buckets of VRAM.

A moderator on the NVIDIA forums stated in 2019:
NVLink provides a fast interconnect between GPUs, but does not aggregate those GPUs into a single logical device. That said, DL training can usually be efficiently spread across multiple GPUs by increasing the minibatch size and distributing different sets of images to each GPU. Horovod is a third-party tool provided in our containers that simplifies the task of parallelizing over multiple GPUs (or even multiple hosts). Alternatively you can use TF’s Distribution Strategies approach.
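In other words, the usual multi-GPU pattern is data parallelism: each GPU holds its own full copy of the model in its own VRAM and processes a different slice of each batch, so memory is replicated rather than pooled. A minimal sketch with TensorFlow's MirroredStrategy (the tiny model is only a placeholder):

# Data-parallel sketch with TensorFlow's MirroredStrategy: every GPU gets its
# own replica of the model, so VRAM is replicated rather than pooled.
# The tiny model below is only a placeholder.
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # uses all visible GPUs
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(32,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# Each replica receives a different shard of every batch.
x = tf.random.normal((1024, 32))
y = tf.random.normal((1024, 1))
model.fit(x, y, batch_size=256, epochs=1)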
 
OTOH, the A100 is available with up to 80 GB VRAM and, according to this, Google Cloud offers VM instances that combine eight A100's for a total of 640 GB VRAM. But I don't know if this can behave as a single, shared pool of VRAM, or if it's instead "just" 8 x 80 GB individual buckets of VRAM.

[Yes, they say multiple 80 GB A100's is not fully supported, but you can also get 640 GB VRAM with 16 x 40 GB A100's, and that does appear to be fully supported.]
I have to admit I'm not familiar with how Google Cloud works when it comes to hosted GPU, but they are basically PCIe cards inserted into rack-mount servers I would imagine, so it will not be any different from a PC with PCIe slots. The constraints then will be the same.

Each dGPU card will have a finite amount of VRAM (which is the nature of dGPUs), which will limit the dataset that can be processed at any one time. If the data set to be processed is larger than VRAM, swapping will have to occur, provided the workload allows such swapping and it makes sense to swap.

I suppose a multi-A100 setup in Google Cloud would benefit AI training.

Edit: I see that you have also edited your post to state that the multi A100 setup is good for ML.
 
If the upcoming AS Mac Pro can rival the Intel Mac Pro in terms of memory capacity with UMA, it will most likely disrupt the status quo of GPU compute workloads. It should open up a lot more areas where dGPU VRAM is currently the limiting factor.
 
I was assuming dual GPU's could share their VRAM, but that's not the case. I'll edit my post.
They can. Nvidia uses NVLink to pool the GPU memory of two cards on the desktop. 96 GB of total VRAM is possible, but you won't get the full amount; closer to 90 GB, possibly, with the rest reserved for other tasks.
With the 2xxx series, Nvidia offered NVLink support, but two 11 GB 2080 Tis could only pool up to 20 GB of memory between them, not 22 GB.
It's unclear whether the same ratio applies to cards with more memory.
If the total assets required to render the scene are more than 48GB, it wouldn't be able to render it, right? The card's VRAM could not hold everything it needs to work on.
Only the geometry. Textures can be streamed off disk, with some caveats.
Edit:
Octane can fit around 9 million tris in 1 GB of VRAM, so about 425 million on a 48 GB card, but you can fit orders of magnitude more if your scene contains render instances.
Redshift can fit even more, around 30 million per GB of VRAM.
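Taking those per-GB figures at face value (they ignore instancing and texture data), the rough budgets work out like this:

# Rough triangle budgets implied by the per-GB figures quoted above
# (taken at face value; instancing and textures change this a lot).
octane_tris_per_gb = 9_000_000
redshift_tris_per_gb = 30_000_000
vram_gb = 48

print(octane_tris_per_gb * vram_gb)        # ~432 million tris (quoted as ~425M)
print(redshift_tris_per_gb * vram_gb)      # ~1.44 billion tris
print((1024 ** 3) / octane_tris_per_gb)    # ~119 bytes per triangle in Octane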
That 128 GB is shared between the CPU and GPU. I don't know the details of how UMA works, but wouldn't that mean the effective amount available to the GPU could be significantly less than that? [Well, with UMA, I suppose technically all of it is available to the GPU, so perhaps I should have said "available for GPU tasks".]
The Redshift team mentioned that around 42 GB out of the 64 GB M1 Max was available for their renderer. Again, I'm not sure if the ratio holds at higher RAM. Even then, around 168 GB would be available for a 256 GB ‘Mac Pro’.
It probably also depends on your scene optimization. I had one scene that needed 18 GB and made it fit in 12 GB by reducing textures and some geometry. Nobody will notice if you reduce a background texture from 8K or 4K to 1K, or even 512 in some cases.

I will say a lot of VRAM makes it easier, as you can be less careful and get away with badly optimized scenes.
Just yesterday, I tried shifting from a 128x128 to a 256x256 tile render (on an 18-core CPU). The allocated memory jumped from around 10-12 GB to 90+ GB for a scene with fewer than 200k tris and around a dozen 8K textures. There are ways to fill up available resources. :)
 
I have to admit I'm not familiar with how Google Cloud works when it comes to hosted GPU, but they are basically PCIe cards inserted into rack-mount servers I would imagine, so it will not be any different from a PC with PCIe slots. The constraints then will be the same.
Could they use NVIDIA DGX?
 
They can. Nvidia uses NVLink to pool the GPU memory of two cards on the desktop. 96 GB of total VRAM is possible, but you won't get the full amount; closer to 90 GB, possibly, with the rest reserved for other tasks.
With the 2xxx series, Nvidia offered NVLink support, but two 11 GB 2080 Tis could only pool up to 20 GB of memory between them, not 22 GB.
It's unclear whether the same ratio applies to cards with more memory.
It would be cool if they could, but what about this?
A moderator on the NVIDIA forums stated in 2019:

"NVLink provides a fast interconnect between GPUs, but does not aggregate those GPUs into a single logical device...."

Plus Maxon (the maker of Redshift) says:

"Redshift does not combine the VRAM when using multiple GPUs. This is a limitation of current GPU technology and not related to Redshift in particular." [emphasis mine.]

 
The Redshift team mentioned that around 42 GB out of the 64 GB M1 Max was available for their renderer. Again, I'm not sure if the ratio holds at higher RAM. Even then, around 168 GB would be available for a 256 GB ‘Mac Pro’.
Why would this ratio be linear? I assume macOS takes a static amount of memory, but it does not take more memory just because you have more.
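Those two readings give quite different numbers, which is exactly the open question. A quick comparison, using the 42-of-64 GB figure quoted above and a hypothetical 256 GB machine:

# Two ways to extrapolate "42 GB usable out of 64 GB" to a hypothetical
# 256 GB machine; neither is confirmed, this only shows the spread.
measured_total_gb, measured_usable_gb = 64, 42
target_total_gb = 256

proportional = target_total_gb * measured_usable_gb / measured_total_gb
fixed_overhead = target_total_gb - (measured_total_gb - measured_usable_gb)

print(round(proportional))  # 168 GB if the usable fraction stays at 42/64
print(fixed_overhead)       # 234 GB if the overhead is a fixed 22 GB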
 
Textures can be streamed off disk, with some caveats.
SSDs are around 8 GB/s. RAIDed SSDs will be faster, but that will be bottlenecked by PCIe's 32/64 GB/s bandwidth, which is an order of magnitude slower than a dGPU's VRAM bandwidth. Streaming from disk to VRAM, IMHO, is a technology geared towards gaming, with lower-resolution assets.

For scenes that need massive assets repeatedly, it's probably going to be very slow. Probably better to render the scene on an AMD Threadripper.
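To put rough numbers on that, here's how long moving a 100 GB asset set would take over each link, using approximate peak bandwidths (the VRAM figure is for a 3090-class card):

# Approximate time to move a 100 GB asset set over each link (peak figures).
asset_set_gb = 100
links_gb_per_s = {
    "NVMe SSD (~8 GB/s)": 8,
    "PCIe 4.0 x16 (~32 GB/s)": 32,
    "PCIe 5.0 x16 (~64 GB/s)": 64,
    "GDDR6X VRAM (~900 GB/s)": 900,  # 3090-class card, approximate
}
for name, bandwidth in links_gb_per_s.items():
    print(f"{name}: {asset_set_gb / bandwidth:.2f} s")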
 
Could they use NVIDIA DGX?
I was wondering that myself, since I came across this earlier:

[attached screenshot]


However, the GPUs in the DGX are connected by NVLink, so I thought that was answered (in the negative) by your quote from the NVIDIA developer, who said "NVLink provides a fast interconnect between GPUs, but does not aggregate those GPUs into a single logical device."
 