
Today, the M1 Ultra can have up to 128GB of video memory. This is unheard of and only possible because of unified memory architecture.

If there is an M2 "Extreme", then it could possibly have up to 384GB of video RAM (96GB * 4). This is 304GB more than Nvidia's upcoming H100 enterprise GPU.

Given this incredible advantage, what sorts of applications can take advantage of this? Are there any professionals who are using all 128GB of the Studio today? Are there applications that are bottlenecked by the amount of video memory found in discrete GPUs?
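Spelling that arithmetic out as a quick Python sketch (the 96 GB-per-die figure and the 80 GB H100 figure are the assumptions from this post, not confirmed specs):

# Back-of-envelope check of the figures above; 96 GB per die and an 80 GB
# H100 are assumptions from this post, not confirmed specifications.
per_die_memory_gb = 96
dies_in_hypothetical_m2_extreme = 4
h100_memory_gb = 80

m2_extreme_memory_gb = per_die_memory_gb * dies_in_hypothetical_m2_extreme
print(m2_extreme_memory_gb)                   # 384
print(m2_extreme_memory_gb - h100_memory_gb)  # 304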
 
I mean, the UMA would be gold if there were any good CFD software for the Mac that uses both the CPU and GPU.
 
3D rendering. I vaguely remember reading in the 3D rendering applications thread that the M1 Ultra is faster than the Nvidia RTX 3090 in rendering the Moana Island scene because the scene is so big that it doesn't fit on the Nvidia GPU.

I'm sure someone else can fill in with more details and even find the benchmark results.
 
3D rendering. I vaguely remember reading in the 3D rendering applications thread that the M1 Ultra is faster than the Nvidia RTX 3090 in rendering the Moana Island scene because the scene is so big that it doesn't fit on the Nvidia GPU.

I'm sure someone else can fill in with more details and even find the benchmark results.
It was in the Redshift forum, but as far as I remember he also said it looked odd.

I am not sure if he ever got it to render correctly, or what the render time was.
 
Potentially AI image generation.
12 GB of VRAM maxes out at about 1200x1200.
The M1 version of Stable Diffusion uses about 25-30 GB of RAM for 1000x1000, so I'm not sure how that works.
It probably depends on the model and inputs used, because I've seen it go to 60 GB+.

Most people will use it as 'normal' RAM.
The visual effects industry eats it up like it's nothing, either for caching timeline playback or rendering.
 
For amateur personal use, I have been able to hit around 22GB of VRAM on my 16" M1 Max MBP.

So I'm pretty sure someone can easily hit 128GB+ of VRAM if they do some serious work with these machines.
 
For amateur personal use, I have been able to hit around 22GB of VRAM on my 16" M1 Max MBP.

So I'm pretty sure someone can easily hit 128GB+ of VRAM if they do some serious work with these machines.
It probably also depends on your scene optimization. I had one scene that needed 18 GB and made it fit in 12 GB by reducing textures and some geometry. Nobody will notice if you reduce a background texture from 8K or 4K to 1K, or even 512 in some cases.

I will say a lot of VRAM makes it easier, as you can be less careful and get away with badly optimized scenes.
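To put rough numbers on why downscaling textures frees so much memory, here's a quick sketch assuming uncompressed 8-bit RGBA (real renderers often compress or mip-map, so treat these as upper bounds):

# Memory cost of an uncompressed 8-bit RGBA texture at various resolutions.
# Assumes no compression or mip-mapping, so these are upper bounds.
def texture_mb(side_px, channels=4, bytes_per_channel=1):
    return side_px * side_px * channels * bytes_per_channel / (1024 ** 2)

for side in (8192, 4096, 1024, 512):
    print(f"{side}x{side}: {texture_mb(side):.0f} MB")
# 8192x8192: 256 MB, 4096x4096: 64 MB, 1024x1024: 4 MB, 512x512: 1 MB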
 
Potentially AI image generation.
12 GB of VRAM maxes out at about 1200x1200.
The M1 version of Stable Diffusion uses about 25-30 GB of RAM for 1000x1000, so I'm not sure how that works.
It probably depends on the model and inputs used, because I've seen it go to 60 GB+.

Have you actually tried > 512x512 on Apple Silicon? People on 128GB M1 Ultra and 64GB M1 Max are reporting hitting a limit of 512x512.

https://news.ycombinator.com/item?id=32678664#32681077
 
Have you actually tried > 512x512 on Apple Silicon? People on 128GB M1 Ultra and 64GB M1 Max are reporting hitting a limit of 512x512.

https://news.ycombinator.com/item?id=32678664#32681077
At 1024x1024 I get noise, but it works at 960x960, using about 38 GB of RAM on a 128GB Ultra.

This was the output for the prompt 'apple tree':
[attached image]
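For anyone who wants to try reproducing this, here's a minimal sketch using Hugging Face's diffusers with PyTorch's MPS backend (the model ID and the 960x960 resolution are just examples; memory use will vary with the model and settings):

# Minimal Stable Diffusion run on Apple Silicon via PyTorch's MPS backend.
# Requires the diffusers and torch packages; the model ID is only an example.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe = pipe.to("mps")

# Try progressively larger outputs to see where memory or output quality gives out.
image = pipe("apple tree", height=960, width=960).images[0]
image.save("apple_tree_960.png")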
 
Today, the M1 Ultra can have up to 128GB of video memory. This is unheard of and only possible because of unified memory architecture.

If there is an M2 "Extreme", then it could possibly have up to 384GB of video RAM (96GB * 4). This is 304GB more than Nvidia's upcoming H100 enterprise GPU.

Given this incredible advantage, what sorts of applications can take advantage of this? Are there any professionals who are using all 128GB of the Studio today? Are there applications that are bottlenecked by the amount of video memory found in discrete GPUs?
That 128 GB is shared between the CPU and GPU. I don't know the details of how UMA works, but wouldn't that mean the effective amount available to the GPU could be significantly less than that? [Well, with UMA, I suppose technically all of it is available to the GPU, so perhaps I should have said "available for GPU tasks".]

The NVIDIA RTX A6000 has 48 GB VRAM, so dual A6000's would have 96 GB, but I've learned that dual workstation GPU's can't share their VRAM. So it wouldn't be a single large pool of VRAM like Apple offers.

OTOH, the A100 is available with up to 80 GB VRAM and, according to this, Google Cloud offers VM instances that combine eight A100's for a total of 640 GB VRAM. But I don't know if this can behave as a single, shared pool of VRAM, or if it's instead "just" 8 x 80 GB individual buckets of VRAM.


[Yes, they say multiple 80 GB A100's is not fully supported, but you can also get 640 GB VRAM with 16 x 40 GB A100's, and that does appear to be fully supported.]
 
The NVIDIA RTX A6000 has 48 GB VRAM, so a workstation with dual A6000's would have 96 GB. Hence VRAM in the ~100 GB range isn't "unheard of".
If the total assets required to render the scene are more than 48GB, it wouldn't be able to render it, right? The card's VRAM could not hold everything it needs to work on.
 
If the total assets required to render the scene are more than 48GB, it wouldn't be able to render it, right? The card's VRAM could not hold everything it needs to work on.
Yeah, you're right. I was assuming dual GPU's could share their VRAM, but that's not the case. I'll edit my post. Thanks for the correction.
 
If the total assets required to render the scene are more than 48GB, it wouldn't be able to render it, right? The card's VRAM could not hold everything it needs to work on.
Some renderers can use out-of-core memory.
Redshift has the capability of "out of core" rendering, which means that if a GPU runs out of memory, it will use the system's memory instead.

It will take longer, but it will eventually render the scene.
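"Out of core" just means streaming the data through a smaller memory budget instead of holding it all at once. A toy illustration of the general idea in Python with numpy (this is not how Redshift actually implements it):

# Toy illustration of the out-of-core pattern: work through a dataset larger
# than a pretend "VRAM" budget by streaming it in chunks from disk.
# This is NOT Redshift's implementation, only the general idea.
import numpy as np

data = np.memmap("big_scene.bin", dtype=np.float32, mode="w+", shape=(10_000_000,))
budget = 1_000_000  # pretend only this many elements fit in "VRAM" at once

total = 0.0
for start in range(0, data.shape[0], budget):
    chunk = np.asarray(data[start:start + budget])  # "upload" one chunk
    total += float(chunk.sum())                     # do the work on that chunk
print(total)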
 
Some renderers can use out-of-core memory.
Technically possible, but the PCIe bottleneck will kill any dGPU speed advantage, I would think. Probably still better than doing it via the CPU.
 
Technically possible, but the PCIe bottleneck will kill any dGPU speed advantage, I would think. Probably still better than doing it via the CPU.
If you'd like to take a look at the edit I just added to my post about Google Cloud's virtual machine instances, I'd be interested to hear your thoughts.
 
the A100 is available with up to 80 GB VRAM and, according to this, Google Cloud offers VM instances that combine eight A100's for a total of 640 GB VRAM. But I don't know if this can behave as a single, shared pool of VRAM, or if it's instead "just" 8 x 80 GB individual buckets of VRAM.

A moderator on the NVIDIA forums stated in 2019:
NVLink provides a fast interconnect between GPUs, but does not aggregate those GPUs into a single logical device. That said, DL training can usually be efficiently spread across multiple GPUs by increasing the minibatch size and distributing different sets of images to each GPU. Horovod is a third-party tool provided in our containers that simplifies the task of parallelizing over multiple GPUs (or even multiple hosts). Alternatively you can use TF’s Distribution Strategies approach.
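In other words, the usual multi-GPU pattern is data parallelism: each GPU holds its own full copy of the model in its own VRAM and processes a different slice of each batch, so memory is replicated rather than pooled. A minimal sketch with TensorFlow's MirroredStrategy (the tiny model is only a placeholder):

# Data-parallel sketch with TensorFlow's MirroredStrategy: every GPU gets its
# own replica of the model, so VRAM is replicated rather than pooled.
# The tiny model below is only a placeholder.
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # uses all visible GPUs
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(32,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# Each replica receives a different shard of every batch.
x = tf.random.normal((1024, 32))
y = tf.random.normal((1024, 1))
model.fit(x, y, batch_size=256, epochs=1)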
 
OTOH, the A100 is available with up to 80 GB VRAM and, according to this, Google Cloud offers VM instances that combine eight A100's for a total of 640 GB VRAM. But I don't know if this can behave as a single, shared pool of VRAM, or if it's instead "just" 8 x 80 GB individual buckets of VRAM.

[Yes, they say multiple 80 GB A100's is not fully supported, but you can also get 640 GB VRAM with 16 x 40 GB A100's, and that does appear to be fully supported.]
I have to admit I'm not familiar with how Google Cloud works when it comes to hosted GPU, but they are basically PCIe cards inserted into rack-mount servers I would imagine, so it will not be any different from a PC with PCIe slots. The constraints then will be the same.

Each dGPU card will have a finite amount of VRAM (which is the nature of dGPUs), which will limit the dataset that can be processed at any one time. If the data set to be processed is larger than VRAM, swapping will have to occur, provided the workload allows such swapping and it makes sense to swap.

I suppose a multi-A100 setup in Google Cloud would benefit AI training.

Edit: I see that you have also edited your post to state that the multi A100 setup is good for ML.
 
If the upcoming AS Mac Pro can rival the Intel Mac Pro in terms of memory capacity with UMA, it will most likely disrupt the status quo of GPU compute workloads. It should open up a lot more areas where dGPU VRAM is currently the limiting factor.
 
I was assuming dual GPU's could share their VRAM, but that's not the case. I'll edit my post.
They can. Nvidia uses NVLink to pool the GPU memory of two cards on the desktop. 96 GB of total VRAM is possible, but you won't get the full amount; closer to 90 GB, possibly, with the rest reserved for other tasks.
With the 2xxx series, Nvidia offered NVLink support, but two 11 GB 2080 Tis could only pool up to 20 GB of memory between them, not 22 GB.
It's unclear whether the same ratio applies to cards with more memory.
If the total assets required to render the scene are more than 48GB, it wouldn't be able to render it, right? The card's VRAM could not hold everything it needs to work on.
Only the geometry. Textures can be streamed off disk, with some caveats.
Edit:
Octane can fit around 9 million tris in 1 GB of VRAM, so about 425 million on a 48 GB card, but you can fit orders of magnitude more if your scene contains render instances.
Redshift can fit even more, around 30 million per GB of VRAM.
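Taking those per-GB figures at face value (they ignore instancing and texture data), the rough budgets work out like this:

# Rough triangle budgets implied by the per-GB figures quoted above
# (taken at face value; instancing and textures change this a lot).
octane_tris_per_gb = 9_000_000
redshift_tris_per_gb = 30_000_000
vram_gb = 48

print(octane_tris_per_gb * vram_gb)        # ~432 million tris (quoted as ~425M)
print(redshift_tris_per_gb * vram_gb)      # ~1.44 billion tris
print((1024 ** 3) / octane_tris_per_gb)    # ~119 bytes per triangle in Octane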
That 128 GB is shared between the CPU and GPU. I don't know the details of how UMA works, but wouldn't that mean the effective amount available to the GPU could be significantly less than that? [Well, with UMA, I suppose technically all of it is available to the GPU, so perhaps I should have said "available for GPU tasks".]
The Redshift team mentioned that around 42 GB out of the 64 GB M1 Max was available for their renderer. Again, I'm not sure if the ratio holds at higher RAM. Even then, around 168 GB would be available for a 256 GB ‘Mac Pro’.
It probably also depends on your scene optimization. I had one scene that needed 18 GB and made it fit in 12 GB by reducing textures and some geometry. Nobody will notice if you reduce a background texture from 8K or 4K to 1K, or even 512 in some cases.

I will say a lot of VRAM makes it easier, as you can be less careful and get away with badly optimized scenes.
Just yesterday, I tried shifting from a 128x128 to a 256x256 tile render (on an 18-core CPU). The allocated memory jumped from around 10-12 GB to 90+ GB for a scene with fewer than 200k tris and around a dozen 8K textures. There are ways to fill up available resources. :)
 
I have to admit I'm not familiar with how Google Cloud works when it comes to hosted GPU, but they are basically PCIe cards inserted into rack-mount servers I would imagine, so it will not be any different from a PC with PCIe slots. The constraints then will be the same.
Could they use NVIDIA DGX?
 
They can. Nvidia uses NVLink to pool the GPU memory of two cards on the desktop. 96 GB of total VRAM is possible, but you won't get the full amount; closer to 90 GB, possibly, with the rest reserved for other tasks.
With the 2xxx series, Nvidia offered NVLink support, but two 11 GB 2080 Tis could only pool up to 20 GB of memory between them, not 22 GB.
It's unclear whether the same ratio applies to cards with more memory.
It would be cool if they could, but what about this?
A moderator on the NVIDIA forums stated in 2019:

"NVLink provides a fast interconnect between GPUs, but does not aggregate those GPUs into a single logical device...."

Plus Maxon (the maker of Redshift) says:

"Redshift does not combine the VRAM when using multiple GPUs. This is a limitation of current GPU technology and not related to Redshift in particular." [emphasis mine.]

 
The Redshift team mentioned that around 42 GB out of the 64 GB M1 Max was available for their renderer. Again, I'm not sure if the ratio holds at higher RAM. Even then, around 168 GB would be available for a 256 GB ‘Mac Pro’.
Why would this ratio be linear? I assume macOS takes a static amount of memory, but it does not take more memory just because you have more.
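Those two readings give quite different numbers, which is exactly the open question. A quick comparison, using the 42-of-64 GB figure quoted above and a hypothetical 256 GB machine:

# Two ways to extrapolate "42 GB usable out of 64 GB" to a hypothetical
# 256 GB machine; neither is confirmed, this only shows the spread.
measured_total_gb, measured_usable_gb = 64, 42
target_total_gb = 256

proportional = target_total_gb * measured_usable_gb / measured_total_gb
fixed_overhead = target_total_gb - (measured_total_gb - measured_usable_gb)

print(round(proportional))  # 168 GB if the usable fraction stays at 42/64
print(fixed_overhead)       # 234 GB if the overhead is a fixed 22 GB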
 
Textures can be streamed off disk, with some caveats.
SSDs are around 8 GB/s. RAIDed SSDs will be faster, but that will be bottlenecked by PCIe's 32/64 GB/s bandwidth, which is an order of magnitude slower than a dGPU's VRAM bandwidth. Streaming from disk to VRAM, IMHO, is a technology geared towards gaming, with lower-resolution assets.

For scenes that need massive assets repeatedly, it's probably going to be very slow. Probably better to render the scene on an AMD Threadripper.
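To put rough numbers on that, here's how long moving a 100 GB asset set would take over each link, using approximate peak bandwidths (the VRAM figure is for a 3090-class card):

# Approximate time to move a 100 GB asset set over each link (peak figures).
asset_set_gb = 100
links_gb_per_s = {
    "NVMe SSD (~8 GB/s)": 8,
    "PCIe 4.0 x16 (~32 GB/s)": 32,
    "PCIe 5.0 x16 (~64 GB/s)": 64,
    "GDDR6X VRAM (~900 GB/s)": 900,  # 3090-class card, approximate
}
for name, bandwidth in links_gb_per_s.items():
    print(f"{name}: {asset_set_gb / bandwidth:.2f} s")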
 
Could they use NVIDIA DGX?
I was wondering that myself, since I came across this earlier:

[attached screenshot]


However, the GPUs in the DGX are connected by NVLink, so I thought that was answered (in the negative) by your quote from the NVIDIA developer, who said "NVLink provides a fast interconnect between GPUs, but does not aggregate those GPUs into a single logical device."
 