3D Rendering on Apple Silicon, CPU&GPU

name99 · Dec 14, 2023

leman said:
Those are two very different things. Virtual address is what your application sees, the hardware uses a sophisticated mechanism to translate it to the actual physical address. This enables a bunch of useful features (e.g. the address might point to a file on disk instead of physical RAM, you get additional safety, and the OS can move/compress the physical memory without you even noticing, etc. etc.).

For unified virtual memory you need to use the same address space on the CPU and the GPU, so that if you copy a memory address between devices it will be valid and point to the same data. I do not know why Apple does not support this, one would think that their hardware would be perfectly capable of sharing memory page descriptors. Maybe there is still some legacy reason, or maybe there are additional complications. But who knows, it is also possible that Metal 4 will have unified virtual memory on Apple Silicon but will drop Intel-based Macs or something like that.

What notes are you referring to beyond what I see on the screen which does not say this.

The issue, as far as I can tell is not one of physical or virtual access, it is one of API.
I am NOT a Metal expert, but this is my understanding:

Metal wants you to use Handles (essentially indices into a table of address ranges) not raw pointers. This is not because raw pointers won't, in some sense, "work", it's because the Metal API depends on being told how and when blocks of data are being used.
This is because the L1 caches are not coherent, and *SW conventions* are required to handle coherency.

After every unit of work (called a "kick", but you can think of this as a shader) the L1 caches are flushed to L2, so that subsequent shaders that depend on the results of this shader, but which may run on a different core, can see the work that was done.
This mechanism for flushing data from L1 to L2 is very sophisticated in terms of flushing the minimum amount of data, and in terms of scheduling non-dependent kicks to execute at the same time that flushing is happening, BUT it depends on knowing which address ranges are used in what way by each kick – ie which address ranges were read, which were written.
If you start passing around raw pointers without informing Metal of how the associated data ranges are being used, you will lose this flushing/coherence.

This is not an issue for Vulkan, or more precisely it's a DIFFERENT issue, because Vulkan has a different model of who is responsible for flushing L1 caches. Apple's solution is not wrong, it's just different from Vulkan's, and assumes different packaging of the information about who is responsible for flushing what data ranges when.
It's like Apple is Java or Swift, with memory management happening behind the scenes – but you don't use raw pointers – while Vulkan is like C with manual calls to malloc and free.

Corrections welcome if I got anything wrong!

leman · Dec 14, 2023

name99 said:
What notes are you referring to beyond what I see on the screen which does not say this.

The issue, as far as I can tell is not one of physical or virtual access, it is one of API.
I am NOT a Metal expert, but this is my understanding:

Metal wants you to use Handles (essentially indices into a table of address ranges) not raw pointers. This is not because raw pointers won't, in some sense, "work", it's because the Metal API depends on being told how and when blocks of data are being used.
This is because the L1 caches are not coherent, and *SW conventions* are required to handle coherency.

After every unit of work (called a "kick", but you can think of this as a shader) the L1 caches are flushed to L2, so that subsequent shaders that depend on the results of this shader, but which may run on a different core, can see the work that was done.
This mechanism for flushing data from L1 to L2 is very sophisticated in terms of flushing the minimum amount of data, and in terms of scheduling non-dependent kicks to execute at the same time that flushing is happening, BUT it depends on knowing which address ranges are used in what way by each kick – ie which address ranges were read, which were written.
If you start passing around raw pointers without informing Metal of how the associated data ranges are being used, you will lose this flushing/coherence.

This is not an issue for Vulkan, or more precisely it's a DIFFERENT issue, because Vulkan has a different model of who is responsible for flushing L1 caches. Apple's solution is not wrong, it's just different from Vulkan's, and assumes different packaging of the information about who is responsible for flushing what data ranges when.
It's like Apple is Java or Swift, with memory management happening behind the scenes – but you don't use raw pointers – while Vulkan is like C with manual calls to malloc and free.

Corrections welcome if I got anything wrong!

What you describe may be one potential technical reason why Metal does not offer unified virtual memory, although I am not 100% convinced the cache flushing works exactly as you describe. Metal indeed requires you to tell it which memory objects are in use during a pass, but not the exact pointer range. There is nothing preventing you from allocating one huge Metal buffer and doing all your sub-allocations from there. In fact, that’s the preferred way since every buffer incurs additional overhead (and Metal gives you the heaps API to simplify these things). In summary, the system must already have a way to track which parts of the cache need flushing and so my guess is that the “use” APIs are there primarily to ensure that the resources are in memory and bound to the appropriate page tables. Vulkan doesn’t need the “use” API because all resources are explicitly assumed to be resident anyway, unlike Metal.

From the API consumer perspective the issue is that the CPU and the GPU use different virtual address spaces. And anyway, the buffer handle is just the GPU virtual address. It’s even called GPU address in the API and you can write valid GPU pointers from the CPU using these APIs. Real handles are used for textures and samplers and they indeed encode offsets into the per-process descriptor tables (same system Nvidia uses).

Rafterman · Dec 14, 2023

iPadified said:
$1000 upgrade to get a 128 Gb RAM GPU render setup is amazingly cheap. Try that with NVIDIA.

But that's all shared RAM - not really a fair comparison.

altaic · Dec 14, 2023

Rafterman said:
But that's all shared RAM - not really a fair comparison.

An 80 GB Nvidia card goes for something like $30k these days. So if a $1000 ASi upgrade gets you 128 GB, that leaves you with an extra 48 GB and $29k. Seems like a pretty good deal to me.

Rafterman · Dec 14, 2023

altaic said:
An 80 GB Nvidia card goes for something like $30k these days. So if a $1000 ASi upgrade gets you 128 GB, that leaves you with an extra 48 GB and $29k. Seems like a pretty good deal to me.

You're not paying for the RAM though, you are paying for the Tensor core CPU. That processor delivers 300 teraflops of GPU performance and is an industrial-class video card. The 40 core M3 Max GPU is around 16.

altaic · Dec 14, 2023

Rafterman said:
You're not paying for the RAM though, you are paying for the Tensor core CPU. That processor delivers 300 teraflops of GPU performance and is an industrial-class video card. The 40 core M3 Max GPU is around 16.

Sure, they’re not equivalent in compute, but nonetheless a lot of GPU-accessible RAM is more important than compute for some popular tasks (e.g. machine learning). We were talking about Nvidia’s RAM offerings, and if you need more than 24 GB then you’re going to pay $$$$$.

quarkysg · Dec 14, 2023

Rafterman said:
You're not paying for the RAM though, you are paying for the Tensor core CPU. That processor delivers 300 teraflops of GPU performance and is an industrial-class video card. The 40 core M3 Max GPU is around 16.

altaic said:
Sure, they’re not equivalent in compute, but nonetheless a lot of GPU-accessible RAM is more important than compute for some popular tasks (e.g. machine learning). We were talking about Nvidia’s RAM offerings, and if you need more than 24 GB then you’re going to pay $$$$$.

The way I see it, NV may still have the edge in compute for now, but IMHO, the wall of the PCIe bottleneck is approaching fast.

As high-capacity, high speed memory gets introduced, Apple's SoC design will likely get better with more memory.

It doesn't look like PCIe is getting any breakthrough anytime soon.

As it is, I remember reading a couple weeks back that an M2 Ultra with 128GB of memory is already faster in certain AI benchmark compared to a 4090, if the problem size is greater than the 4090's VRAM size, even tho the 4090 dwarfs the M2 Ultra in compute power. Can't find the link now tho. So it's a question of a balanced vs a skewed design architecture.

iPadified · Dec 14, 2023

Well, you made a dig at Apple $1000 RAM prices and when the fun did not pan out, you side step using other metrics. It is exactly the point that you cannot get 128 GB RAM without paying extensively for a NVIDIA card due to bundling with very capable compute engine.

Same criticism as the MX SoC uniformity where some think there is too many CPU cores compared to GPU and others think the opposite. If you need 128 Gb RAM for your 3D rendering or compute, you probably earn those $1000 is a few days.

iPadified · Dec 14, 2023

Rafterman said:
But that's all shared RAM - not really a fair comparison.

MacOS and apps does not take up that much space does it? Overhead of 2-3 GB RAM after a clean boot.

leman · Dec 14, 2023

Rafterman said:
But that's all shared RAM - not really a fair comparison.

I don’t get this. It’s RAM directly usable by the GPU. I’d rather have “shared“ RAM than GPU-only RAM simply because it lets me get data into the GPU quicker.

Rafterman said:
You're not paying for the RAM though, you are paying for the Tensor core CPU. That processor delivers 300 teraflops of GPU performance and is an industrial-class video card.

No, you are paying for RAM. The same tensor cores are also present on cheap gaming cards. The RAM on those industrial units is also considerably faster.

By the way, while the nominal performance of tensor cores is very impressive, it’s not nearly as high in practice. The figures you quote refer to ideal results under very specific testing conditions. In real world applications the real bottleneck is getting the data in and out the GPU. Even the fastest tensor cores be one a dead weight if your ram pool is not large and fast enough to stream in model weights for processing. This is also why professional units come with very fast RAM. And this is indeed why Apple Silicon won’t be able to compete with Nvidia in the server space, but does well in cost-efficient workstation space.

name99 · Dec 15, 2023

leman said:
What you describe may be one potential technical reason why Metal does not offer unified virtual memory, although I am not 100% convinced the cache flushing works exactly as you describe. Metal indeed requires you to tell it which memory objects are in use during a pass, but not the exact pointer range. There is nothing preventing you from allocating one huge Metal buffer and doing all your sub-allocations from there. In fact, that’s the preferred way since every buffer incurs additional overhead (and Metal gives you the heaps API to simplify these things). In summary, the system must already have a way to track which parts of the cache need flushing and so my guess is that the “use” APIs are there primarily to ensure that the resources are in memory and bound to the appropriate page tables. Vulkan doesn’t need the “use” API because all resources are explicitly assumed to be resident anyway, unlike Metal.

From the API consumer perspective the issue is that the CPU and the GPU use different virtual address spaces. And anyway, the buffer handle is just the GPU virtual address. It’s even called GPU address in the API and you can write valid GPU pointers from the CPU using these APIs. Real handles are used for textures and samplers and they indeed encode offsets into the per-process descriptor tables (same system Nvidia uses).

I don't know the API well, but I assume that you are referring to what is called "Private" memory?
As far as I can tell, the issue is not that that lives in a separate address space; it is that while that memory lives in the same physical address space, there is (by definition) no reason that the CPU should ever want to view it.

The consequences are that while I am using such address space, IF NECESSARY, it can overflow L1 to L2, to SLC, all the way to DRAM. But under normal circumstance ideally it will live purely in L1 and L2 and not even flow out to SLC. And then, when I am done using this memory, I can simply invalidate the cache lines; I have no obligation to flush them to SLC and DRAM because the reason to do that is to make them visible to the CPU.

......

Another issue where Apple and nVidia are different is that nVidia breaks up the 64 bit address space into regions that are mapped to different hardware types. Thus, simply by looking at the high bits of an address, you know where to route it (eg either device memory or which Scratchpad of which core).
Metal has typed pointers so that a pointer that accesses Scratchpad is a different type of thing from a pointer that accesses device memory (and uses different instructions, eg tile_load, device_load, stack_load, threadgroup_load). nVidia's scheme definitely seems more elegant and powerful, and maybe one day Apple Silicon will adopt it. But for now this seems an additional issue – you can't, for example, simply sling around these pointers and store them in generic data structures (eg trees or whatever), then extract them and know how to use them.

I know even less about Blender than about graphics APIs, but what may be meant by "Handle" here is simply Apple (or at least someone concerned about getting this working on Apple Silicon) suggesting that instead of storing raw pointers in various complex data structures, we wrap each pointer in a very minimal shell (call it a "handle", because that's the idea, regardless of what Metal means by Handle), store that handle in the various complex data structures, and use a few well-defined dereference macros (or properties if you want to go Object Oriented) to map from the handle to the pointer.
This would all be free (shell and dereference are no-op) for nvidia, while doing whatever Apple requires (eg maybe the dereference macro has enough info from the handle extra bits to generate the appropriate Scratchpad vs Device load instruction).

leman · Dec 16, 2023

name99 said:
I don't know the API well, but I assume that you are referring to what is called "Private" memory?

No, just memory. Private memory is mostly relevant for textures at it allows the system to optimize data layout (can't really do it if the application might be accessing the data from the CPU side). As far as I can tell there is no advantages in using private allocations for data on Apple Silicon.

name99 said:
Another issue where Apple and nVidia are different is that nVidia breaks up the 64 bit address space into regions that are mapped to different hardware types. Thus, simply by looking at the high bits of an address, you know where to route it (eg either device memory or which Scratchpad of which core).
Metal has typed pointers so that a pointer that accesses Scratchpad is a different type of thing from a pointer that accesses device memory (and uses different instructions, eg tile_load, device_load, stack_load, threadgroup_load). nVidia's scheme definitely seems more elegant and powerful, and maybe one day Apple Silicon will adopt it. But for now this seems an additional issue – you can't, for example, simply sling around these pointers and store them in generic data structures (eg trees or whatever), then extract them and know how to use them.

Ah, interesting. So that's how Nvidia does it. Thanks, I didn't know it!

Metal's pointer address spaces can be a bit annoying to deal with, but there is also a certain advantage to this kind of explicit design. And it does come with benefits. For one, it makes the code more readable and arguably more maintainable — it really helps to see if a function receives a pointer to shared memory or to device memory. And since the address space is part of the type system you can use it in overloads, allowing you to specialise behaviour depending on memory type, which can be quite useful at times.

name99 said:
I know even less about Blender than about graphics APIs, but what may be meant by "Handle" here is simply Apple (or at least someone concerned about getting this working on Apple Silicon) suggesting that instead of storing raw pointers in various complex data structures, we wrap each pointer in a very minimal shell (call it a "handle", because that's the idea, regardless of what Metal means by Handle), store that handle in the various complex data structures, and use a few well-defined dereference macros (or properties if you want to go Object Oriented) to map from the handle to the pointer.

Also a possible explanation. I still think it's more likely that the person who wrote the notes was a bit confused and referred to Apple GPU virtual space pointer as "handles". To me this interpretation makes the most sense given the context of discussion. But of course, it's just one guess among many.

Standard · Dec 29, 2023

Hi everyone. I have a question regarding my new MacBook Pro 16" M3 Max and rendering. Please @ me if you are able to respond, so I am notified. The machine is fully loaded, and I'm rendering in Maya with Arnold. The MacBook is in clamshell mode, and it's plugged into two LG 32" Duo monitors with USB C. I went into the battery settings and set high power mode for both working on the battery and plugged in.

At times when I am rendering, the fans crank up, but then other times, it's rendering and the fans are not on and the render appears to be going slow. I am not sure if I am utilizing full power of the machine? For example right now it is rendering but my CPU usage is up and down from 8%-40%. If anyone can please advise on how I can keep this consistent, that would be most appreciated. I have some heavy renders to complete and they will take long enough. Thank you.

aytan · Dec 29, 2023

Standard said:
Hi everyone. I have a question regarding my new MacBook Pro 16" M3 Max and rendering. Please @ me if you are able to respond, so I am notified. The machine is fully loaded, and I'm rendering in Maya with Arnold. The MacBook is in clamshell mode, and it's plugged into two LG 32" Duo monitors with USB C. I went into the battery settings and set high power mode for both working on the battery and plugged in.

At times when I am rendering, the fans crank up, but then other times, it's rendering and the fans are not on and the render appears to be going slow. I am not sure if I am utilizing full power of the machine? For example right now it is rendering but my CPU usage is up and down from 8%-40%. If anyone can please advise on how I can keep this consistent, that would be most appreciated. I have some heavy renders to complete and they will take long enough. Thank you.

Do you monitoring CPU temps during rendering?. If your single frame render time is exceed CPUs max temp point, every other frame will go slower and slower. It will goes on like this until CPU temps turn back defined temps by Apple. I do not use Arnold now and I can not imagine its recent version. Somehow even fastest CPU's can not feed software demands and one of them had to wait each other for a new task. This could end up different single frame render times or CPU usage for same render task.
Also Maya and Arnold could go crazy somehow depended on your scene's aspects.
Hope you could get a solution soon.

jujoje · Dec 29, 2023

Standard said:
Hi everyone. I have a question regarding my new MacBook Pro 16" M3 Max and rendering. Please @ me if you are able to respond, so I am notified. The machine is fully loaded, and I'm rendering in Maya with Arnold. The MacBook is in clamshell mode, and it's plugged into two LG 32" Duo monitors with USB C. I went into the battery settings and set high power mode for both working on the battery and plugged in.

At times when I am rendering, the fans crank up, but then other times, it's rendering and the fans are not on and the render appears to be going slow. I am not sure if I am utilizing full power of the machine? For example right now it is rendering but my CPU usage is up and down from 8%-40%. If anyone can please advise on how I can keep this consistent, that would be most appreciated. I have some heavy renders to complete and they will take long enough. Thank you.

This is a bit anecdotal, so not sure how much help it will be, but had a similar problem with Houdini/Karma - sometime it would use all the cores, but sometimes only 30%. It looked like the render job was getting the wrong QoS assigned, so was running primarily on the efficiency cores with performance cores only getting a small percentage of the work.

Might be worth checking the core usage in the Activity Monitor to see if this is the case - for me it would start off on most cores but then shift things over to the E cores after a few minutes, so probably worth letting it render for a bit.

If this is the case, then, in terms of fixes, unfortunately it's developer thing; I emailed Sidefx and they fixed it a few builds later. I don't think you can set the QoS manually, unlike affinity on windows...

altaic · Dec 29, 2023

jujoje said:
This is a bit anecdotal, so not sure how much help it will be, but had a similar problem with Houdini/Karma - sometime it would use all the cores, but sometimes only 30%. It looked like the render job was getting the wrong QoS assigned, so was running primarily on the efficiency cores with performance cores only getting a small percentage of the work.

Might be worth checking the core usage in the Activity Monitor to see if this is the case - for me it would start off on most cores but then shift things over to the E cores after a few minutes, so probably worth letting it render for a bit.

If this is the case, then, in terms of fixes, unfortunately it's developer thing; I emailed Sidefx and they fixed it a few builds later. I don't think you can set the QoS manually, unlike affinity on windows...

Does renice (the terminal command) on the process not work?

Standard · Dec 29, 2023

aytan said:
Do you monitoring CPU temps during rendering?. If your single frame render time is exceed CPUs max temp point, every other frame will go slower and slower. It will goes on like this until CPU temps turn back defined temps by Apple. I do not use Arnold now and I can not imagine its recent version. Somehow even fastest CPU's can not feed software demands and one of them had to wait each other for a new task. This could end up different single frame render times or CPU usage for same render task.
Also Maya and Arnold could go crazy somehow depended on your scene's aspects.
Hope you could get a solution soon.

Hey aytan, thank you. I do not have anything to monitor the temperature, what would you recommend? I wonder, would the Lock Screen somehow lower performance?

jujoje said:
This is a bit anecdotal, so not sure how much help it will be, but had a similar problem with Houdini/Karma - sometime it would use all the cores, but sometimes only 30%. It looked like the render job was getting the wrong QoS assigned, so was running primarily on the efficiency cores with performance cores only getting a small percentage of the work.

Might be worth checking the core usage in the Activity Monitor to see if this is the case - for me it would start off on most cores but then shift things over to the E cores after a few minutes, so probably worth letting it render for a bit.

If this is the case, then, in terms of fixes, unfortunately it's developer thing; I emailed Sidefx and they fixed it a few builds later. I don't think you can set the QoS manually, unlike affinity on windows...

Thanks jujoje for the insight, most appreciated. I have just figured out how to view the cores with Activity monitor. I will monitor this render. I wonder also if the laptop idling and putting the Lock Screen on has any effect on performance? I'll try disabling it.

aytan · Dec 29, 2023

Standard said:
Hey aytan, thank you. I do not have anything to monitor the temperature, what would you recommend? I wonder, would the Lock Screen somehow lower performance?

Thanks jujoje for the insight, most appreciated. I have just figured out how to view the cores with Activity monitor. I will monitor this render. I wonder also if the laptop idling and putting the Lock Screen on has any effect on performance? I'll try disabling it.

I don't think so. We have been use an M1 Max 16 inch the way like yours nearly for one year. We did not have any issues about lock screen.

Appletoni · Dec 29, 2023

jujoje said:
This is a bit anecdotal, so not sure how much help it will be, but had a similar problem with Houdini/Karma - sometime it would use all the cores, but sometimes only 30%. It looked like the render job was getting the wrong QoS assigned, so was running primarily on the efficiency cores with performance cores only getting a small percentage of the work.

Might be worth checking the core usage in the Activity Monitor to see if this is the case - for me it would start off on most cores but then shift things over to the E cores after a few minutes, so probably worth letting it render for a bit.

If this is the case, then, in terms of fixes, unfortunately it's developer thing; I emailed Sidefx and they fixed it a few builds later. I don't think you can set the QoS manually, unlike affinity on windows...

Houdini always uses all cores.
That is what it should do.

Houdini Chess Engine

www.cruxis.com

Rank 33 https://computerchess.org.uk/ccrl/4040/
Yes a MacBook 16-inch M3 MAX is slow. It should have at least 18-inch and 32 CPU cores.

mr_roboto · Dec 30, 2023

Appletoni said:
Houdini always uses all cores.
That is what it should do.

Houdini Chess Engine

Houdini Chess Engine

www.cruxis.com

Rank 33 https://computerchess.org.uk/ccrl/4040/
Yes a MacBook 16-inch M3 MAX is slow. It should have at least 18-inch and 32 CPU cores.

Are you, like, a bot that just autoreplies to any post with the name of a chess program in it? Pay attention, the Houdini being talked about here isn't a chess program, it's a 3D renderer, because this is a thread about 3D renderers, not a thread about chess.

Also, the Houdini chess program is apparently a GPL-violating copy (with modification) of Stockfish. It's not as strong as real Stockfish and you should feel ashamed for having promoted it, even if computerchess.org.uk allowed it to stay on their list despite its questionable background.

No, a 16" M3 Max is not slow. Nor should it have at least 18-inch and 32 CPU cores, it's intended to be a practical notebook. Yes we know you have a chess obsession and think any computer which doesn't top the chess performance charts is a failure. Nobody cares.

Slartibart · Dec 30, 2023

Appletoni said:
It should have at least 18-inch and 32 CPU cores.

I am flabbergasted - like probably the majority here - that you can restrain yourself to such a display size… I mean for a chess program interface… even if you could alleviate the understandable pain technically by using external displays - yes, yes, we know, we know, but what if one wants to play a serious game of chess while canoeing through the norwegian Lysfjord?!?!!! Hiking the Atacama? Snorkelling in Hawaii? - even worse:

the 16” M3 Max MacBook offers only support for a FINITE number of connected external displays.

only a finite number of external displays supported!

F.I.N.I.T.E.

You completely missed this, didn’t you?

But fear not, we here at MR are dedicated to help if this happens. You are welcome.

Just because you missed that in your well referenced and fact-based comment, please bear with me when I point you to another aspect in this context:

Suspiciously Apple offers only support for a finite number of displays ON EVERY of their display supporting devices.

Coincidence?

You do think not, do you?

😂🤣🙃

sirio76 · Dec 30, 2023

mr_roboto said:
Are you, like, a bot that just autoreplies to any post with the name of a chess program in it? Pay attention, the Houdini being talked about here isn't a chess program, it's a 3D renderer, because this is a thread about 3D renderers, not a thread about chess.

Also, the Houdini chess program is apparently a GPL-violating copy (with modification) of Stockfish. It's not as strong as real Stockfish and you should feel ashamed for having promoted it, even if computerchess.org.uk allowed it to stay on their list despite its questionable background.

No, a 16" M3 Max is not slow. Nor should it have at least 18-inch and 32 CPU cores, it's intended to be a practical notebook. Yes we know you have a chess obsession and think any computer which doesn't top the chess performance charts is a failure. Nobody cares.

It’s beyond me how this kind of users are tolerated here..

Appletoni · Dec 30, 2023

mr_roboto said:
Are you, like, a bot that just autoreplies to any post with the name of a chess program in it? Pay attention, the Houdini being talked about here isn't a chess program, it's a 3D renderer, because this is a thread about 3D renderers, not a thread about chess.

Also, the Houdini chess program is apparently a GPL-violating copy (with modification) of Stockfish. It's not as strong as real Stockfish and you should feel ashamed for having promoted it, even if computerchess.org.uk allowed it to stay on their list despite its questionable background.

No, a 16" M3 Max is not slow. Nor should it have at least 18-inch and 32 CPU cores, it's intended to be a practical notebook. Yes we know you have a chess obsession and think any computer which doesn't top the chess performance charts is a failure. Nobody cares.

Take ChessBase 17 3D renderer, which renders 3D chess boards (+Raytracing), while the chess engines play.
Problem solved😁

sirio76 · Dec 30, 2023

Standard said:
Hi everyone. I have a question regarding my new MacBook Pro 16" M3 Max and rendering. Please @ me if you are able to respond, so I am notified. The machine is fully loaded, and I'm rendering in Maya with Arnold. The MacBook is in clamshell mode, and it's plugged into two LG 32" Duo monitors with USB C. I went into the battery settings and set high power mode for both working on the battery and plugged in.

At times when I am rendering, the fans crank up, but then other times, it's rendering and the fans are not on and the render appears to be going slow. I am not sure if I am utilizing full power of the machine? For example right now it is rendering but my CPU usage is up and down from 8%-40%. If anyone can please advise on how I can keep this consistent, that would be most appreciated. I have some heavy renders to complete and they will take long enough. Thank you.

Check two things, thermal management and RAM saturation, if they are both fine it can be a software issue. If for whatever reason the cores are too hot, you can download the free Mac fans control app, check the temperatures and rise manually the fan speed, see if this improve the performance. Also if the RAM is way too saturated it can slow down things a bit, even though not that much usually.
If the problem persists try to debug your scene, even a single shader may cause the slow down if it’s badly threaded/optimized, if you isolate the issues send the scenes to developers to fix it. Of course it can also be some other problems related to Arnold, Maya or even other plug-in.

richinaus · Dec 30, 2023

sirio76 said:
Check two things, thermal management and RAM saturation, if they are both fine it can be a software issue. If for whatever reason the cores are too hot, you can download the free Mac fans control app, check the temperatures and rise manually the fan speed, see if this improve the performance. Also if the RAM is way too saturated it can slow down things a bit, even though not that much usually.
If the problem persists try to debug your scene, even a single shader may cause the slow down if it’s badly threaded/optimized, if you isolate the issues send the scenes to developers to fix it. Of course it can also be some other problems related to Arnold, Maya or even other plug-in.

Given we are talking about an Autodesk app I would bet that badly threaded / not optimised is the most likely answer.

3D Rendering on Apple Silicon, CPU&GPU

macrumors 68020

Attachments

macrumors Core

Contributor

macrumors 6502a

Contributor

macrumors 6502a

macrumors 65816

macrumors 68000

macrumors 68000

macrumors Core

macrumors 68020

macrumors Core

macrumors 6502

macrumors regular

macrumors regular

macrumors 6502a

macrumors 6502

macrumors regular

Suspended

macrumors 6502a

macrumors 68030

macrumors 6502a

Suspended

macrumors 6502a

macrumors 68020

Our Staff