Yes, but it is not a concern for some applications (like AI inference) because they don't require a lot of host-GPU communication bandwidth.

This is incorrect. Because the data used to run inference models has to be loaded into the eGPU's VRAM first, using just a USB connection results in significantly slower overall processing, and the same bottleneck applies when sending processed data back to the CPU/SoC. It doesn't matter how much memory bandwidth the GPU has if data can only trickle between it and the CPU. Even a smaller model such as Llama 3.1 8B takes up almost 5GB on disk, which is a lot to transfer over a USB connection, and larger 32B-class models load over 17GB into VRAM, which would mean significantly slower overall performance even with a card like an RTX 5090. The bandwidth is needed to load the model onto the GPU as quickly and efficiently as possible.
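As a rough sketch of the load-time argument, here is the arithmetic for pushing model weights over different links. The link speeds are assumed nominal peak figures (real-world throughput is lower), and the 17GB figure is the weight size mentioned above:

```python
# Back-of-the-envelope load times for moving model weights into eGPU VRAM.
# Link speeds are nominal peak figures (assumptions; real throughput is lower).
LINK_GB_PER_S = {
    "USB 3.2 Gen 2 (10 Gb/s)": 1.25,
    "Thunderbolt 4 / USB4 (40 Gb/s)": 5.0,
    "PCIe 4.0 x16": 32.0,
}

def load_time_seconds(model_gb: float, link_gb_per_s: float) -> float:
    """Time to transfer model_gb gigabytes over a link at link_gb_per_s GB/s."""
    return model_gb / link_gb_per_s

for name, speed in LINK_GB_PER_S.items():
    # ~17 GB of weights, per the post above
    print(f"{name}: {load_time_seconds(17.0, speed):.1f} s")
```

Even under these idealized numbers, the USB link takes roughly an order of magnitude longer than a direct PCIe slot for the same weights.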
 
Yes, but loading the model is a one-time cost, which can be amortized if one runs the job for a long time.

Not saying using USB is ideal, but it can be useful for a limited set of use cases.
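The amortization point above can be sketched numerically. The 14-second load time (17GB at an assumed ~1.25 GB/s USB link) and the one-hour job length are illustrative figures, not measurements:

```python
def load_overhead_fraction(load_s: float, run_s: float) -> float:
    """Fraction of total wall time spent on the one-time model load."""
    return load_s / (load_s + run_s)

# ~14 s USB load (17 GB at ~1.25 GB/s, assumed) amortized over a 1-hour job:
print(f"{load_overhead_fraction(14.0, 3600.0):.1%}")
```

For a long-running job the one-time load shrinks to well under 1% of total time, which is the limited set of use cases being described.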
 
Is there an eGPU card for Thunderbolt 5 that can be used for gaming on a Mac Studio M4 Max?
 

Even if the model were running 24/7, transferring data back and forth between the CPU and GPU over USB would still incur a significant penalty on the rate at which results are returned.
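The steady-state penalty can be bounded with the same kind of arithmetic: the link caps how many requests per second can cross it, regardless of GPU speed. Both the 50 MB-per-request figure and the ~1.25 GB/s USB rate below are hypothetical assumptions for illustration:

```python
def max_requests_per_s(mb_per_request: float, link_gb_per_s: float) -> float:
    """Upper bound on request rate imposed by the host<->GPU link alone."""
    return (link_gb_per_s * 1000.0) / mb_per_request

# Hypothetical workload moving 50 MB per request over a ~1.25 GB/s USB link:
print(f"{max_requests_per_s(50.0, 1.25):.0f} requests/s ceiling")
```

Whether this bound bites depends on how much data the workload actually moves per request; it is a ceiling set by the link, not a measurement.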
 
eGPUs are going the way of SLI: an interesting option, but not feasible in the long run. For gamers, there are definitely cheaper alternatives, be they gaming PCs, consoles, or handhelds like the Steam Deck. For non-gamers who want the processing power, again, there are better tools for the job.
 