YouTuber Dave Lee of Dave2D fame has demonstrated how Apple's new Mac Studio equipped with an M3 Ultra chip can efficiently run a huge version of the DeepSeek R1 AI model locally, provided that users spec the machine with the maximum 512GB of memory.


According to Lee's testing, the 671-billion-parameter model can be executed directly on Apple's high-end workstation, but it requires substantial memory resources: the model consumes 404GB of storage, and users must manually allocate 448GB of video RAM through Terminal commands.
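Neither the article nor the video spells out the exact Terminal command. On Apple Silicon Macs, the knob usually cited for raising the GPU memory limit is the iogpu.wired_limit_mb sysctl; treating that as an assumption, here is a minimal sketch of the allocation step:

```python
# Sketch: derive the value for raising macOS's GPU wired-memory limit.
# The iogpu.wired_limit_mb sysctl key is the commonly cited mechanism on
# Apple Silicon; the exact command used in the video is an assumption here.
target_gb = 448
target_mb = target_gb * 1024  # sysctl expects megabytes: 458752

# Run the printed command in Terminal (needs admin rights; resets on reboot).
print(f"sudo sysctl iogpu.wired_limit_mb={target_mb}")
```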

The M3 Ultra's unified memory architecture is key to this performance, allowing the system to handle a 4-bit quantized version of DeepSeek R1 efficiently. The quantization slightly reduces accuracy, but it maintains all parameters and delivers approximately 17-18 tokens per second, which is sufficient for many practical applications.
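As a sanity check on those figures: at 4 bits per weight, a 671-billion-parameter model needs roughly half a byte per parameter, which is why it lands near 404GB on disk and fits inside a 448GB VRAM allocation. A back-of-the-envelope calculation, with the overhead factor as an assumption:

```python
# Back-of-the-envelope sizing for a 4-bit quantization of a 671B model.
params = 671e9
bytes_per_weight = 0.5                            # 4 bits = half a byte
weights_gb = params * bytes_per_weight / 1e9
print(f"raw 4-bit weights: ~{weights_gb:.0f} GB")  # ~336 GB

# Real checkpoints carry quantization scales and some higher-precision
# tensors; a ~20% overhead factor (our assumption) lands near the 404 GB
# the article reports, comfortably under the 448 GB VRAM cap.
print(f"with ~20% overhead: ~{weights_gb * 1.2:.0f} GB")  # ~403 GB
```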

Perhaps most impressively, the Mac Studio accomplishes this while consuming under 200 watts of power. Comparable performance on traditional PC hardware would require multiple GPUs drawing approximately ten times more electricity.

The capability to run such advanced AI models locally offers privacy advantages for sensitive applications like healthcare data analysis, where sending information to cloud services raises security concerns.


However, this performance doesn't come cheap – a Mac Studio configured with an M3 Ultra and 512GB of RAM starts at around $10,000. Fully maxed out, with 16TB of SSD storage and the top M3 Ultra chip (32-core CPU, 80-core GPU, and 32-core Neural Engine), it costs a cool $14,099. Of course, for organizations requiring local AI processing of sensitive data, the Mac Studio offers a relatively power-efficient solution compared to alternative hardware configurations.

Apple says the M3 Ultra is the fastest Mac chip it has ever released, thanks to its strategy of fusing two M3 Max chips together using the company's "UltraFusion" technology, which effectively doubles the specifications of the M3 Max.

Article Link: Mac Studio With M3 Ultra Runs Massive DeepSeek R1 AI Model Locally
 
Would it be safe to say that in 5-10 years a smartphone will be able to run a model like this internally and without the internet?

They can run small models today but it doesn’t seem likely that they’d scale to 512GB of RAM in 10 years.


So 5-10 years from now they'll be able to run larger models than the small LLMs we can run today, but not a model of this size.
 
Would it be safe to say that in 5-10 years a smartphone will be able to run a model like this internally and without the internet?
Probably sooner, if you mean equivalent performance rather than model size. You can already run Google Gemma 3 27B or Qwen QwQ 32B on a specced-out MBP, and they are close to DeepSeek in performance.
Just last year, Meta's Llama 3.3 70B matched the performance of Llama 3.1 405B released six months prior. A roughly 6x efficiency improvement! Llama 3.1 405B was itself close to the original GPT-4, which was rumored to have well over a trillion parameters. From needing eight Nvidia H100s to a single MBP in two years.
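For the record, the "6x" is just the parameter ratio at roughly matched benchmark scores:

```python
# Parameter ratio behind the "~6x" efficiency claim:
# Llama 3.1 405B vs Llama 3.3 70B at similar benchmark performance.
print(f"{405 / 70:.1f}x fewer parameters")  # -> 5.8x, rounded to ~6x
```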
 
"Perhaps most impressively, the Mac Studio accomplishes this while consuming under 200 watts of power. Comparable performance on traditional PC hardware would require multiple GPUs drawing approximately ten times more electricity."

Assuming 1 hour of LLM use per day at $0.18 per kWh, that's about $11 per month to run a "comparable" PC versus about $1 for the Mac Studio. The Mac Studio is cheap to run.

That 10x figure was just an estimate in the video of how much more electricity a comparable setup would use.

The reality is worse for the non-Mac solution, even before counting the fact that you'd need to buy many GPUs to match the Mac Studio's available RAM. You'd need 16 RTX 5090s to get 512GB of VRAM. Each of those idles at maybe 50W; let's say 400W under load. Then add the draw of the CPU and everything else.

Using that estimate, 1 hour of LLM use per day would be about $40 per month in electricity costs versus $1 for the Mac Studio (using a $0.18 per kWh rate). Add to that the cost of the GPUs (let's be generous and say only $32,000 for 16 5090s), etc., and you're looking at something that costs about 4x the Mac Studio and uses about 40x more electricity for similar performance.

Scale that up. Let's say this is a business running an LLM for 10 hours per day. That's $10 per month for the Mac and $400 per month for the non-Mac solution. It doesn't take long for the Mac to pay for itself just in electricity savings alone. Although, if you live somewhere cold and want to use your 16 5090s as your heating, maybe you can offset some heating costs that way. Just be careful you don't end up with too much heat!
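To make the arithmetic above explicit, here's the same estimate as a short script (the wattages and the $0.18/kWh rate are the assumptions stated in this thread, not measured figures):

```python
# Rough monthly electricity cost for local LLM inference, using this
# thread's assumptions: $0.18/kWh, a Mac Studio under 200 W, and a
# 16x RTX 5090 build at ~400 W per GPU plus ~1 kW for CPU and the rest.
RATE = 0.18  # dollars per kWh

def monthly_cost(watts: float, hours_per_day: float) -> float:
    kwh = watts / 1000 * hours_per_day * 30   # 30-day month
    return kwh * RATE

mac_watts = 200
pc_watts = 16 * 400 + 1000   # ~7.4 kW under load

for hours in (1, 10):
    print(f"{hours} h/day: Mac ~${monthly_cost(mac_watts, hours):.0f}/mo, "
          f"16x 5090 PC ~${monthly_cost(pc_watts, hours):.0f}/mo")
# 1 h/day:  Mac ~$1/mo,  PC ~$40/mo
# 10 h/day: Mac ~$11/mo, PC ~$400/mo -- matching the figures above,
# give or take rounding.
```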

The 512 GB Studio is not for everyone, but it's a terrific value for some people and uses.

Although, I'll add that with an Nvidia card in a Linux box, you can offload layers into system RAM too, with the various software packages that manage LLMs. So you don't strictly need 16 5090s to reach the total RAM capacity of the Mac Studio solution. Performance of the layers offloaded into RAM won't be great, however. This means there's not much quite like the new Studio for local LLMs.
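Partial offload is exactly what tools like llama.cpp expose. A minimal sketch using the llama-cpp-python bindings (the model path and layer count below are placeholders, not a tested configuration):

```python
# Sketch: split a model between GPU VRAM and system RAM with llama.cpp.
# n_gpu_layers sets how many transformer layers live on the GPU; the rest
# run from (much slower) system RAM. Path and numbers are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="/models/some-quantized-model.gguf",  # hypothetical path
    n_gpu_layers=40,  # as many layers as fit in the card's VRAM
    n_ctx=8192,       # context window
)

out = llm("Explain unified memory in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```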
 
As the article says (but should emphasize more), it's not the real thing, but a 4-bit quantization. You would need over 700GB to run the real thing. I ordered the 128GB M4 Max Studio instead of the M3 Ultra because, IMHO, the price isn't worth it if you cannot run the complete models without extra quantization. The day you can run the complete current models, the price might be worth it, but not before that happens.
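For context on where "over 700GB" comes from: DeepSeek R1's released weights are natively FP8, roughly one byte per parameter, before any runtime overhead. A quick check (the overhead allowance is an assumption):

```python
# Why the unquantized model needs "over 700 GB": DeepSeek R1 ships in FP8,
# i.e. about one byte per weight.
params = 671e9
fp8_gb = params * 1.0 / 1e9          # ~671 GB of raw weights
print(f"full FP8 weights: ~{fp8_gb:.0f} GB")

# KV cache, activations, and runtime buffers push the working set past
# 700 GB; the exact margin depends on context length (assumption: ~5%).
print(f"with runtime overhead: >{fp8_gb * 1.05:.0f} GB")
```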
 
YouTuber Dave Lee of Dave2D fame has demonstrated how Apple's new Mac Studio equipped with an M3 Ultra chip can efficiently run a huge version of the DeepSeek R1 AI model locally, provided that users spec the machine with the maximum 512GB of memory.

According to Lee's testing, the 671 billion parameter AI model can be executed directly on Apple's high-end workstation, but it requires substantial memory resources, consuming 404GB of storage and requiring the manual allocation of 448GB of virtual RAM through Terminal commands.
Would it be safe to say that in 5-10 years a smartphone will be able to run a model like this internally and without the internet?
No smartphone will have 512GB of RAM in 10 years. At least not from Apple, if we can go by history. Unless 32GB of iPhone RAM is analogous to 512GB of Mac Studio RAM.
 
If LLMs are a significant part of the future of computing and privacy is going to be a huge part of that, then Apple has a huge advantage from a hardware perspective. This is the real unspoken hero of what Apple is doing.

While everyone is focused on a delayed "AI" end user roll out, and some absolutely losing their stuff over it, Apple has created the hardware that is blowing away all the competition. Once the software side catches up, Apple will be lightyears ahead.

Keep in mind, big players like Meta and Google are absolutely pirating any data they can get their hands on. Unfortunately, unlike thepiratebay, they're too big to fail.

Try to buy an Nvidia 5090 with a measly 32GB of VRAM that needs a disgusting amount of power to run. The fake MSRP is $2,000, and they're being sold on eBay for $7,000.

With all the crying going on, I think Apple is doing this exactly right.
 
If LLMs are a significant part of the future of computing and privacy is going to be a huge part of that, then Apple has a huge advantage from a hardware perspective. This is the real unspoken hero of what Apple is doing.

While everyone is focused on a delayed "AI" end user roll out, and some absolutely losing their stuff over it, Apple has created the hardware that is blowing away all the competition. Once the software side catches up, Apple will be lightyears ahead.

Keep in mind, big players like Meta and Google are absolutely pirating any data they can get their hands on. Unfortunately, unlike thepiratebay, they're too big to fail.

Try to buy an Nvidia 5090 with a measly 32GB of VRAM that needs a disgusting amount of power to run. The fake MSRP is $2,000, and they're being sold on eBay for $7,000.

With all the crying going on, I think Apple is doing this exactly right.

But isn't it easier to just complain rather than trying to understand stuff?
 
No smartphone will have 512GB of RAM in 10 years. At least not from Apple, if we can go by history. Unless 32GB of iPhone RAM is analogous to 512GB of Mac Studio RAM.
While that is most likely true, it is also likely that today's LLMs will be long forgotten as inefficient 10 years from now. I expect that model efficiency and memory requirements will meet somewhere in the middle.
 
According to Lee's testing, the 671 billion parameter AI model can be executed directly on Apple's high-end workstation, but it requires substantial memory resources, consuming 404GB of storage and requiring the manual allocation of 448GB of virtual RAM through Terminal commands.

448GB of VRAM, not virtual memory (V = Video)!

If virtual memory were a usable solution to anything in this particular computing space there would be nothing terribly special about this machine.
 
It may run a cut-down version of the model, but it won't be as fast as dedicated hardware running the full model locally.
 
If LLMs are a significant part of the future of computing and privacy is going to be a huge part of that, then Apple has a huge advantage from a hardware perspective. This is the real unspoken hero of what Apple is doing.

While everyone is focused on a delayed "AI" end user roll out, and some absolutely losing their stuff over it, Apple has created the hardware that is blowing away all the competition. Once the software side catches up, Apple will be lightyears ahead.

Keep in mind, big players like Meta and Google are absolutely pirating any data they can get their hands on. Unfortunately, unlike thepiratebay, they're too big to fail.

Try to buy an Nvidia 5090 with a measly 32GB of VRAM that needs a disgusting amount of power to run. The fake MSRP is $2,000, and they're being sold on eBay for $7,000.

With all the crying going on, I think Apple is doing this exactly right.

Spot-on analysis. I agree 100%, and I agree that Apple is on the right track prioritizing user privacy and efficiency over rushing something to market (with potential adverse consequences).

Apple's approach will pay great dividends.
 
Sooooo, just how exactly are we going to stop, say, "North Country Whatever" from using local LLMs to get really good recipes for, like, a "chewy chocolate chip cookie" that's going to be really bad for the rest of the world, or from hacking my local bank, or whatever? Asking for many friends.
 
As the article says (but should emphasize more), it's not the real thing, but a 4-bit quantization. You would need over 700GB to run the real thing. I ordered the 128GB M4 Max Studio instead of the M3 Ultra because, IMHO, the price isn't worth it if you cannot run the complete models without extra quantization. The day you can run the complete current models, the price might be worth it, but not before that happens.

Good spot. I doubt Dave2D is an expert in AI/LLMs, so I'm surprised he made a video about it.
 
Privacy and security where you don’t want to upload confidential documents or information to a large model online.
But are you ok with using other people’s private and copyrighted material to train the model?
 
What are people doing that need to run a massive LLM locally that makes it worth the cost?

Hospitals, medical offices, or labs that want to crunch tons of patient data and get insights from it, within the scope of HIPAA privacy regulations.

Basically, for only $8k you can have a magical box that takes in someone's X-ray and tells you every single issue they have, 10% more accurately than a human doctor, as per the latest studies.
 
Privacy and security where you don’t want to upload confidential documents or information to a large model online.
That's exactly it. I use ChatGPT every day now, but I limit what I do with it because I can't upload certain intellectual property to it.

I'm not in a position to get a maxed-out Studio for this just yet, but if I had a client project and could bill it through to them, I would get this in a heartbeat. $14k is nothing for big business – they pay that for transatlantic flights all day long.
 