Hospitals, medical offices, or labs that want to crunch tons of patient data and get insights from it, within the scope of HIPAA privacy regulations.

Basically, for only $8k you can have a magical box that takes in someone's X-ray and tells you every single issue they have, 10% more accurately than a human doctor, as per the latest studies.
Please provide a source showing that DeepSeek running locally can achieve this. Not being a jerk, but not all AI models are created equal, and there is a lot of proprietary work going on at OpenAI, Microsoft, Anthropic, etc. that is not published research.

I don’t think there is one, because DeepSeek has not demonstrated this capability, especially with respect to multimodality. If there is, I’d love to read it and welcome you to share it. Perhaps someone fine-tuned DeepSeek within two months to do this, and I’d really like to read that research if it exists, but I imagine this was a broad statement rather than a specific claim, which is important to understand.
 
Can somebody explain the actual use of an LLM that cannot search for up-to-date info online?
I'm not being sarcastic; I actually want to know.
My personal use case right now is brainstorming a novel I’m writing. Online AIs tend to be too highly censored to be useful.

I don’t need massively large models for this (10B–27B is fine), but I do want a decent-sized context window, and that takes additional RAM.
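For anyone sizing RAM around context length, a rough sketch of why a bigger context window costs memory: the KV cache grows linearly with the number of tokens kept in context. The model config below (layer count, KV heads, head dimension, fp16 cache) is a hypothetical 27B-class setup for illustration, not any specific model's published spec.

```python
# Rough KV-cache size estimate for a hypothetical 27B-class model.
# Every config number below is an assumption for illustration only.
num_layers = 46        # assumed transformer layer count
num_kv_heads = 8       # assumed KV heads (grouped-query attention)
head_dim = 128         # assumed per-head dimension
bytes_per_value = 2    # fp16 cache entries
context_tokens = 32_768

# 2x for the K and V tensors cached at every layer.
bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
kv_cache_gib = bytes_per_token * context_tokens / 1024**3

print(f"~{bytes_per_token / 1024:.0f} KiB of KV cache per token of context")
print(f"~{kv_cache_gib:.1f} GiB of KV cache at {context_tokens:,} tokens")
```

That several-GiB cache sits on top of the model weights themselves, which is why a longer context window pushes RAM needs up even when the parameter count stays the same.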
 
Please provide a source showing that DeepSeek running locally can achieve this. Not being a jerk, but not all AI models are created equal, and there is a lot of proprietary work going on at OpenAI, Microsoft, Anthropic, etc. that is not published research.

I don’t think there is one, because DeepSeek has not demonstrated this capability, especially with respect to multimodality. If there is, I’d love to read it and welcome you to share it. Perhaps someone fine-tuned DeepSeek within two months to do this, and I’d really like to read that research if it exists, but I imagine this was a broad statement rather than a specific claim, which is important to understand.

It was a broad statement expressing what I believe you can do with current hardware/software applied to a healthcare business.
 
I wouldn’t be surprised if Apple sold accelerator cards for the next-gen Mac Pro featuring M3 Ultra chips with 512GB each… package them as server racks and you have something that could hurt Nvidia badly.
 
Using Ollama, I've got the R1 70B running on my Mac mini M4 Pro (14/20), which has 64GB of RAM.
It's usable, in that the output arrives at about the speed I read.
Are these useful to run locally?
Not really, not yet anyway. I have a subscription to OpenAI which I use all the time.
What is interesting is that this is the beginning of a new direction in desktop computing.
Apple appears to have been caught napping with AI and still hasn't got its act together with it.
The interesting thing, for me anyway, is how this need or ability to run LLMs on personal computers locally will influence the evolution of Apple Silicon. We probably won't find out for another 2 years though.
Nvidia is making good money on this revolution. It will also be interesting to see what happens to the RAM market, with local AI on personal machines requiring larger amounts of RAM than traditional applications.
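For anyone who wants to reproduce this kind of setup, Ollama serves a local HTTP API on port 11434 once it's running. A minimal sketch, assuming the `deepseek-r1:70b` tag has already been pulled with Ollama; the prompt is just a placeholder:

```python
import json
import requests  # third-party: pip install requests

# Stream a completion from a locally running Ollama server.
# Assumes `ollama pull deepseek-r1:70b` has already been run.
url = "http://localhost:11434/api/generate"
payload = {
    "model": "deepseek-r1:70b",
    "prompt": "Summarize the tradeoffs of running LLMs locally.",
}

with requests.post(url, json=payload, stream=True, timeout=600) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            break
print()
```

Watching the tokens stream in this way also makes it easy to eyeball whether a given model keeps up with your reading speed.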
 
You expect Apple to increase RAM 64-fold in 11 years? I don’t buy it.
How much has top spec VRAM in a Mac increased in the past 11 years? 12GB in 2014 to 512GB today. That is over a 42x increase and now VRAM is significantly more important.
 
448GB of VRAM, not virtual memory (V = Video)!


Correct... and also not.

"VRAM" has become an enchanted term (thanks to the likes of Nvidia).

All of the RAM in Apple Silicon Macs is the same: LPDDR, one or more chips, directly connected to the SoC.

And that is why an Apple Silicon device, as currently implemented, will not perform as quickly as the Nvidia systems with their custom HBM implementation. OTOH, Macs are less expensive than Nvidia systems.
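One practical consequence of that unified-memory design: the GPU isn't handed all of the RAM by default; Metal reports a recommended working-set limit that is only a fraction of total memory. A minimal sketch of checking it, assuming the pyobjc Metal bindings (`pyobjc-framework-Metal`) are installed on an Apple Silicon Mac:

```python
import Metal  # from pyobjc-framework-Metal; macOS only

# Query the default GPU working-set limit on an Apple Silicon Mac.
device = Metal.MTLCreateSystemDefaultDevice()
limit_bytes = device.recommendedMaxWorkingSetSize()

print(f"GPU: {device.name()}")
print(f"Recommended max working set: {limit_bytes / 1024**3:.1f} GiB")
```

That default ceiling is why the 512GB Studio below still needed a manual override to give the GPU 448GB.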
 
According to Lee's testing, the 671 billion parameter AI model can be executed directly on Apple's high-end workstation, but it requires substantial memory resources, consuming 404GB of storage and requiring the manual allocation of 448GB of video RAM through Terminal commands.

Pass. My mother is 85; she's never going to get those Terminal commands right.
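For context, the Terminal command usually reported for this is a sysctl override of the GPU wired-memory ceiling. A minimal read-only sketch of checking the current value from Python; the `iogpu.wired_limit_mb` key is the one commonly cited for recent Apple Silicon macOS releases rather than anything formally documented:

```python
import subprocess

# Read the current GPU wired-memory limit (in MB) on Apple Silicon macOS.
# A value of 0 generally means "use the system default" (a large fraction of RAM).
out = subprocess.run(
    ["sysctl", "-n", "iogpu.wired_limit_mb"],
    capture_output=True, text=True, check=True,
)
print(f"iogpu.wired_limit_mb = {int(out.stdout.strip())}")

# Raising it (e.g. to 448GB = 458752 MB) would be done in Terminal with sudo:
#   sudo sysctl iogpu.wired_limit_mb=458752
# The override does not persist across reboots.
```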
 
Good to see the impressive power of Apple Silicon Macs. I expect the next Mac Pro to be even more capable.
 
I wouldn’t be surprised if Apple sold accelerator cards for the next-gen Mac Pro featuring M3 Ultra chips with 512GB each… package them as server racks and you have something that could hurt Nvidia badly.

They could do it now, along with a $99 license for a new version of macOS Server.
 
How much has top spec VRAM in a Mac increased in the past 11 years? 12GB in 2014 to 512GB today. That is over a 42x increase and now VRAM is significantly more important.

That's true, but a bit misleading. We're not gonna see another leap like that.

And it isn't technically 512GB; while the GPU cores can share the RAM, the CPU (and NPU) cores will be using plenty of it for purposes the GPU cores have no use for.
 
My personal use case right now is brainstorming a novel I’m writing. Online AIs tend to be too highly censored to be useful.

I don’t need massively large models for this (10B–27B is fine), but I do want a decent-sized context window, and that takes additional RAM.
Thanks for the info.
Next year I need to replace my desktop, and I'd like to carefully consider the amount of RAM I need.
I'm on 128GB now (M1 Ultra), but I'd consider getting 256GB.
 
Thanks for the info.
Next year I need to replace my desktop, and I'd like to carefully consider the amount of RAM I need.
I'm on 128GB now (M1 Ultra), but I'd consider getting 256GB.
My experience is that there are diminishing returns with the Mac when it comes to LLMs.

More RAM enables you to load larger LLMs, but it doesn’t necessarily run them quickly. For me, I think the “sweet spot” is 128GB. If I went higher and put in larger LLMs, I’m not sure the time to first token and t/s would make it particularly useful. Even though I can load up 70B models in my Studio, I’m more likely to use something around 30B because it’s more responsive.

And if I run a 30B model and also do something else intensive alongside it (like a Stable Diffusion AI image generation), both slow down - and nothing is using swap space, so there’s no disk-swapping slowing it down.

I’m interested to know what others get with large models and the Ultra (the Ultra has even higher memory bandwidth), but I’d be surprised if running large models (400B+) on a 512GB Studio is practical on a daily basis, especially if you want to run anything else alongside it at the same time.
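A rough way to see why extra RAM alone doesn't buy speed: single-stream decode is largely memory-bandwidth-bound, so tokens/sec is roughly memory bandwidth divided by the bytes of weights read per token. The bandwidth and model-size figures below are ballpark assumptions, not measurements:

```python
# Back-of-envelope ceiling for bandwidth-bound token generation.
# All figures are rough assumptions for illustration.
def tokens_per_sec_ceiling(bandwidth_gb_s: float, weights_read_gb: float) -> float:
    """Upper bound assuming every generated token streams the active weights once."""
    return bandwidth_gb_s / weights_read_gb

ultra_bw = 800.0  # GB/s, roughly Ultra-class unified memory bandwidth (assumed)

print(f"30B dense @ ~4-bit (~18GB):   ~{tokens_per_sec_ceiling(ultra_bw, 18):.0f} tok/s ceiling")
print(f"70B dense @ ~4-bit (~40GB):   ~{tokens_per_sec_ceiling(ultra_bw, 40):.0f} tok/s ceiling")
print(f"400B dense @ ~4-bit (~230GB): ~{tokens_per_sec_ceiling(ultra_bw, 230):.1f} tok/s ceiling")
```

Mixture-of-experts models read only a subset of their weights per token, which is the main reason something like the 671B DeepSeek model is usable at all on a 512GB Studio; but the general point stands that bandwidth, not capacity, sets the speed ceiling, so stacking RAM mostly buys the ability to fit models, not to run them faster.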
 