Operating systems are full of thousands of features you never use but can't disable because they are baked in.
I could disable Siri, I could disable Apple Intelligence, I could disable a lot of annoying things about macOS and iOS. I'm just frustrated that Apple puts Apple Intelligence into the operating system. Just build an MCP-like API for iOS and release Apple Intelligence / Siri AI as one or two apps. If people like it, they'll install it; if not, they won't.
 
The local model is not taking up 18GB. Literally far too big to load into memory or run at any reasonable speed. Your problem is not Siri AI, it is something else.

OK, you're right, it wasn't 18… it was 13.
Yay, Apple took 5 GB less from me!!! Lol. It does take 17 GB on my iPhone, though. So egregious.
 
I think it's the combination of iOS, Apple Intelligence, and system data in the first beta that's eating up space. Think 50 GB or more, which makes iPads with 128 GB basically unusable. Maybe it gets corrected in the next beta, or it's related to indexing that's still in progress.
 
On my iPad Pro (M4) it's 24 GB and on my Mac it's 27 GB. That's over 40 GB of OS and models. Ludicrous.

Apple uses a technique to avoid loading the entire model into memory, keeping as much in storage as possible. Relevant paragraph:
Instead of forcing the entire model into DRAM, the full model is stored in flash memory (NAND). Because NAND-to-DRAM bandwidth is too slow to swap weights token by token, as standard MoE models require, AFM 3 Core Advanced makes routing decisions per prompt. A lightweight, dense block selects a fixed set of experts during initial processing, periodically reselecting them during generation. To minimize data movement, the model relies on a high percentage of always-active “shared experts” alongside input-dependent “routed experts” swapped into DRAM only when needed.
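To see why per-prompt routing helps with slow flash, here's a toy Python simulation. It is only a sketch under made-up assumptions: the expert counts, the dot-product router, and the random "decoding step" are all hypothetical and have nothing to do with Apple's actual implementation. It counts how many routed experts must be copied from flash into DRAM under per-token routing (reselect every token) versus routing once on the prompt and reselecting periodically, while the shared experts stay resident the whole time.

```python
import random
from dataclasses import dataclass, field

# Toy sizes (hypothetical; NOT Apple's actual configuration).
NUM_SHARED = 4      # always-active shared experts, kept in DRAM permanently
NUM_ROUTED = 32     # routed experts, stored in flash until selected
TOP_K = 4           # routed experts kept in DRAM at any one time
HIDDEN = 8          # toy hidden-state width


@dataclass
class RoutedExpertCache:
    """Tracks which routed experts are resident in DRAM; the rest stay in flash."""
    resident: set = field(default_factory=set)
    flash_loads: int = 0  # routed experts copied from flash into DRAM

    def select(self, wanted: set) -> None:
        # Copy in only the experts that are not already resident;
        # experts no longer wanted are simply dropped from DRAM.
        self.flash_loads += len(wanted - self.resident)
        self.resident = set(wanted)


def route(hidden, router_weights, k=TOP_K):
    """Hypothetical lightweight dense router: top-k experts by dot-product score."""
    scores = [sum(h * w for h, w in zip(hidden, row)) for row in router_weights]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def generate(prompt_state, steps, router_weights, reselect_every, seed=0):
    """Simulate decoding and return how many routed-expert loads hit flash.

    reselect_every=1 mimics standard per-token MoE routing;
    larger values mimic per-prompt routing with periodic reselection;
    0 means route once on the prompt and never again.
    """
    rng = random.Random(seed)
    cache = RoutedExpertCache()
    hidden = list(prompt_state)
    cache.select(route(hidden, router_weights))  # decision made on the prompt
    for step in range(1, steps + 1):
        # Stand-in for a decoding step: nudge the hidden state randomly.
        hidden = [h + rng.uniform(-0.5, 0.5) for h in hidden]
        if reselect_every and step % reselect_every == 0:
            cache.select(route(hidden, router_weights))
    return cache.flash_loads


if __name__ == "__main__":
    rng = random.Random(42)
    router_weights = [[rng.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(NUM_ROUTED)]
    prompt_state = [rng.gauss(0, 1) for _ in range(HIDDEN)]

    for every in (1, 16, 0):
        loads = generate(prompt_state, 128, router_weights, every)
        print(f"reselect_every={every:>3}: {loads} routed-expert loads from flash")

    resident = NUM_SHARED + TOP_K
    print(f"DRAM holds {resident} of {NUM_SHARED + NUM_ROUTED} experts at any time")
```

Because the shared experts never move, only the routed set costs flash reads. Over the same sequence of hidden states, reselecting every 16 tokens can never trigger more loads than reselecting on every token, and routing once on the prompt costs exactly `TOP_K` loads.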

Most people don't even know what they are talking about.
The call is coming from inside the house.
 