There’s a concept in project management known as the “Cone of Uncertainty”: at the time you set out to do something, especially something that hasn’t been done before, even your best estimates of what the project will take can be off by 50-100%. That’s one of the reasons software developers have adopted various Agile methodologies: they allow the team to adjust the project, its timelines, and its deliverables based on the latest knowledge and updates. This is especially valuable when a high degree of innovation is required, as there was for on-device generative AI, where not only are there many things you do not know when you start; there are possibly even more things that you don’t know that you don’t know about.
This can be a difficult concept for a lot of people to grasp, especially those who live and work primarily in traditional industries, where processes and products have been refined long enough that there are few unknowns, making everything predictable. Building an airplane model that’s already been built several times? Predictable timeline and cost. Publishing a weekly newsletter? Predictable timeline and cost. Building an innovative, private and secure, on-device AI? Not predictable.
In fact, given the rapid rate of innovation in the on-device, local AI world, that unpredictability is probably a good thing. In the short 25-26 months that local generative AI and open-source models have existed, the field has gone from believing it was essential to have cloud-based services hosting giant models trained on the collective knowledge of the planet, to training your own models, to using a pre-trained general model, to only needing to “fine-tune” a model, to using a smaller general model with an adapter (LoRA), to not having to train models on, say, your documents anymore thanks to Retrieval-Augmented Generation (RAG), to Mixture of Experts (MoE), to using tools, and now agents. All in a rapidly shrinking memory footprint: we’re already at the point where models using these various techniques, running easily in the unified memory of an under-$4,000 Mac, can outperform the latest cloud-service models on AI industry benchmarks.
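To make the RAG step above concrete, here is a minimal sketch of the retrieval idea: instead of training a model on your documents, you find the document most relevant to a question and put it in the prompt. This uses a toy bag-of-words similarity; a real system would use a trained embedding model, and the documents and query here are purely illustrative.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. Real RAG uses a learned embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    mag = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / mag if mag else 0.0

# Hypothetical on-device documents -- never sent to any server.
documents = [
    "The quarterly report shows revenue grew by ten percent",
    "Grandma's lasagna recipe calls for fresh basil and ricotta",
]

# Retrieve the document closest to the query, then ground the model's
# prompt in it -- no retraining or fine-tuning required.
query = "what did the report say about revenue"
qv = embed(query)
best = max(documents, key=lambda d: cosine(embed(d), qv))
prompt = f"Context: {best}\n\nQuestion: {query}"
```

The point of the sketch is the shape of the technique: retrieval replaces training, which is exactly why the memory and compute demands on the device keep shrinking.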
At the same time, Apple has focused on providing APIs and platform features that enable developers to make their apps’ capabilities and content available to AI, enabling the features like personal-context awareness and in-app actions that are key to a true personal assistant. Given current industry trends, a decision to move away from customized models to using open-source models would be a wise one, since, as I mentioned, the field has moved beyond the need for them. With continued Apple innovation in Apple Silicon, and the shrinking unified/VRAM memory requirements of state-of-the-art open-source models, it’s quite likely that by the time the iPhone 20 arrives, it will have enough RAM and disk storage for an on-device AI that rivals today’s cloud services, but one that is completely private and aligned only with the needs and goals of the phone’s user.
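The back-of-the-envelope math behind that shrinking-footprint claim is simple: the memory needed to hold a model’s weights scales with parameter count times bits per weight. The 7-billion-parameter figure below is illustrative, not a specific model, but it shows why quantization moves models from server-class hardware toward phone-class RAM.

```python
def weight_memory_gb(parameters: int, bits_per_weight: int) -> float:
    """Decimal gigabytes needed just to store the model's weights."""
    return parameters * bits_per_weight / 8 / 1e9

params = 7_000_000_000
fp16 = weight_memory_gb(params, 16)  # 14.0 GB at 16-bit precision
int4 = weight_memory_gb(params, 4)   # 3.5 GB after 4-bit quantization
```

Runtime memory (activations, KV cache) adds overhead on top of this, but the 4x reduction in the dominant term is what makes an on-device assistant plausible on near-future phone hardware.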