Well, that's a way to make everyone's iPhone need an upgrade awful fast... Build processes into the OS that require large amounts of memory.
 
It must be a "killer feature" to make people dig deep into their pockets und pay big money. People also remember that today's latest and greatest might be already obsolete in one year when the next latest and greatest comes out.
So maybe it's a good idea to watch how apple implements this AI features and what this actually improves.
 
Gurman suggested that the company could "fill in the gaps" by licensing technology from Google and other AI service providers. Last month, Gurman reported that Apple was in discussions with Google to integrate its Gemini AI engine into the iPhone as part of iOS 18.
Google's Gemini continues to give wonky responses to trivial questions. Note these 2 responses:
Q: is gerald pollack a professor at the university of washington?

A: I do not have enough information about that person to help with your request. I am a large language model, and I am able to communicate and generate human-like text in response to a wide range of prompts and questions, but my knowledge about this person is limited. Is there anything else I can do to help you with this request?
then
Q: is gh pollack a professor at the university of washington

A: Yes, gh Pollack, more likely referring to Gerald H. Pollack, is a professor at the University of Washington [2]. He is a professor in the Department of Bioengineering [2]. You can find more information about his research on the University of Washington website or his lab's website: https://www.pollacklab.org/.
Somewhere, Gemini has the 2 words "Gerald Pollack" wired together to generate strange responses. At the same time, Gemini is completely comfortable using his full name in its response. Craziness!
 
Nano is very cut down and probably mostly just a bunch of if-else statements designed around the built-in apps.

Key Features of Gemini Nano

Gemini Nano is a multimodal model, meaning it can understand and work with text, images, audio, videos, and code. It is optimized for on-device tasks, making it the most efficient model for such tasks. Gemini Nano contains two models with 1.8B parameters (Nano-1) and 3.25B parameters (Nano-2), which facilitate deployment onto devices with both high and low memory capacity.



This is Google, not Apple, which is still writing papers and buying on-device AI companies four months before launching the product.
 

3.25B parameters for a multimodal model is considered tiny. The language-model part of that is thus going to be considerably small, not even worth calling "large". It's probably a 2-bit quant too, in order to reduce its footprint even further.

LLMs generally start at about 7 billion parameters for a 4-bit quant, which is considered a lot weaker and less reliable than a 16-bit quant.

Above that you get the more commonly used LLMs with 34+ billion parameters. If you tried to run one of those on an M3 Max, the whole battery would be depleted in an hour. I know that because I have tried them all.
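A back-of-the-envelope way to sanity-check these sizes: footprint is roughly parameters x bits-per-weight / 8 bytes. A minimal Swift sketch using the figures from this thread (Google's own description, quoted later in the thread, says Nano ships 4-bit quantized); real deployments add overhead for embeddings, activations and the KV cache:

Code:
// Back-of-the-envelope model footprint: parameters x bits-per-weight / 8 bytes.
// Figures are the ones discussed in this thread, not official on-disk sizes.
func footprintGB(parameters: Double, bitsPerWeight: Double) -> Double {
    parameters * bitsPerWeight / 8 / 1_000_000_000
}

print(footprintGB(parameters: 1.8e9,  bitsPerWeight: 4))   // Nano-1 @ 4-bit: ~0.9 GB
print(footprintGB(parameters: 3.25e9, bitsPerWeight: 4))   // Nano-2 @ 4-bit: ~1.6 GB
print(footprintGB(parameters: 7e9,    bitsPerWeight: 4))   // 7B @ 4-bit:     ~3.5 GB
print(footprintGB(parameters: 34e9,   bitsPerWeight: 16))  // 34B @ 16-bit:   ~68 GB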
 
This is a great opportunity for Apple to do what they’re best at, which is to take existing powerful technologies and develop them in a way that people can actually make use of in their lives. Generative AI is incredible, but so far its value is relatively limited for most people.
Siri already has its hooks into everything I own, so it's got a lot of potential once they de-lobotomize it.
 
Apple has its users scared of their own shadow with their talk of privacy. There is no "danger" or embarrassment or shame if someone knows what color you like.
If people are using ChatGPT to choose colors, I don't think they're using it right ;-)
 
Siri already has its hooks into everything I own, so it's got a lot of potential once they de-lobotomize it.

It will be family-friendly no matter how Siri XL or Siri Ultra turns out.

As for the competition, some of them claim to be open while not letting anyone see the source code.

I can give you a preview of why right now. The official news won't leak or be confirmed by others until later, probably end of 2024.

They are experimenting with embedding trackers in generated text and images. This could be to track copyright violations and theft, but it can also be used to track anonymous users who generate text with an LLM and post it somewhere anonymously, like on a forum. You won't be able to see the pattern with your naked eye. You have to run the image or text through analyzers to find it.

You can break the trackers with heavy editing and copy-pasting, but they hope most people are too lazy to do so.
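For the curious: published research on statistical text watermarks (the "green list" approach) shows how an invisible pattern can ride along in ordinary-looking text. A purely illustrative Swift sketch of that idea, not any vendor's actual scheme: the generator nudges word choice toward tokens a keyed hash marks "green", and a detector counts how often that happened.

Code:
// Deterministic FNV-1a hash. (Swift's built-in Hasher is randomly seeded per
// run, so it can't act as a secret shared by generator and detector.)
func fnv1a(_ s: String) -> UInt64 {
    var h: UInt64 = 0xcbf29ce484222325
    for byte in s.utf8 {
        h ^= UInt64(byte)
        h = h &* 0x100000001b3
    }
    return h
}

// A token is "green" if a keyed hash of (previous token, token) is even.
// The generator biases its sampling toward green tokens at each step.
func isGreen(previous: String, token: String, key: String = "secret") -> Bool {
    fnv1a("\(key)|\(previous)|\(token)") % 2 == 0
}

// Detector: ordinary text scores near 0.5; watermarked output scores well
// above it. Heavy editing reshuffles tokens and washes the signal out.
func greenFraction(_ tokens: [String]) -> Double {
    guard tokens.count > 1 else { return 0 }
    let hits = (1..<tokens.count).filter {
        isGreen(previous: tokens[$0 - 1], token: tokens[$0])
    }.count
    return Double(hits) / Double(tokens.count - 1)
}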
 
Apple doesn't seem to be doing in-house development that well these days: Siri, their modem project, their car project, and maybe a few other canned in-house projects that were never leaked or disclosed.
Siri was an acquisition, and while it was useful at the time, the tech turned out to be too convoluted and brittle to develop further. For a long time there was little interest in that area; that's why Alexa and Google's assistant languished for years, too. It would have been good if Apple had put some effort into modernizing Siri before this, but it sounds like that is finally happening.

Don't assume that we hear about all of the projects that Apple works on. Only the larger ones get enough exposure to leak into the rumor mill. That has been true for decades. Any large company needs to explore new products and product lines. Most of those end up getting pruned before they are released.
 
Yes, but maybe some of these projects would’ve been really beneficial if Apple got the right people to show up at their door. Sometimes you just can’t get the talent necessary to take your company to the next level. Apparently, working at Apple has lost some of that luster.
 
I suspect this will end up mirroring the smart speaker market. While it may be true that smart assistants like Alexa and Google Assistant are technically superior to Siri on paper, it ends up not really mattering to the majority of users, who either don't use those features or aren't even aware of them in the first place.

In contrast, Apple optimised Siri around a few specific use cases it decided people buying a smart speaker would likely use it for, and it wound up being good enough to hold its own.

We will probably start seeing the recent AI hype being exposed for what it really is - hype. A lot of the areas that Apple is presumably behind on will likely not end up mattering in the long run.
 
3.25B parameters for a multimodal model is considered tiny. The language-model part of that is thus going to be considerably small, not even worth calling "large". It's probably a 2-bit quant too, in order to reduce its footprint even further.

LLMs generally start at about 7 billion parameters for a 4-bit quant, which is considered a lot weaker and less reliable than a 16-bit quant.

Above that you get the more commonly used LLMs with 34+ billion parameters. If you tried to run one of those on an M3 Max, the whole battery would be depleted in an hour. I know that because I have tried them all.
Our most efficient model, designed to run on-device. We trained two versions of Nano, with 1.8B (Nano-1) and 3.25B (Nano-2) parameters, targeting low and high memory devices respectively. It is trained by distilling from larger Gemini models. It is 4-bit quantized for deployment and provides best-in-class performance.

Gemini Nano is distilled down from the larger Gemini models and specifically optimized to run on mobile silicon accelerators. Gemini Nano enables powerful capabilities such as high-quality text summarization, contextual smart replies, and advanced proofreading and grammar correction. For example, the enhanced language understanding of Gemini Nano enables the Pixel 8 Pro to concisely summarize content in the Recorder app, even when the phone’s network connection is offline.

Gemini Nano is starting to power Smart Reply in Gboard on Pixel 8 Pro, ready to be enabled in settings as a developer preview. Support in Android is rolling out for WhatsApp, Line, and KakaoTalk over the next few weeks with more messaging apps in the new year. The on-device AI model saves you time by suggesting high-quality responses with conversational awareness.

What more can you expect from an on-device LLM that Apple will be using?
 
All that's needed is for Apple to make the asks anonymous, because in the scenario you describe it all sounds great except for the inevitable Google-sends-me-ads-for-that-wine, the way it is now.

If it can work the way you say (offloading to the internet only the most necessary search requests, all the rest of the processing is done on device), that would be great.

Google is moving away from cookies and an exact user-targeting model, and is instead building user types that can be targeted very precisely without knowing who you are at all.
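Google's publicly documented direction here is cohort-style targeting (the Topics API): the browser derives a few coarse interest topics on-device and exposes only those. A toy Swift sketch with invented categories, just to show the shape of it:

Code:
// Illustrative only: real topic classification is more involved than a
// site-to-topic lookup table, and the category names here are invented.
let topicOfSite = [
    "espn.com": "Sports",
    "allrecipes.com": "Cooking",
    "strava.com": "Fitness",
]

// The browser keeps a handful of coarse topics derived from local history...
func weeklyTopics(history: [String], limit: Int = 3) -> [String] {
    var counts: [String: Int] = [:]
    for site in history {
        if let topic = topicOfSite[site] { counts[topic, default: 0] += 1 }
    }
    return counts.sorted { $0.value > $1.value }.prefix(limit).map(\.key)
}

// ...and an ad request sees only e.g. ["Sports", "Fitness"], never the
// user's identity or browsing URLs.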

People believe they're unique, that they're special. There are only so many personality types, and that number isn't very large. It's why you're able to meet someone and become best buds or feel that they're your soulmate. If the number were too large, you'd never meet anyone with so much in common, or maybe only once or twice in your life. But it happens many times over your life. We are very, very predictable.

This is also (partially) why people get the impression that "Google is listening" or "Facebook heard what I said and is now showing me ads", even if you didn't search for something online, only talked about it or, crazily enough, only thought about it. It's not that they were listening to you; it's that several personality doppelgängers in the same or similar geography and media environment were influenced to think about or want the same thing. Observer bias is another part of that, but that's another conversation entirely.

All this to say, I'm confident that Apple can preserve privacy, but Google will want data, even if it's anonymized. Google is in the business of data, of selling advertisers access to users' eyeballs, and it's not licensing Gemini to Apple without it. Apple, in turn, is not giving up its users' privacy. I see an intersection where both of those things are possible.
 
If it’s on device how much will work with existing devices? And how much can they announce at WWDC if it requires hardware that hasn’t been announced yet?
Some of Galaxy AI is "on device" and uses local processing power, and older devices could get it, but only devices a few years old, as otherwise they don't have the power. So there is hope Apple will do the same.
 
My second concern, as some others have mentioned, is how much of a database are we seriously supposed to have resident on our phone? Because the "large" part of an LLM means we'd need a lot more storage if we're carrying that around with us.

The "Large" in Large Language Model refers to the training model, not the size of the files on the processing server or on device, as will be the case with Siri.

Modern generative transformers are trained with over a hundred billion parameters, which is where the "large" comes from. It doesn't need to store all those parameters once it has made connections between words, in other words "learned". That's what the "P" in GPT stands for: pre-trained.
 
Okay, perhaps my attempt at a pithy comment distracted from the actual question. Apologies if I threw you a curve.

To restate my concern: the item indicates the database itself will be resident on our devices. So the key question still stands - how large would the LLM-trained database resident on our devices be? As the item notes, Gurman says "it will run entirely on-device, rather than via the cloud like most existing AI services."

All of that is data we would have on our devices that we don't now. All those ones and zeroes add up and they require storage. Does that mean less room for apps and our own data/photos/music/etc.? Would devices need to come with more standard memory, from the base models on up?
 
6 years ago? 🙃


Apple runs a wide array of web services. The 'scale out' parts do not necessarily run in Apple data centers. Keeping track of people's Apple ID accounts (passwords, credit cards, billing, money, etc.), Messages connection brokering, and email are all substantive tasks when the user base is in the hundreds of millions.

For example, Apple's "Private Relay" system does not run solely on Apple servers.

" ...
As mentioned above, Cloudflare functions as a second relay in the iCloud Private Relay system. We’re well suited to the task — Cloudflare operates one of the largest, fastest networks in the world. Our infrastructure makes sure traffic reaches every network in the world quickly and reliably, no matter where in the world a user is connecting from.
..."

Apple has a billion devices pinging servers around the world. The Apple-owned data centers are a drop in the bucket of what is pragmatically needed. Apple is not a major hyperscaler; relatively speaking, its number of data centers is not 'prime time player' status.

Even if Apple kept the primary copies of folks' iCloud data, where do the 'offsite' backups/replications go? 500 million users at 5 GB apiece is 2.5 billion GB of data. Then you need another 2.5 billion GB of space at a different location to keep the backup.
Add another 500 million users and you double it again.

Throw on top requirements to deal with varying international privacy laws (China, the EU, etc.), where the 'offsite backup' has to stay within each regulatory silo, so you need even more offsite backups. The structure that Apple has built has major portions that rely on outsourcing the workload to third-party data centers. The exclusive core they keep for themselves grows, but if the user-base workload grows too, they are just treading water there... not dropping the third parties completely.
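The arithmetic above, spelled out in a quick Swift snippet (free-tier quota only, ignoring compression and deduplication):

Code:
// 500M users x 5 GB free quota, with one offsite replica.
let users = 500_000_000.0
let quotaGB = 5.0
let primaryEB = users * quotaGB / 1e9   // 2.5 exabytes of primary storage
let withReplicaEB = primaryEB * 2       // 5 exabytes including the offsite copy
print(primaryEB, "EB primary;", withReplicaEB, "EB with one replica")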


Most of the largest hyperscalers are already on Arm servers. Cloudflare is. Amazon is. Google, Facebook and Azure are ramping up. Apple doesn't need to move to its own data centers for 'save power' or 'feel good' reasons.
 
Okay, perhaps my attempt at a pithy comment distracted from the actual question. Apologies if I threw you a curve.

To restate my concern: the item indicates the database itself will be resident on our devices. So the key question still stands - how large would the LLM-trained database resident on our devices be? As the item notes, Gurman says "it will run entirely on-device, rather than via the cloud like most existing AI services."

All of that is data we would have on our devices that we don't now. All those ones and zeroes add up and they require storage. Does that mean less room for apps and our own data/photos/music/etc.? Would devices need to come with more standard memory, from the base models on up?

It could be fairly small. These aren't pre-written questions and answers to every possible question. Instead, they're a (long) list of words with probability scores for which words follow other words, in the context of other words and in response to input.
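A toy illustration of that "probability of the next word" idea, with a hard-coded lookup table standing in for the neural network that actually computes the scores (all words and probabilities invented):

Code:
// Toy next-word table; a real LLM computes these probabilities on the fly.
let nextWordProbs: [String: [(word: String, p: Double)]] = [
    "on":     [("device", 0.6), ("screen", 0.3), ("time", 0.1)],
    "device": [("storage", 0.5), ("only", 0.5)],
]

// Sample the next word in proportion to its probability.
func sampleNext(after word: String) -> String? {
    guard let candidates = nextWordProbs[word] else { return nil }
    var r = Double.random(in: 0..<1)
    for (w, p) in candidates {
        r -= p
        if r < 0 { return w }
    }
    return candidates.last?.word
}

print(sampleNext(after: "on") ?? "?")  // "device" about 60% of the time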

The generative processing will "run entirely on device", ensuring privacy and saving round-trip time, which doesn't mean that it'll be siloed from the internet. Just as on-device Siri today is processed locally, it can still search the internet for you, and it does, to the chagrin of users: "I found this on the web for you".

That said, it should be far better at this, particularly if paired with Gemini. Siri will answer local questions: what you're looking at on your screen, what's in your Apple ecosystem (photos, calendar, contacts, etc.) and third-party apps on your iPhone. It will pass the baton to Gemini for questions about current events or frequently updated information that require internet access, then pass the answer back to Siri to read, rather than showing you a website.

How Siri's local processing vs online access may work, as detailed in this post:

An offline LLM means that Siri itself will be processed on device, as it has been doing in a limited way for a couple of iOS generations now, but it can still make use of the web to fulfill requests. It doesn't need to send your voice to the cloud for processing, saving round-trip time and preserving privacy.

A recent machine-learning model Apple published, with the capability to recognize apps' UI and how to use them, gives us a good indication of how this will work. Siri (offline) will be able to act on your behalf to use apps (online).




User:
Siri, I'd like to have dinner with my girlfriend tonight at that place I walked by last week with the red umbrella. I took a photo of it.

Siri:
Code:
Looks at your Photos (offline), finds the red umbrella, geotagged,
locates the restaurant in Apple Maps (online),
brings up OpenTable (online), cues up a reservation after looking at your Calendar (offline).

That was Sandro's on College St. I've found you a reservation for 2 at 8pm. You get off work at 5, so that should give you enough time to get home, get ready and head over to Sandro's. Ana's schedule also shows her free. Would you like me to book it?

User:
Yes... no, wait, can we do 8:30 instead? I'd like to get a bottle of wine; does Sandro's have a corkage fee?

Siri:
Code:
Looks up OpenTable (online) to see if there's an 8:30pm reservation.
Looks up Sandro's website (online) and searches for a corkage fee.
Looks up Apple Maps for nearby wine stores,
finds one that's on the Ritual app (online) so they can bag your wine for pickup,
finds that you've ordered 2 different wines via Ritual.

Sandro's has a $6 corkage fee. I found you an 8:30 reservation. Would you like Mateus Rose or Wolf Blass - Yellow Label Sauvignon? I can reserve it at the Wine Cellar, a short walk from Sandro's.

User:
Let's do Mateus. Go ahead and book the reservation please.

Siri:
Code:
Goes to OpenTable, places the reservation on your behalf.
Goes to Ritual, orders a bottle of Mateus Rose for pickup at 8pm.
Adds an event in your calendar with directions to the Wine Cellar
and another at 8:30pm with directions from there to Sandro's.
Creates a calendar invite for your girlfriend.

All set! Your Mateus Rose will be ready for pickup at 8pm, Sandro's at 8:30 on College St. and I've sent an invite to Ana.



Local Siri processing, without having to access the internet, will enable free-flowing conversations without a delay. Having the ability to recognize how to use apps on your phone will be the online component. You already use those apps online. I suspect Google will be one of them, allowing Siri to get current information from the internet using Gemini and returning those answers the same way it can return search results today, but with the capability to make use of them to get you an answer and read it back to you.

Apple's advantage beyond building silicon custom made for its native Siri, is that it has the largest App Store with virtually unlimited potential (there's an app for everything). Give Siri the capability to understand how to use apps (like the model Apple just published) and you can imagine how far this can go.
 


Apple is developing its own large language model (LLM) that runs on-device to prioritize speed and privacy, Bloomberg's Mark Gurman reports.


Writing in his "Power On" newsletter, Gurman said that Apple's LLM underpins upcoming generative AI features. "All indications" apparently suggests that it will run entirely on-device, rather than via the cloud like most existing AI services.

Since they will run on-device, Apple's AI tools may be less capable in certain instances than its direct cloud-based rivals, but Gurman suggested that the company could "fill in the gaps" by licensing technology from Google and other AI service providers. Last month, Gurman reported that Apple was in discussions with Google to integrate its Gemini AI engine into the iPhone as part of iOS 18. The main advantages of on-device processing will be quicker response times and superior privacy compared to cloud-based solutions.

Apple's marketing strategy for its AI technology will apparently be based around how it can be useful to users' daily lives, rather than its power. Apple's broader AI strategy is expected to be revealed alongside previews of its major software updates at WWDC in June.

Article Link: Gurman: Apple Working on On-Device LLM for Generative AI Features
I hope the part about using Gemini (when the on-device AI is not sufficient) is incorrect. Even Gemini 1.5 Pro is consistently the worst of the models in many comparisons and doesn't come close to GPT-4 Turbo.
 