Siri is just a frontend. The search itself uses Bing, and Microsoft is going to marry Bing with ChatGPT, so Siri will get some benefit as well.
Apple outsourcing even their "AI"... 🤣
This comment (like most of the comments here) is extremely silly.
An LLM (large language model) is essentially a language "extrapolator" -- given this string of text, extrapolate some successor text that is statistically plausible. Without wanting to get into issues of "what is intelligence", no-one in the know is even claiming this, *by itself*, counts as intelligence.
This sort of machinery is useful for many things (it does an adequate job of translation, or "create text in the style of"), but it is not designed to answer questions, and it's not even optimal for some of the target tasks like "summarize this text".
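To make the "extrapolator" point concrete, here's a toy sketch of greedy next-token decoding (Swift just for illustration; `nextTokenDistribution` is a made-up placeholder for a trained model, not any real LLM API):

```swift
// Toy illustration of "extrapolate a statistically plausible continuation".
// `nextTokenDistribution` is a hypothetical stand-in for a trained model.
func nextTokenDistribution(context: [String]) -> [(token: String, probability: Double)] {
    // A real model returns a probability for every token in its vocabulary,
    // conditioned on the context. This stub just returns a fixed distribution.
    return [("the", 0.4), ("a", 0.3), ("<end>", 0.3)]
}

func extrapolate(prompt: [String], maxTokens: Int) -> [String] {
    var tokens = prompt
    for _ in 0..<maxTokens {
        // Greedy decoding: append the most plausible next token, stop at <end>.
        guard let best = nextTokenDistribution(context: tokens)
                .max(by: { $0.probability < $1.probability }),
              best.token != "<end>" else { break }
        tokens.append(best.token)
    }
    return tokens
}
```

That's the whole trick: predict a plausible next token, over and over. Nothing in that loop is "answering a question".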
Consider now something like Siri (or a generic "assistant"). What it needs is (a sketch of the full pipeline follows this list):
- an audio model to understand the noise it is hearing and convert it to text.
This is the first (and, I suspect, most common) place where things fall apart. Multiple, VERY good mics are required to get an acceptable audio signal, and such mics are not present in many situations. CarPlay (using the car's mic) is one such case; old Macs (with only one mic) are another. A single AirPod in your ear is a third: the mic in a lone AirPod seems to be lousy, which isn't helped by Apple apparently having a bug where it can't decide whether to use the AirPod, Apple Watch, or iPhone mic.
- an LLM to convert the text to some sort of "understanding" of what needs to be done. IMHO Apple actually does this part reasonably well.
- an implementation of the "understanding" from the previous step. This part Apple does very badly. Some of the bad implementation is pure stupidity: things like saying "I can't do X on a Watch [or HomePod]" rather than just fscking sending the request to my iPhone or Mac, and I think this sort of obvious stupidity is the largest single driver of Siri anger.
Another part of the bad implementation is handling requests that need some sort of data lookup. These require a web lookup and then perhaps some massaging of the results. Apple also does this poorly, but I can't tell whether the web lookup or the massaging is the larger problem.
- finally, what's needed is some sort of ongoing history: both short-term memory (understand how my current sentence relates to the previous sentence) and long-term memory (behave like a human assistant, so that you know that when I say "Debby" in the context of my personal life I mean my wife, and you don't need to ask me, every time, whether I might mean some other Debby in my address book that I last interacted with ten years ago!)
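Spelled out as code, that pipeline looks roughly like this (a hypothetical Swift sketch; every type and name is invented for illustration and is not how Apple actually builds Siri):

```swift
// Hypothetical sketch of the four-stage pipeline described above.
struct Transcript { let text: String; let audioQuality: Double }

enum Intent {
    case setReminder(text: String)
    case lookup(query: String)
    case unknown(rawText: String)
}

protocol AudioModel   { func transcribe(_ samples: [Float]) -> Transcript }        // step 1: mic audio -> text
protocol IntentParser { func parse(_ transcript: Transcript) -> Intent }           // step 2: text -> "understanding"
protocol Executor     { func run(_ intent: Intent, history: [Intent]) -> String }  // step 3: actually do it

func handleRequest(samples: [Float],
                   audio: AudioModel,
                   parser: IntentParser,
                   executor: Executor,
                   history: inout [Intent]) -> String {
    let transcript = audio.transcribe(samples)   // fails first when the mic signal is bad
    let intent = parser.parse(transcript)
    history.append(intent)                       // step 4: short/long-term memory lives here
    return executor.run(intent, history: history)
}
```

The point of writing it this way is that each stage can fail on its own.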
There are a lot of moving pieces here, and simply saying "ChatGPT awesome" doesn't solve most of them!
If I were Apple, apart from the "AI" parts (which people currently mostly equate with LLMs), I would work very hard on:
- better audio models. There are multiple places where people still interact with bad mics, and it's worth putting more compute effort into processing that lousy signal.
- better user feedback. Far too often the Apple Watch can't talk to the iPhone for whatever reason, or the iPhone doesn't have the right internet connection, or whatever, and you say your "Hey Siri, remind me to xyz" only to be told "I can't do that right now".
Make a BETTER UI for these sorts of failure cases!!! Remember the audio of what I told you and try again when you do have a connection. If the audio signal is too lousy to process locally, send it to some Apple computer to process. Worst case, just remember it so I can replay it as audio.
Another failure mode that people HATE HATE HATE is when you're in the car, you say "Hey Siri, remind me to xyz", and Siri creates a Reminder that reads "kjhjkh57%%^%hjhjh", i.e. some garbled garbage that gives you no idea what it was supposed to say. There should be a way to replay the audio that generated the Reminder (or Note) so you can figure out what you were trying to remind yourself to do (see the first sketch after this list).
- a better indication of what Siri thought you wanted it to do. Look at how Wolfram Alpha handles this: you make a natural-language request, and Alpha shows its interpretation of what it thought you meant, along with multiple possible answers to different aspects of the question. Apple (and other assistants) are too focused on the assumption that an assistant gives one single response, and don't even consider giving two or four responses plus an explanation of what the assistant was trying to do (a small sketch of this follows the list too).
- less obsession with voice only. Voice has its place, but I want an Assistant, not a Voice Assistant! I want to be able to interact with a previous voice response by typing, by touch/pointer, via cut and paste, or by dragging in a file. Apple (and everyone else, but forget them, they are hopeless) has been locked too long into the model, which made sense in 2010 but not in 2023, of Siri as a voice statement/response system rather than Siri as a high-quality TOTAL assistant...
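Here's a rough sketch of the "don't lose my request" idea from the user-feedback bullet above: keep the raw audio around, retry when you can, and let me replay it as a last resort. (Hypothetical Swift, made-up names, obviously not how Apple implements any of this.)

```swift
import Foundation

// Hedged sketch: a queued request keeps its raw audio so it can be retried or replayed.
struct PendingRequest {
    let capturedAt: Date
    let audio: Data          // raw recording, never thrown away
    var transcript: String?  // filled in once a good transcription succeeds
}

final class RequestQueue {
    private var pending: [PendingRequest] = []

    func enqueue(audio: Data) {
        // "I can't do that right now" should mean "queued", not "lost".
        pending.append(PendingRequest(capturedAt: Date(), audio: audio, transcript: nil))
    }

    func retry(using transcribe: (Data) -> String?) {
        // Called whenever connectivity returns or a more capable device is reachable.
        for index in pending.indices where pending[index].transcript == nil {
            pending[index].transcript = transcribe(pending[index].audio)
        }
    }

    // Worst case: hand back the audio so the user can hear what they said,
    // e.g. when the stored Reminder text came out as garbage.
    func replayableAudio() -> [Data] {
        pending.map { $0.audio }
    }
}
```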
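And a tiny sketch of the Wolfram-Alpha-style idea from the "better indication" bullet: return ranked interpretations alongside the answers, instead of pretending there is exactly one response. (Again hypothetical; names invented for illustration.)

```swift
// Show what the assistant *thinks* you meant, possibly several candidates.
struct Interpretation {
    let paraphrase: String   // "I understood this as: ..."
    let confidence: Double   // 0.0 ... 1.0
    let answer: String
}

func present(_ candidates: [Interpretation], showingAtMost limit: Int = 3) -> String {
    candidates
        .sorted { $0.confidence > $1.confidence }
        .prefix(limit)
        .map { "[\(Int($0.confidence * 100))%] \($0.paraphrase)\n  -> \($0.answer)" }
        .joined(separator: "\n")
}
```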