Apple Study Reveals Critical Flaws in AI's Logical Reasoning Abilities

One of a zillion reasons most of us see AI as "the next crypto"

It's just being pushed on everyone because SV and the VCs are incredibly desperate for a "next big thing"
What I find a bit baffling is why, instead of risking their billions on AGI, these companies aren't working harder on more focused LLM-supported products and workflows. For example, rather than trying to build another version of a general model that uses Chain-of-Thought (CoT) behind the scenes (GPT-o1), why not save a buh-zillion dollars and build a UX workflow that guides the user toward CoT-based interactions using the current/existing model? Taking this approach could give them a range of useful, reliable, revenue-generating products without incurring huge additional costs. I suppose it's FOMO, or hubris, or both...
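
Something like the sketch below is what I mean: a thin workflow layer that walks the user through CoT-style steps using whatever chat model already exists. Purely illustrative; ask_model is a hypothetical stand-in for an existing chat-completion call, not any particular vendor's API.

```python
def ask_model(prompt: str) -> str:
    """Hypothetical placeholder for an existing LLM chat-completion call."""
    raise NotImplementedError

def guided_cot(question: str) -> str:
    # Step 1: have the model restate the problem and list the given facts.
    facts = ask_model(
        f"Restate this problem and list the given facts, nothing else:\n{question}"
    )
    # Step 2: ask for explicit, numbered intermediate steps.
    steps = ask_model(
        f"Problem and facts:\n{facts}\n\nWork through the solution step by step, "
        "numbering each step."
    )
    # Step 3: ask for a final answer that must cite the numbered steps.
    return ask_model(
        f"Steps:\n{steps}\n\nGive only the final answer, citing the step numbers "
        "that justify it."
    )
```

The point is that the "reasoning" scaffolding lives in the product UX rather than inside a new, more expensive model.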

EDIT: Actually, I suppose they want users to do that... and I also suspect that the next leap may require a break from the "scale is everything" mentality driving the major players... meaning that it's going to be a harder and more expensive goal to reach.
 
I've tried ChatGPT for a few purposes. And, as a super-aide mémoire, it can be quite useful. But it comes up with utter garbage a lot of the time.

I wanted to know about a specific medicine. It came back with companies that no longer exist. One company changed its name seven years ago. And the product they supposedly make was discontinued around twenty years ago. It missed some huge names and made big mistakes.

And a follow-up question came with a couple of references (when I pushed for them). But the articles referenced do not exist - even if you go to the journal's own website, through PubMed, etc., etc. The authors exist. The paper referenced doesn't.

One of the reasons I decided to try it is that working through all the websites is so tedious - so many have paywalls and other barriers to access - that it can take days to check things manually. I thought it might help with some groundwork.

But I've ended up being quite sure I cannot take a single ChatGPT reply as being "true" or correct or accurate. It might tell me about something but it can take much effort to check...
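
Even a crude automated check helps with that groundwork. A rough sketch of what I mean: look each cited title up in PubMed via NCBI's public E-utilities search endpoint and treat zero hits as a strong hint the paper doesn't exist. (Illustrative only; it assumes the chatbot gave an exact title to search for.)

```python
import requests

def pubmed_title_hits(title: str) -> int:
    """Return how many PubMed records match the given article title."""
    resp = requests.get(
        "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
        params={"db": "pubmed", "term": f"{title}[Title]", "retmode": "json"},
        timeout=10,
    )
    resp.raise_for_status()
    return int(resp.json()["esearchresult"]["count"])

# A reference the chatbot invented should come back with 0 hits.
# print(pubmed_title_hits("Title of the paper ChatGPT cited"))
```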

Reminds me of Brandolini's law: "The amount of energy needed to refute bullshit is an order of magnitude bigger than that needed to produce it."

 
I guess now, whenever someone asks Apple why they're so far behind on AI, they can point to a research paper they wrote themselves and say "well, we were just taking our time to do it right unlike everyone else." :p
 
In the context of this thread, I decided to ask a much more limited question - one which is actually easy to answer using publicly accessible databases/sites.

The answers were so wrong, the entire rest of my session was spent telling it where it was wrong. Until I hit the limit:

You’ve hit the Free plan limit for GPT-4o.

But it was so, umm, smug, asserting its rubbish replies were true and accurate. Every correction I made, it then implied its replies were now true and accurate. And they were still wrong.
 
Claude solved this on the first try, so this news is a big nothing burger. Funny that the study says "the majority of models fail to ignore these statements". So there were models that worked fine but they only cherry-picked the worst ones? Smells of bias.

The study does not argue that AI cannot get it right: it argues that formulating the same fundamental question in different ways can significantly influence the answer, so different formulations of one question can have significantly different accuracy.

Also note that the "seemingly relevant but ultimately irrelevant information" dataset they used is only one aspect of the study; other datasets test the models' sensitivity to different kinds of input variation.

I have not read the study in detail yet but they do provide quantitative results and most importantly, a detailed description of their methodology.
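
To make the methodology concrete, here is a toy version of the idea as I read it: take one question template, generate variants that change only surface details (names, numbers), and check whether a model's accuracy holds up across them. Purely illustrative; ask_model is a hypothetical stand-in for any LLM call, and the template is made up.

```python
import random

TEMPLATE = ("{name} picked {a} apples on Monday and {b} apples on Tuesday. "
            "How many apples did {name} pick in total?")

def make_variants(n: int, seed: int = 0):
    """Yield (question, correct_answer) pairs that differ only in surface details."""
    rng = random.Random(seed)
    names = ["Liam", "Sofia", "Noah", "Mei"]
    for _ in range(n):
        a, b = rng.randint(2, 40), rng.randint(2, 40)
        yield TEMPLATE.format(name=rng.choice(names), a=a, b=b), a + b

def accuracy(ask_model, n: int = 50) -> float:
    """Fraction of variants where the model's reply contains the right number."""
    correct = 0
    for question, answer in make_variants(n):
        correct += str(answer) in ask_model(question)  # crude answer check
    return correct / n
```

If accuracy swings a lot across different seeds, the model is reacting to surface form rather than to the underlying arithmetic - which is essentially the study's point.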
 
The writer seemed convinced that the AI was obsessing over him and actually asking him to leave his wife. The actual transcript, for anyone who's seen this stuff back through the decades, showed the AI program bouncing off programmed parameters and being pushed by the writer into shallow territory where it lacked sufficient data to create logical interactions. The writer and most people reading it, however, thought the AI was being borderline sentient.
In the very early days of computers, there was a program called "Eliza" that pretended to be a psychologist. It was not even remotely AI, but if you answered its questions, it could seem that way, because it would parrot your answers back to you. "I'm angry." "Well, how does it make you feel that you're angry?"

AI has kicked this up to the next level, but it's still literally just delivering programmed responses, even if the responses are based on exponentially more data and "learning."
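
For anyone who never ran into it, the "parrot it back" trick is only a few lines of code. This is a bare-bones toy in the spirit of ELIZA's DOCTOR script, not Weizenbaum's actual rules:

```python
import re

REFLECTIONS = {"i": "you", "my": "your", "me": "you", "am": "are", "i'm": "you're"}

def respond(user_input: str) -> str:
    # "I am X" / "I'm X" -> reflect the feeling back as a question.
    match = re.search(r"\b(?:i am|i'm)\s+(.*)", user_input, re.IGNORECASE)
    if match:
        feeling = match.group(1).rstrip(".!?")
        return f"How does it make you feel that you're {feeling}?"
    # Otherwise, swap pronouns and turn the statement into a question.
    words = [REFLECTIONS.get(w.lower(), w) for w in user_input.rstrip(".!?").split()]
    return "Why do you say " + " ".join(words) + "?"

# >>> respond("I'm angry")
# "How does it make you feel that you're angry?"
```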
 
But it was so, umm, smug, asserting its rubbish replies were true and accurate. Every correction I made, it then implied its replies were now true and accurate. And they were still wrong.

I think this is one of the key aspects to keep in mind when dealing with current AIs: they tend to always give an answer with what can be perceived as "confidence", whereas in reality they have no actual clue about what they are saying.
 
Here we go, blowing the "Apple is behind on AI!" bugle like on every post. The fact of the matter is that ChatGPT and its contemporaries are not useful for the majority of people.

It's kind of like saying Apple is behind on wearables because they don't make a ring. They don't have a ring for the same reason they don't have a chatbot - they're building something else.
 
I think this is one of the key aspects to keep in mind when dealing with current AIs: they tend to always give an answer with what can be perceived as "confidence", whereas in reality they have no actual clue about what they are saying.
EXACTLY. They don't actually know anything. They don't understand it the way we do. They can't handle nuance or feeling, because they don't feel. It's all a computation. Any human qualities like confidence, well, that's just us humans projecting those feelings onto them, or reading between the lines. With AI and LLMs, there's nothing between the lines.
 
I think this is one of the key aspects to keep in mind when dealing with current AIs: they tend to always give an answer with what can be perceived as "confidence", whereas in reality they have no actual clue about what they are saying.
And I can certainly believe that someone without my bolshiness, and my confidence in my knowledge of this tiny area, could easily have believed it.

Which is concerning because I only have that detailed knowledge about a few areas - and realise how easily I could be fooled.
 
Turns out truly intelligent, free-thinking, creative and intuitive human beings are really hard to mimic in software. Did you really need Apple to tell you that? Glad that they did burst the bubble, though.

AI as it currently stands, and is likely to stand for the foreseeable, is a useful way to solve some computing problems and nothing more. Everything else is window dressing.

A great tool that I do want to have on my phone and Mac for a very limited set of tasks which I will control.
 
Funny that the study says "the majority of models fail to ignore these statements". So there were models that worked fine but they only cherry-picked the worst ones? Smells of bias.
Funny how 'the majority of models fail to ignore these statements' is just one observation out of hundreds in the study. Yet you cherry-picked that one to attribute bad intentions to Apple. Smells of bias. :)
 
A system that has a high level of intelligence will
a) know what it doesn't know
b) not attempt to answer a question in those domains.

These might sound bizarre, but I know that I don't know how to fix my car if it breaks down. I also know that, in anything other than truly exceptional circumstances, I shouldn't attempt to fix it myself but should call roadside assistance.
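
In software terms, even a crude version of "know what you don't know" can be sketched: sample the model several times and abstain when the answers disagree. This is just an illustration, using answer agreement as a rough stand-in for confidence; ask_model is again a hypothetical LLM call that samples with some randomness.

```python
from collections import Counter

def answer_or_abstain(ask_model, question: str, samples: int = 5,
                      min_agreement: float = 0.8) -> str:
    """Answer only when repeated samples mostly agree; otherwise admit uncertainty."""
    replies = [ask_model(question).strip() for _ in range(samples)]
    answer, count = Counter(replies).most_common(1)[0]
    if count / samples >= min_agreement:
        return answer
    return "I don't know - please check a reliable source."
```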
 
A system that has a high level of intelligence will
a) know what it doesn't know
b) not attempt to answer a question in those domains.

This is something Claude does really well. Claude does not express certainty, and it does not claim to tell the truth or know anything. It'll go as far as telling the user that they should check its work.

Not like ChatGPT, which will just tell untruths with no hesitation.
 
Apple has to participate in AI research. It really can do useful things. But they also need to set realistic expectations to prevent the usual mania/backlash cycle. Using this as the "AI" logo should help:
(attached screenshot)
 