Apple Study Reveals Critical Flaws in AI's Logical Reasoning Abilities

One of a zillion reasons most of us see AI as "the next crypto"

It's just being pushed on everyone because SV and the VCs are incredibly desperate for a "next big thing"
What I find a bit baffling is why, instead of risking their billions on AGI, these companies aren't working harder on more focused LLM-supported products and workflows. For example, rather than trying to build another version of a general model that uses Chain-of-Thought (CoT) behind the scenes (GPT-o1), why not save a buh-zillion dollars and build a UX workflow that guides the user toward CoT-based interactions using the current/existing model? Taking this approach could give them a range of useful, reliable, revenue-generating products without incurring huge additional costs. I suppose it's FOMO, or hubris, or both...
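
Something like the sketch below is what I mean: a thin workflow layer that walks the user through CoT-style steps using whatever chat model already exists. Purely illustrative; ask_model is a hypothetical stand-in for an existing chat-completion call, not any particular vendor's API.

```python
def ask_model(prompt: str) -> str:
    """Hypothetical placeholder for an existing LLM chat-completion call."""
    raise NotImplementedError

def guided_cot(question: str) -> str:
    # Step 1: have the model restate the problem and list the given facts.
    facts = ask_model(
        f"Restate this problem and list the given facts, nothing else:\n{question}"
    )
    # Step 2: ask for explicit, numbered intermediate steps.
    steps = ask_model(
        f"Problem and facts:\n{facts}\n\nWork through the solution step by step, "
        "numbering each step."
    )
    # Step 3: ask for a final answer that must cite the numbered steps.
    return ask_model(
        f"Steps:\n{steps}\n\nGive only the final answer, citing the step numbers "
        "that justify it."
    )
```

The point is that the "reasoning" scaffolding lives in the product UX rather than inside a new, more expensive model.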

EDIT: Actually, I suppose they want users to do that... and I also suspect that the next leap may require a break from the "scale is everything" mentality driving the major players... meaning that it's going to be a harder and more expensive goal to reach.
 
I've tried ChatGPT for a few purposes. And, as a super-aide mémoire, it can be quite useful. But it comes up with utter garbage a lot of the time.

I wanted to know about a specific medicine. It came back with companies that no longer exist. One company changed its name seven years ago. And the product they supposedly make was discontinued around twenty years ago. It missed some huge names and made big mistakes.

And a follow-up question came with a couple of references (when I pushed for them). But the articles referenced do not exist - even if you go to the journal's own website, through PubMed, etc., etc. The authors exist. The paper referenced doesn't.

One of the reasons I decided to try it is that working through all the websites is so tedious - so many have paywalls and other barriers to access - that it can take days to check things manually. I thought it might help with some groundwork.

But I've ended up being quite sure I cannot take a single ChatGPT reply as being "true" or correct or accurate. It might tell me about something but it can take much effort to check...
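
Even a crude automated check helps with that groundwork. A rough sketch of what I mean: look each cited title up in PubMed via NCBI's public E-utilities search endpoint and treat zero hits as a strong hint the paper doesn't exist. (Illustrative only; it assumes the chatbot gave an exact title to search for.)

```python
import requests

def pubmed_title_hits(title: str) -> int:
    """Return how many PubMed records match the given article title."""
    resp = requests.get(
        "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
        params={"db": "pubmed", "term": f"{title}[Title]", "retmode": "json"},
        timeout=10,
    )
    resp.raise_for_status()
    return int(resp.json()["esearchresult"]["count"])

# A reference the chatbot invented should come back with 0 hits.
# print(pubmed_title_hits("Title of the paper ChatGPT cited"))
```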

Reminds me of Brandolini's law: "The amount of energy needed to refute bullshit is an order of magnitude bigger than that needed to produce it."

 
I guess now, whenever someone asks Apple why they're so far behind on AI, they can point to a research paper they wrote themselves and say "well, we were just taking our time to do it right unlike everyone else." :p
 
In the context of this thread, I decided to ask a much more limited question - one which is actually easy to answer using publicly accessible databases/sites.

The answers were so wrong, the entire rest of my session was spent telling it where it was wrong. Until I hit the limit:

You’ve hit the Free plan limit for GPT-4o.

But it was so, umm, smug, asserting its rubbish replies were true and accurate. Every correction I made, it then implied its replies were now true and accurate. And they were still wrong.
 
Claude solved this on the first try, so this news is a big nothing burger. Funny that the study says "the majority of models fail to ignore these statements". So there were models that worked fine but they only cherry-picked the worst ones? Smells of bias.

The study does not argue that AI cannot get it right: it argues that formulating the same fundamental question in different ways can significantly influence the answer, so different formulations of one question can have significantly different accuracy.

Also note that the "seemingly relevant but ultimately irrelevant information" dataset they used is only one aspect of the study; other datasets test the models' sensitivity to different kinds of input variation.

I have not read the study in detail yet but they do provide quantitative results and most importantly, a detailed description of their methodology.
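
To make the methodology concrete, here is a toy version of the idea as I read it: take one question template, generate variants that change only surface details (names, numbers), and check whether a model's accuracy holds up across them. Purely illustrative; ask_model is a hypothetical stand-in for any LLM call, and the template is made up.

```python
import random

TEMPLATE = ("{name} picked {a} apples on Monday and {b} apples on Tuesday. "
            "How many apples did {name} pick in total?")

def make_variants(n: int, seed: int = 0):
    """Yield (question, correct_answer) pairs that differ only in surface details."""
    rng = random.Random(seed)
    names = ["Liam", "Sofia", "Noah", "Mei"]
    for _ in range(n):
        a, b = rng.randint(2, 40), rng.randint(2, 40)
        yield TEMPLATE.format(name=rng.choice(names), a=a, b=b), a + b

def accuracy(ask_model, n: int = 50) -> float:
    """Fraction of variants where the model's reply contains the right number."""
    correct = 0
    for question, answer in make_variants(n):
        correct += str(answer) in ask_model(question)  # crude answer check
    return correct / n
```

If accuracy swings a lot across different seeds, the model is reacting to surface form rather than to the underlying arithmetic - which is essentially the study's point.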
 
The writer seemed convinced that the AI was obsessing over him and actually asking him to leave his wife. The actual transcript, for anyone who's seen this stuff back through the decades, showed the AI program bouncing off programmed parameters and being pushed by the writer into shallow territory where it lacked sufficient data to create logical interactions. The writer and most people reading it, however, thought the AI was being borderline sentient.
In the very early days of computers, there was a program called "Eliza" that pretended to be a psychologist. It was not even remotely AI, but if you answered its questions, it could seem that way, because it would parrot your answers back to you. "I'm angry." "Well, how does it make you feel that you're angry?"

AI has kicked this up to the next level, but it's still literally just delivering programmed responses, even if the responses are based on exponentially more data and "learning."
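
For anyone who never ran into it, the "parrot it back" trick is only a few lines of code. This is a bare-bones toy in the spirit of ELIZA's DOCTOR script, not Weizenbaum's actual rules:

```python
import re

REFLECTIONS = {"i": "you", "my": "your", "me": "you", "am": "are", "i'm": "you're"}

def respond(user_input: str) -> str:
    # "I am X" / "I'm X" -> reflect the feeling back as a question.
    match = re.search(r"\b(?:i am|i'm)\s+(.*)", user_input, re.IGNORECASE)
    if match:
        feeling = match.group(1).rstrip(".!?")
        return f"How does it make you feel that you're {feeling}?"
    # Otherwise, swap pronouns and turn the statement into a question.
    words = [REFLECTIONS.get(w.lower(), w) for w in user_input.rstrip(".!?").split()]
    return "Why do you say " + " ".join(words) + "?"

# >>> respond("I'm angry")
# "How does it make you feel that you're angry?"
```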
 
But it was so, umm, smug, asserting its rubbish replies were true and accurate. Every correction I made, it then implied its replies were now true and accurate. And they were still wrong.

I think this is one of the key aspects to keep in mind when dealing with current AIs: they tend to always give an answer with what can be perceived as "confidence", whereas in reality they have no actual clue about what they are saying.
 
Here we go, blowing the "Apple is behind on AI!" bugle like on every post. The fact of the matter is that ChatGPT and its contemporaries are not useful for the majority of people.

It's kind of like saying Apple is behind on wearables because they don't make a ring. They don't have a ring for the same reason they don't have a chatbot - they're building something else.
 
I think this is one of the key aspects to keep in mind when dealing with current AIs: they tend to always give an answer with what can be perceived as "confidence", whereas in reality they have no actual clue about what they are saying.
EXACTLY. They don't actually know anything. They don't understand it the way we do. They can't handle nuance or feeling, because they don't feel. It's all a computation. Any human qualities like confidence, well, that's just us humans projecting those feelings onto them, or reading between the lines. With AI and LLMs, there's nothing between the lines.
 
I think this is one of the key aspects to keep in mind when dealing with current AIs: they tend to always give an answer with what can be perceived as "confidence", whereas in reality they have no actual clue about what they are saying.
And I can certainly believe that someone without my bolshiness, and my confidence in my knowledge of this tiny area, could easily have believed it.

Which is concerning because I only have that detailed knowledge about a few areas - and realise how easily I could be fooled.
 
Turns out truly intelligent, free-thinking, creative and intuitive human beings are really hard to mimic in software. Did you really need Apple to tell you that? Glad that they did burst the bubble, though.

AI as it currently stands, and is likely to stand for the foreseeable, is a useful way to solve some computing problems and nothing more. Everything else is window dressing.

A great tool that I do want to have on my phone and Mac for a very limited set of tasks which I will control.
 
Funny that the study says "the majority of models fail to ignore these statements". So there were models that worked fine but they only cherry-picked the worst ones? Smells of bias.
Funny how 'the majority of models fail to ignore these statements' is just one observation out of hundreds in the study. Yet you cherry-picked that one to attribute bad intentions to Apple. Smells of bias. :)
 
A system that has a high level of intelligence will
a) know what it doesn't know
b) not attempt to answer a question in those domains.

These might sound bizarre, but I know that I don't know how to fix my car if it breaks down. I also know that, in anything other than truly exceptional circumstances, I shouldn't attempt to fix it myself but should call roadside assistance.
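
In software terms, even a crude version of "know what you don't know" can be sketched: sample the model several times and abstain when the answers disagree. This is just an illustration, using answer agreement as a rough stand-in for confidence; ask_model is again a hypothetical LLM call that samples with some randomness.

```python
from collections import Counter

def answer_or_abstain(ask_model, question: str, samples: int = 5,
                      min_agreement: float = 0.8) -> str:
    """Answer only when repeated samples mostly agree; otherwise admit uncertainty."""
    replies = [ask_model(question).strip() for _ in range(samples)]
    answer, count = Counter(replies).most_common(1)[0]
    if count / samples >= min_agreement:
        return answer
    return "I don't know - please check a reliable source."
```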
 
A system that has a high level of intelligence will
a) know what it doesn't know
b) not attempt to answer a question in those domains.

This is something Claude does really well. Claude does not express certainty, and it does not claim to tell the truth or know anything. It'll go as far as telling the user that they should check its work.

Not like ChatGPT, which will just tell untruths with no hesitation.
 
Apple has to participate in AI research. It really can do useful things. But they also need to set realistic expectations to prevent the usual mania/backlash cycle. Using this as the "AI" logo should help:
(attached screenshot)
 