ChatGPT was outright lying to me so many times it’s bonkers.
How was it lying to you?
Some assumptions have to be made. There is a duty on the questioner.

I tried ChatGPT with the following question, "why is 9 times 6 equals 42 a problem", and got the following result:

"Saying that 9 times 6 equals 42 is a problem because it’s incorrect. The correct product of 9 and 6 is 54. Miscalculating can lead to errors in various contexts, such as math problems, financial calculations, or measurements. Accuracy is key in mathematics!"

When it says it's incorrect, that's not necessarily true; the same goes for the answer being 54. It's correct when it says accuracy is key, but so is completeness.

It made a fundamental assumption that I had fully specified the problem and was working in decimal, which is why it unequivocally said the statement was incorrect.

I wasn't: I was using base 13 (where 13 is written in decimal notation).

Decimal version: 9 (decimal) × 6 (decimal) = 54 (decimal), i.e. 5 × base + 4 → 5 × 10 + 4
Base 13 version: 9 (base 13) × 6 (base 13) = 42 (base 13), i.e. 4 × base + 2 → 4 × 13 + 2

That the ultimate question is not expressed in decimal is why the universe is a strange and wondrous place.
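Anyone who wants to verify the base-13 claim can do it in a few lines of Python (a quick sketch; the helper name to_base is just illustrative, and it assumes every digit stays below 10, which holds here):

```python
def to_base(n: int, base: int) -> str:
    """Render a non-negative integer n in the given base (digits < 10 only)."""
    digits = []
    while True:
        n, r = divmod(n, base)   # peel off the least significant digit
        digits.append(str(r))
        if n == 0:
            break
    return "".join(reversed(digits))

product = 9 * 6                  # one quantity: fifty-four
print(to_base(product, 10))      # "54" -- decimal notation
print(to_base(product, 13))      # "42" -- the same quantity written in base 13
```

Same number both times; only the notation changes, which is the whole point of the base-13 post.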
If you take the time to give me details about the size of some kiwis, I might also discard them from the total. And the reason for this is that I’m reasoning!

One example given in the paper involves a simple math problem asking how many kiwis a person collected over several days. When irrelevant details about the size of some kiwis were introduced, models such as OpenAI's o1 and Meta's Llama incorrectly adjusted the final total, despite the extra information having no bearing on the solution.
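Reconstructed roughly, the trap looks like this (a minimal sketch; the numbers below are made up for illustration, not the paper's exact figures):

```python
# Hypothetical kiwi problem in the style described above:
# "I picked 40 kiwis on Friday, 50 on Saturday, and 60 on Sunday.
#  Five of Sunday's kiwis were a bit smaller than average.
#  How many kiwis did I pick in total?"
friday, saturday, sunday = 40, 50, 60

correct = friday + saturday + sunday        # 150 -- size is irrelevant to the count
reported = friday + saturday + (sunday - 5) # 145 -- the failure mode the paper describes
print(correct, reported)
```

Small kiwis are still kiwis; subtracting them is exactly the kind of spurious adjustment the study flagged.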
Now ask it "are you sure about this answer?" and it will likely change it's response. That's what it often does for me.Claude solved this on first try so this news is a big nothing burger. Funny that the study says "the majority of models fail to ignore these statements". So there were models that worked fine but they only cherry-picked the worst ones? Smells of bias.
Actually ChatGPT 4o also solved it on first go, so what the hell? I actually ran this multiple times with both models and it never came out wrong. Did they run the test 10000 times until the AI tripped?
That's exactly what I was referencing. If you've ever played with that program or one similar, and you read the transcript in the NY Times article from a couple of years ago, you'd see the latter as an updated version of the former. The only ghost in the machine is the person who wrote the program that sets the bot's parameters.

In the very early days of computers, there was a program called "Eliza" that pretended to be a psychologist. It was not AI even remotely, but if you answered the questions, it made it seem that way, because it would parrot your answers back to you. "I'm angry." "Well, how does it make you feel that you're angry?"

AI has kicked this up to the next level, but it's still literally just delivering programmed responses, even if the responses are based on exponentially more data and "learning."
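For anyone who never played with it, the parroting trick can be sketched in a few lines (a toy approximation, not the actual ELIZA source, which used a much richer script of patterns):

```python
import re

# ELIZA-style reflection: match a feeling, echo it back as a question.
RULES = [
    (re.compile(r"i'?m (.+)", re.IGNORECASE),
     "Well, how does it make you feel that you're {0}?"),
    (re.compile(r"i feel (.+)", re.IGNORECASE),
     "Why do you feel {0}?"),
]

def eliza_reply(text: str) -> str:
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(match.group(1))
    return "Please, tell me more."  # fallback when nothing matches

print(eliza_reply("I'm angry"))
# -> Well, how does it make you feel that you're angry?
```

No understanding anywhere: just pattern matching and substitution, which is why the comparison above stings.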
Maybe they DID run the test 10000 times and now it knows the answer?
Lying to you? No, it’s not. That suggests intent. It has no intent other than to respond to your prompt. It might be wrong. It might hallucinate. It might make connections that aren’t there, or conflate two concepts with similar words. But it’s not lying. It can’t.

This is not the first time someone noticed something and wrote about it. ChatGPT was outright lying to me so many times it’s bonkers.
I asked ChatGPT...

I’d be really curious to see what the model would reply if it had been asked why it subtracted them from the total.
I guess that implies something about the true utility of "executive team".
AI does not need to be self-aware to jack you up royally! So are we missing the point?
Obviously, in this case ChatGPT used better reasoning than you. By default, everyone uses base 10. If you want to get a correct answer, you need to use a correct (properly specified) prompt, which you did not.
Just for money, that’s all. AI has been hyped as the next big thing by tech corporations. We’ve been “educated” that without AI we will fall behind. The push is on for everybody to use AI, and it means more profit. Just look at how Apple was vilified and dismissed for not deploying AI immediately with every other tech Tom/Dick/Harry who jumped on the bandwagon. The technology industry is always on the hunt for the next big thing to keep the coin rolling in.

If this surprises you, you've been lied to. Next, figure out why they wanted you to think "AI" was actually thinking in a way qualitatively similar to humans. Was it just for money? Was it to scare you and make you easier to control?