I tried ChatGPT with the following question, "why is 9 times 6 equals 42 a problem", and got the following result:

"Saying that 9 times 6 equals 42 is a problem because it’s incorrect. The correct product of 9 and 6 is 54. Miscalculating can lead to errors in various contexts, such as math problems, financial calculations, or measurements. Accuracy is key in mathematics!:

When it says the statement is incorrect, that's not necessarily true, and the same goes for the answer being 54. It's right that accuracy is key, but so is completeness.

It made a fundamental assumption that I had fully specified the problem and was working in decimal, which was why it unequivocally said it was incorrect.

I wasn't: I was using base 13 (where 13 is written in decimal notation).

Decimal version: 9 (decimal) × 6 (decimal) = 54 (decimal), i.e. 5 × base + 4 → 5 × 10 + 4 = 54

Base 13 version: 9 (base 13) × 6 (base 13) = 42 (base 13), i.e. 4 × base + 2 → 4 × 13 + 2 = 54
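
If you want to check the digit juggling yourself, here's a minimal Python sketch (my own, not from the original post; the helper name to_base is just something I made up) that renders the product 9 × 6 in an arbitrary base:

Code:
def to_base(n: int, base: int) -> str:
    # Render a non-negative integer as a digit string in the given base (2-36).
    digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    out = ""
    while n > 0:
        n, r = divmod(n, base)
        out = digits[r] + out
    return out or "0"

print(to_base(9 * 6, 10))  # "54" -- the decimal reading
print(to_base(9 * 6, 13))  # "42" -- the base-13 reading: 4 x 13 + 2 = 54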

That the ultimate question is not expressed in decimal is why the universe is a strange and wondrous place.

It gets there in the end. Here I try to speak to it as I would to a human when trying to get them to reason something out without being too explicit.

[Screenshots of the ChatGPT conversation attached]


I was actually surprised when it came back with the base-13 response after my "therefore..." line. ChatGPT was able to figure out what I was getting at even though I never told it directly to consider non-decimal base systems. Yes, I did mention decimal, which may have been just enough to nudge it closer toward the response I wanted, but to be fair, there are a lot of humans for whom I think that would've been too subtle a hint to "get it" without further prodding.

Anyway, sorry if it's a bit OTT, I just thought it was an interesting problem and was curious to dig a little deeper :D It may just be advanced pattern-matching, but it impresses me that it somehow "knew" to consider base 13 without my explicitly correcting it to do so. Is that not indicative of some level of higher-level inference, and not simply some gigantic table lookup?
 
  • Like
Reactions: xodh
It depends if/how it processes what it calls 'nudges'. I'd already given it that solution yesterday.

I just tried a different tack: 'When is 9 times 6 equals 36 not a problem?' A solution would be if you were using hexadecimal, i.e. the answer is (3 × 16) + 6 = 54.

Its response - My bolding of the answer
"The statement "9 times 6 equals 36" is always true mathematically. However, if you're referring to a situation where it wouldn't be considered a problem, it could be in a context where the multiplication is obvious or straightforward, like in a math class where everyone understands basic multiplication. If there's a specific context or scenario you have in mind, let me know!"

I then asked it what 36 hexadecimal was in decimal and it gave the correct answer (and workings)
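
As a quick sanity check of the hexadecimal reading (my own Python sketch, not part of the ChatGPT exchange):

Code:
print(int("36", 16))       # 54 -- "36" read as hexadecimal is decimal 54
print(3 * 16 + 6)          # 54 -- the same workings spelled out
print(format(9 * 6, "x"))  # "36" -- decimal 54 rendered in hex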

So I asked it "When is 9 times 6 equals 36 correct"

Its response - My bolding again.
"The statement "9 times 6 equals 36" is always correct mathematically. It is accurate in any context involving multiplication. Whether in basic arithmetic, a math class, or solving a word problem, the multiplication of 9 and 6 will always equal 36. If you're looking for a specific scenario or nuance, let me know!"

So I challenged it "Why is the statement "9 times 6 equals 36" is always correct mathematically"

Its response - Its bolding.
"The statement "9 times 6 equals 36" is always correct mathematically because it is based on the fundamental principles of arithmetic. Here’s a breakdown:
  1. Definition of Multiplication: Multiplication is a mathematical operation that represents repeated addition. In this case, 9 times 6 means adding 9 six times: 9+9+9+9+9+9 = 54
  2. Multiplication Table: If you refer to a multiplication table, you’ll see that the product of 9 and 6 is consistently listed as 54.
  3. Properties of Numbers: The operations of addition and multiplication are defined in such a way that they always yield the same result. Therefore, once you calculate 9×6, you always get 54.
  4. Consistency Across Systems: Whether in base 10 (decimal), base 16 (hexadecimal), or any other numeral system, the underlying arithmetic remains the same.
So, no matter the context, 9×6 will always equal 54. If there's a specific scenario where you're questioning this, feel free to share!"

Note that the first and last lines contradict each other: it always equals 36 and it always equals 54. Truly human-scale intelligence, as we always contradict ourselves ;)

[Edit: corrected an error resulting from cut and paste; it had duplicated 9×6 on point 3 of the last quote.]
 
  • Haha
Reactions: metalsquid
This shows quite clearly that LLMs aren't "intelligent" in any reasonable sense of the word; they're just highly advanced at (speech/writing) pattern recognition.

Basically electronic parrots.

They can be highly useful, though. I've used ChatGPT (4o with canvas and o1-preview) quite a lot for tweaking code examples to show in class, for instance.

Alan Turing would disagree with you.

His entire argument in the Imitation Game is that we shouldn't judge intelligence by how it's done, only by its result.

If LLMs can produce results in intellectual tasks which are indistinguishable from human results, then LLMs are intelligent.
 
  • Disagree
Reactions: Chuckeee
Actually, it's not a surprise at all. Most people who use AI do not appear to look at the responses. They just blindly accept what they're given, believing that it's accurate. They try it three or four times, they like what comes out, and after that they just take the response without critically looking at it to determine if it makes any sense.

I haven't met or talked to a single person who takes everything a chatbot tells them to be 100% true all of the time.
My experience is from IT and the finance industry.

It's all about using a tool to do your job faster or better.
 
Does this question exist elsewhere on the Internet? Yes it does. The conclusion is indexed.

LLMs absolutely cannot count. They can't do basic math because that's not what the algorithm is built for. They can do excellent pattern matching.

LLMs might not be able to do basic math and counting, but I used ChatGPT to solve the following problem:

For a specific intersection, the following applies at a given moment:

• A car is 4 km east of the intersection and is moving westward at a speed of 80 km/h.

• A bus is 3 km north of the intersection and is moving northward at a speed of 50 km/h.

What is the rate of change of the distance between the car and the bus (in a straight line) at this moment?
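
For reference, here's my own quick numerical check of that related-rates problem (a sketch, not from the original post), differentiating D = sqrt(x² + y²) with respect to time:

Code:
import math

x, y = 4.0, 3.0      # car 4 km east, bus 3 km north of the intersection (km)
dx_dt = -80.0        # car heads westward, so its eastward distance shrinks (km/h)
dy_dt = 50.0         # bus heads northward, so its northward distance grows (km/h)

D = math.hypot(x, y)                 # straight-line distance: 5 km
dD_dt = (x * dx_dt + y * dy_dt) / D  # chain rule on sqrt(x^2 + y^2)
print(D, dD_dt)                      # 5.0 -34.0 -> the gap is closing at 34 km/h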
 
In the context of this thread, I decided to ask a much more limited question - one which is actually easy to answer using publicly accessible databases/sites.

The answers were so wrong that the entire rest of my session was spent telling it where it was wrong, until I hit the limit:

You’ve hit the Free plan limit for GPT-4o.

But it was so, umm, smug, asserting its rubbish replies were true and accurate. After every correction I made, it implied its replies were now true and accurate. And they were still wrong.

Why don't you publish your question so others can test?
 
Alan Turing would disagree with you.

His entire argument in the Imitation Game is that we shouldn't judge intelligence by how it's done, only by its result.

If LLMs can produce results in intellectual tasks which are indistinguishable from human results, then LLMs are intelligent.
I'm with Ada Lovelace on this...


"Lovelace wrote about her belief that while computers had endless potential, they could not be truly intelligent. She argued that a program can be engineered to do only what we humans know how to do. "

 
Yes, I did mention decimal,
This might have nudged it away from the response you wanted. "Decimal" means base ten, thus precluding base thirteen.

I haven't met or talked to a single person who takes everything a chatbot tells them to be 100% true all of the time.
My experience is from IT and the finance industry.

People who are perfectly smart in one context can be jarringly inept outside of their fields. Remember those lawyers who got disbarred for submitting AI info that they did not verify? Jordan Peterson also comes to mind.
 
  • Like
Reactions: Chuckeee
It’s been known from the start, and their developers have said so, that LLMs are incapable of reasoning as we define it. At first they were described as an advanced form of autofill, and though they seem to be something more than that, especially the latest versions, it’s always been known that they’re still inherently limited, since they’re just LLMs.
 
  • Like
Reactions: CopyChief
I do find it odd that there's a suggestion with AI that it is literally aware and 'thinking'. I am not particularly educated on these things, but that seems impossible to me. Maybe someone here can explain it better than all these companies have tried to, but my impression is that we've just reached a point where the processors are fast enough to access all this collected data in a very efficient way and compare/collate it.
It's not really stored data, though, but heaps of cut-up words (tokens) with probabilities attached to them, so the machine can select the most likely token to follow the preceding series of tokens.
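
To make that concrete, here's a toy illustration of "pick the most likely next token" (entirely my own made-up probability table; a real LLM conditions on the whole preceding context, not just the last word):

Code:
# Tiny made-up table: token -> {candidate next token: probability}
probs = {
    "nine":   {"times": 0.9, "lives": 0.1},
    "times":  {"six": 0.8, "ten": 0.2},
    "six":    {"equals": 0.7, "o'clock": 0.3},
    "equals": {"fifty-four": 0.6, "forty-two": 0.4},
}

token, sentence = "nine", ["nine"]
while token in probs:
    token = max(probs[token], key=probs[token].get)  # greedily take the likeliest continuation
    sentence.append(token)
print(" ".join(sentence))  # nine times six equals fifty-four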

The real question is: Is the process happening in the carbon based brain different from this? Are we just very, very large multi modal (with sight, hearing, tactile and olfactory inputs) “language models”?
 
I am confused that people at that level talk of LLMs as “reasoning”. It’s a probability machine! It chooses the next token based on the preceding tokens’ coherence with all the material it has “read” 🤷🏽‍♂️
As said above, it’s a very large auto completion tool with some added instructions to make it spit out tokens in a manner that we find meaningful.
 
Claude solved this on the first try, so this news is a big nothingburger. Funny that the study says "the majority of models fail to ignore these statements". So there were models that worked fine, but they only cherry-picked the worst ones? Smells of bias.

Actually ChatGPT 4o also solved it on first go, so what the hell? I actually ran this multiple times with both models and it never came out wrong. Did they run the test 10000 times until the AI tripped?
[Screenshot of the model's response attached]
I tried this on ChatGPT 4o, ChatGPT o1-mini, ChatGPT 4o mini, ChatGPT 4, ChatGPT o1-preview, and Claude 3.5 Sonnet. All except 4o mini got it right the first time. 4o mini got it right when I tried it a second time in a fresh chat.
 
The real question is: Is the process happening in the carbon based brain different from this? Are we just very, very large multi modal (with sight, hearing, tactile and olfactory inputs) “language models”?
I sometimes detect in some people's statements, observations, etc. a strong hint of relatively simple LLM-like processing/reasoning, making "mere" associations between things already in their head to come to various conclusions, instead of relying on more up-to-date, expansive, real-world data, first-hand measurements, etc., and then using all of that to engage in what we normally consider advanced reasoning. But maybe this LLM-like process in carbon brains is one of several useful modes we have to model things to come to quick conclusions, or at least theories, about many things, though modifiable with further investigation and testing that, at the moment, LLMs are limited in performing.

From what I've read, the origins of the data-processing techniques for LLMs were (and still are) based on looking at how humans think, including the structure of our neural nets, and attempting to model (in a simplified way) at least part of that process and structure in silicon and computer code, so it's no surprise that we see some resemblance between the two.

As ChatGPT just told me: "The design of LLMs and other artificial neural networks (ANNs) in AI is based on simplified mathematical abstractions of some key principles seen in biological systems. But while LLMs and other AI systems are inspired by the principles of biological neural networks, they are mathematical models built in silicon that only approximate certain aspects of brain function, such as learning and signal processing."
 