Whaaaaat. They literally have been shipping more ML hardware than any other vendor on the planet and have been doing it for about 7 years. Quick extrapolation shows at least 750 million ML processors in the field.

Granted it can't tell the difference between cows and horses, but neither can anything else!
This has very little to do with AI. It's hardware for matrix operations. Just look at how well Siri has done with it.
 
  • Like
Reactions: flybass
AI is a specific field of a more general Machine Learning field. Many experts in the field cringe at everything being called "AI" when they know much of it is not related to intelligence at all, but merely pattern recognition and applying learned processes to new input.

Apple has been employing Machine Learning (ML) for decades to various degrees. Any time you hear about "models", those really have nothing to do with AI, but are the output of an ML process.

Input > Learning > Model

Then the Model is applied to new input to create new output.

Input > Model > Output
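
If it helps to see those two phases as toy code, here is a deliberately tiny sketch in Python, with a single learned threshold standing in for a "model". It is purely illustrative and has nothing to do with how Apple's actual models are built:

```python
# Purely illustrative toy, not Apple's pipeline: a single learned threshold
# standing in for a "model".

def learn(samples):
    # "Learning": derive one parameter (a brightness threshold) from labelled examples.
    bright = [px for px, label in samples if label == "bright"]
    dark = [px for px, label in samples if label == "dark"]
    return (sum(bright) / len(bright) + sum(dark) / len(dark)) / 2

def apply_model(threshold, pixel):
    # "Inference": pass new input through the learned model to get an output.
    return "bright" if pixel > threshold else "dark"

training_data = [(200, "bright"), (220, "bright"), (30, "dark"), (50, "dark")]
model = learn(training_data)       # Input > Learning > Model
print(apply_model(model, 180))     # Input > Model > Output  ->  "bright"
```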

When you snap a photo, that photo is instantly passed through zero or more pre-trained models to do things like colour correction, etc. That photo is also passed through image-recognition models to identify subjects, such as people, places and pets. So you can then ask your phone "Show me all photos of my vacation to Spain" and it can produce some useful results.

When you type on your keyboard, the characters are being passed through pre-trained models to do spelling correction, word prediction, etc.

This latest craze of "AI" just involves vastly larger models (Large Language Models, or LLMs) that require much more processing power and specialize in much broader tasks such as historical research and textual composition.

Apple Intelligence is Apple's endeavour to employ these larger models, which is also what mandates the minimum hardware requirements.

Does that help?
How? Despite everything you just said, Apple remains a midget in AI research. LLMs have become possible not just because we got more powerful hardware; they were enabled by breakthroughs in fundamental research. Specifically, Google Brain and the University of Toronto developed "transformers," which now play a crucial role in the development and functioning of LLMs. Apple's contribution? Siri. And even Siri was purchased, not developed, by Apple.
 
These results are so completely obvious to anyone working in the field that I wonder why they bother to do the test.

My guess is the focus of the research was not to test the current LLMs but rather to develop a test that could detect actual reasoning. Such a test is not easy. It is VERY hard to look at a robot from the outside and determine why it is doing what it does. Was it all pre-programmed, or did the software figure it out? It is hard to know if you cannot look inside.

But these researchers do know how the LLM works inside and knew it could not be doing any kind of reasoning. Running a test would be pointless. So I suspect they are actually working on testing methods.
 
If this surprises you, you've been lied to. Next, figure out why they wanted you to think "AI" was actually thinking in a way qualitatively similar to humans. Was it just for money? Was it to scare you and make you easier to control?
My experience with calculations is the same. In my case I had a strong feeling it was programmed to give specific results, not the truth 😂
 
  • Like
Reactions: bayportbob
Back in the primitive 1990s, the neural networks I worked with all had:

models that do not use real logic to solve problems but instead rely on sophisticated pattern recognition learned during training.

this leads us to where we are today and want to go:

is a potentially troubling sign for the future of AI applications that require consistent, accurate reasoning in real-world contexts.

sell the stocks now, the emperor has no clothes on...
 
I asked Perplexity AI "how many 3-cent stamps in a dozen?" and got this "Simulated Intelligence" for an answer:

The answer to how many three-cent stamps are in a dozen is 12. To understand this, let's break it down:
  1. A dozen is a unit of measurement that equals 12.
  2. Each stamp costs 3 cents.
To calculate the total value of a dozen three-cent stamps, we multiply: 12 × 3 = 36

This means that a dozen three-cent stamps would have a total value of 36 cents.

Visualization

To help visualize this concept: Each asterisk (*) represents one cent. As you can see, there are 36 asterisks in total, which corresponds to the value of a dozen three-cent stamps.

Number Name

In words, the total value is thirty-six cents.

This simple calculation demonstrates how basic arithmetic can be applied to everyday situations, such as determining the value of a set of stamps. It's a practical example of how multiplication is used in real-world scenarios.
 
How? Despite everything you just said, Apple remains a midget in AI research. LLMs have become possible not just because we got more powerful hardware; they were enabled by breakthroughs in fundamental research. Specifically, Google Brain and the University of Toronto developed "transformers," which now play a crucial role in the development and functioning of LLMs. Apple's contribution? Siri. And even Siri was purchased, not developed, by Apple.
Apple has been building its research privately for over 15 years, in lieu of announcements. Google releases garbage continuously, hoping something will stick, and profits off advertising it to death while working on 5 new things to hype: milk the advertising, cancel projects, rinse and repeat.

LLMs have never been AI. They're a form of finite automata with useful aspects in HPC, but Nvidia hyped them up to profit before the well runs dry. Post this hype Nvidia will crash hard, but expect a new pivot soon. Jensen, if nothing else, is slippery in his promises about his company's technologies.

Because Wall Street labeled it as the coming of AI, you have to follow suit or get ignored by the press and Wall Street. Case in point: High Performance Computing doesn't sound transformational like the catchphrase AI, and hedge funds live on hype to capitalize on vapor.

Anyone who believes Oracle's generative AI plans and power consumption should now relax as that fantasy won't come to fruition in Larry Ellison's lifetime.

At 80, he'll be long deceased before his absurd misuse of cloud infrastructure and his global DBMS wet dreams come to fruition.
 
  • Like
Reactions: KeithBN
I asked Perplexity AI "how many 3-cent stamps in a dozen?" and got this "Simulated Intelligence" for an answer:

The answer to how many three-cent stamps are in a dozen is 12. To understand this, let's break it down:
  1. A dozen is a unit of measurement that equals 12.
  2. Each stamp costs 3 cents.
To calculate the total value of a dozen three-cent stamps, we multiply: 12 × 3 = 36

This means that a dozen three-cent stamps would have a total value of 36 cents.

Visualization

To help visualize this concept: Each asterisk (*) represents one cent. As you can see, there are 36 asterisks in total, which corresponds to the value of a dozen three-cent stamps.

Number Name

In words, the total value is thirty-six cents.

This simple calculation demonstrates how basic arithmetic can be applied to everyday situations, such as determining the value of a set of stamps. It's a practical example of how multiplication is used in real-world scenarios.
Nowhere in the question in your example was it asked what the value of the stamps would be. The question was how many stamps are in a dozen; that there are a dozen stamps is implied, because the question implicitly states it.
 
  • Like
Reactions: KeithBN
Try telling the model that there are Kiwi fruits and Kiwi people, and then create a puzzle saying 6 Kiwis ate 2 Kiwis apiece for 4 days, then 3 Kiwis each for 2 days, then one Kiwi died and the remaining Kiwis ate 9 total Kiwis on that day. How many Kiwis did the average Kiwi eat per day?
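
For reference, one reading of that puzzle (treating the eaters as Kiwi people, what they eat as Kiwi fruit, and assuming the deceased Kiwi stops eating on the final day) works out roughly like this; those assumptions are exactly the ambiguity the puzzle is meant to exploit:

```python
# One possible reading of the puzzle above; the assumptions are in the comments.
fruit_eaten = 6 * 2 * 4 + 6 * 3 * 2 + 9   # 48 + 36 + 9 = 93 Kiwi fruit eaten
kiwi_days = 6 * 4 + 6 * 2 + 5 * 1         # 24 + 12 + 5 = 41 Kiwi-person-days
print(fruit_eaten / kiwi_days)            # ~2.27 fruit per Kiwi per day
```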
 
I tried ChatGPT with the following question, "why is 9 times 6 equals 42 a problem", and got the following result:

"Saying that 9 times 6 equals 42 is a problem because it's incorrect. The correct product of 9 and 6 is 54. Miscalculating can lead to errors in various contexts, such as math problems, financial calculations, or measurements. Accuracy is key in mathematics!"

When it says it's incorrect, that's not necessarily true; the same goes for the answer being 54. It's correct when it says accuracy is key, but so is completeness.

It made a fundamental assumption that I had fully specified the problem and was working in decimal, which was why it unequivocally said it was incorrect.

I wasn't. I was using base 13 (where the 13 is written in decimal notation).

Decimal version: 9 (decimal) × 6 (decimal) = 54 (decimal), i.e. 5 × base + 4 → 5 × 10 + 4

Base 13 version: 9 (base 13) × 6 (base 13) = 42 (base 13), i.e. 4 × base + 2 → 4 × 13 + 2
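
If you want to sanity-check that, a couple of lines of Python will do it (to_base below is just a throwaway helper for this post, not a standard library function):

```python
def to_base(n, base):
    # Throwaway helper: render a small positive integer in an arbitrary base.
    # Fine here because both resulting digits happen to be below 10.
    digits = ""
    while n:
        digits = str(n % base) + digits
        n //= base
    return digits or "0"

print(9 * 6)               # 54 in decimal
print(to_base(9 * 6, 13))  # "42" in base 13, i.e. 4 * 13 + 2 = 54
```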

That the ultimate question is not expressed in decimal is why the universe is a strange and wondrous place.
The AI “forgot to uncheck Presume Teeth”
 
If this surprises you, you've been lied to. Next, figure out why they wanted you to think "AI" was actually thinking in a way qualitatively similar to humans. Was it just for money? Was it to scare you and make you easier to control?
Are you adding anything to the conversation or just putting on your tin foil hat?

Because I can easily say to you....
Are you surprised that businesses, governments, your coworkers....anyone who doesn't love you, and sometimes the people that do....lie to you for their own benefit and to control you?
 
Much of it is just popular hype from people who don't know enough to know the difference. Think of the NY Times article that sort of kicked it all off in the popular media a couple of years ago. The writer seemed convinced that the AI was obsessing over him and actually asking him to leave his wife. The actual transcript, for anyone who's seen this stuff back through the decades, showed the AI program bouncing off its programmed parameters and being pushed by the writer into shallow territory where it lacked sufficient data to create logical interactions. The writer, and most people reading it, however, thought the AI was being borderline sentient.

The simpler Occam's razor explanation for why AI businesses have rolled with that perception, or at least haven't tried much to refute it, is that it provides cover for the LLM "learning" process that steals copyrighted intellectual property and then regurgitates it in whole or in collage form. The sheen of possible sentience clouds the theft ("people also learn by consuming the work of others") as well as the plagiarism ("people are influenced by the work of others, so what then constitutes originality?"). When it's made clear that LLM AI is merely hoovering, blending and regurgitating with no involvement of any sort of reasoning process, it becomes clear that the theft of intellectual property is just that: theft of intellectual property.
I would argue that there's an even simpler motivation than avoidance of IP issues: "AI" currently means money and fame, e.g., from said NYT nitwit, whereas "thoughtless hoovering, blending, and regurgitating machine" does not. (Well, not unless you package it as a candidate for political office.)
 
  • Like
Reactions: KeithBN


Apple's AI research team has uncovered significant weaknesses in the reasoning abilities of large language models, according to a newly published study.


The study, published on arXiv, outlines Apple's evaluation of a range of leading language models, including those from OpenAI, Meta, and other prominent developers, to determine how well these models could handle mathematical reasoning tasks. The findings reveal that even slight changes in the phrasing of questions can cause major discrepancies in model performance that can undermine their reliability in scenarios requiring logical consistency.

Apple draws attention to a persistent problem in language models: their reliance on pattern matching rather than genuine logical reasoning. In several tests, the researchers demonstrated that adding irrelevant information to a question—details that should not affect the mathematical outcome—can lead to vastly different answers from the models.

One example given in the paper involves a simple math problem asking how many kiwis a person collected over several days. When irrelevant details about the size of some kiwis were introduced, models such as OpenAI's o1 and Meta's Llama incorrectly adjusted the final total, despite the extra information having no bearing on the solution.



This fragility in reasoning prompted the researchers to conclude that the models do not use real logic to solve problems but instead rely on sophisticated pattern recognition learned during training. They found that "simply changing names can alter results," a potentially troubling sign for the future of AI applications that require consistent, accurate reasoning in real-world contexts.
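
To make concrete what "changing names" means here, the templated-question idea can be sketched roughly as follows (hypothetical template, names and numbers for illustration, not the study's actual prompts or code):

```python
import random

# Illustrative only: generate surface variants of one underlying math problem
# and check whether a model's answers survive the irrelevant changes.
TEMPLATE = ("{name} picked {a} kiwis on Friday and {b} kiwis on Saturday. "
            "{distractor} How many kiwis did {name} pick in total?")
NAMES = ["Oliver", "Mia", "Sanjay"]
DISTRACTORS = ["", "Five of them were a bit smaller than average."]  # irrelevant detail

def make_variant():
    a, b = random.randint(10, 60), random.randint(10, 60)
    text = TEMPLATE.format(name=random.choice(NAMES), a=a, b=b,
                           distractor=random.choice(DISTRACTORS))
    text = " ".join(text.split())  # tidy spacing when the distractor is empty
    return text, a + b             # the true answer never depends on the distractor

question, answer = make_variant()
print(question, "->", answer)
```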

According to the study, all models tested, from smaller open-source versions like Llama to proprietary models like OpenAI's GPT-4o, showed significant performance degradation when faced with seemingly inconsequential variations in the input data. Apple suggests that AI might need to combine neural networks with traditional symbol-based reasoning (an approach called neurosymbolic AI) to obtain more accurate decision-making and problem-solving abilities.

Article Link: Apple Study Reveals Critical Flaws in AI's Logical Reasoning Abilities

They're just holding the A.I. wrong
 
I read your comments and it sounds to me as if most of you have your heads buried in the sand.

ChatGPT is an absolute game changer for the enterprise customer. If you own a company that has to deal with a large number of customers, it's an excellent customer-support replacement, given the correct database and training.

If you own a company that has to process lengthy legal (or otherwise) texts and summarize them, it's a huge force multiplier. If you have to work (produce documents, send emails) in many different languages, read them and summarize them, it's game changing too.

Can it make mistakes? Of course. But guess what, humans make even more of those.

The fact of the matter is that it replaces a large share of the workforce for a couple of beans per month. THAT's why it's savagely successful, not because people and business owners are stupid. As far as I'm concerned, they could charge $1000 per month for a single terminal and we would happily pay it.
 
Last edited:
  • Like
Reactions: flybass
LLMs have flaws. A couple of days ago I was trying some experiments on an EFI partition and had to restore it from a healthy partition due to an error. ChatGPT o1 insisted that we should rebuild it manually, even creating files that were not present in the healthy partition, while I proposed performing a byte-to-byte copy. My solution worked, though I'm not a programmer or developer; I just used common sense. The human brain is an efficiency masterpiece. AI has a long road ahead.
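
For what it's worth, the byte-to-byte copy I had in mind is conceptually just the following (the paths are placeholders for illustration; in practice you would use a tool like dd, and writing to the wrong device will destroy data):

```python
# Rough sketch of a byte-for-byte restore between two partition images.
SRC = "/path/to/healthy_efi.img"   # placeholder: image of the healthy EFI partition
DST = "/path/to/broken_efi.img"    # placeholder: existing image/device to overwrite
CHUNK = 1024 * 1024                # copy in 1 MiB chunks

with open(SRC, "rb") as src, open(DST, "r+b") as dst:
    while True:
        block = src.read(CHUNK)
        if not block:
            break
        dst.write(block)
```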
 
This is potentially very similar to the way autistic people think. They interpret things absolutely literally and often struggle with information filtering, and thus can also end up giving very different answers based on the information provided (however irrelevant) and the question style. Not a great ad for AI, really. I've said it before and I'll say it again: not ready for prime time. There is not enough regulation and control of its use for critical applications.
 
Some of these problems are common to humans undertaking logical reasoning as well. Humans are exceptionally swayed by red herrings and can quite easily give you two different outputs based on very slightly reworded questions. It's an extremely well-documented phenomenon. Anyone undertaking a basic research survey is aware of this problem: reword your questions, get totally different results. The idea that we will ever eliminate this problem from AI is probably just not achievable. It's perhaps less a problem with the AI itself and more a problem with the ambiguity of language, or the nuances of reasoning.

That said, it’s interesting research, even if it’s telling us something we all already know: AI remains extremely primitive at this stage.
 
It's really crazy how bad ChatGPT can be at solving very simple tasks. I tried to get it to design a chandelier for me, with 20 candles. It kept generating 40+ candles, with 20 at the front. I tried at least 25 different prompts to get it to change the total number to 20, asking it to remove candles several times and asking it to count the candles, to which it would reply "my bad, I count 40 candles, let me correct that immediately" and still produce an image with 40 candles. It's amazing at first and quickly becomes quite frustrating.
 
  • Like
Reactions: CopyChief