Most current AI is actually a very sophisticated search engine capable of combining information from many sources into one (largely) coherent answer. Still very useful, but it's important to understand the limitations.
It's like ML with plugins. Actual reasoning and questioning depend on consciousness, and I'm not sure that's possible on current hardware. I think it's somewhere in the quantum future.
 
Nobody who understands LLMs actually thinks that they're intelligent. It's pattern matching and probability. But the end result is a convincing illusion of intelligence and for practical purposes, it doesn't matter whether or not it's intelligent, only that it actually works. And it does.

Millions of people use AI to augment their own skills and knowledge and to accelerate their own tasks. I don't rely on AI answers as final results, nor do I have it do my completed work for me. I use it as a tool to research and develop work that I would otherwise have had to do manually. I check sources and write my own final work within the wireframe AI helped me assemble.

Apple is beginning to look like Microsoft and BlackBerry denying the potential of the iPhone. Microsoft was left out of the smartphone era, losing two decades, and its CEO resigned. BlackBerry no longer exists. Tim Cook, beware.
 
It is obvious that these things are just pattern-matching machines that do not actually understand much of anything that would require reasoning. Hype, hype, hype, none of which justifies legalizing plagiarism the way the tech bros want to.
 
Nobody who understands LLMs actually thinks that they're intelligent …
Not true. However, it's true that they're not as intelligent as us and they clearly lack some architectural things they'll need. But the simple statements from people in this thread that LLMs have no intelligence are wrong.

  1. Yann LeCun, Chief AI Scientist at Meta (Facebook): "There's no doubt in my mind that [large language models] are intelligent in some ways." (From a 2023 interview with The Verge)
  2. Geoffrey Hinton, often called the "Godfather of AI": "Maybe what we're seeing in these large language models is actually a lot closer to real intelligence than we thought." (From a 2023 interview with MIT Technology Review)
  3. Ilya Sutskever, Chief Scientist at OpenAI: "It may be that today's large neural networks are slightly conscious." (From a 2022 tweet)
  4. Demis Hassabis, CEO and co-founder of DeepMind: "I think it's quite plausible that [language models] have some form of generalised intelligence." (From a 2023 interview with The Economist)
  5. Jürgen Schmidhuber, known for his work on artificial neural networks: "GPT-3 and its ilk exhibit emergent abilities that were not explicitly programmed. In this sense, they are intelligent." (From a 2021 blog post)
 
Apple's efforts are still vaporware.

Lots of verbiage, but nothing to actually see in operation, like the AI on my iPhone Pro Max that was supposed to operate it.

This year's iPhone 17 series top models supposedly will have 12GB of memory, and the iPhone 18 series supposedly will have 16GB of memory. Amazing: the iPhone 18 series would then have the same amount of memory as the Macs, which Apple said needed 16GB of RAM to run AI.

Two years of "sucker" ads for a non-existent or non-operational product?
 
Of course. “AI” is just a marketing term at this point, and not any kind of actual intelligence. These AIs are really just glorified search engines that steal people’s hard work and regurgitate that work as if the data were their own. We’re just living in an “AI bubble” that will burst sooner rather than later.
When will it burst? All I see is an obsession with AI in every industry.
 
Nobody who understands LLMs actually thinks that they're intelligent … Apple is beginning to look like Microsoft and BlackBerry denying the potential of the iPhone.

When you add the general public into the mix, effectively nobody understands how LLMs work. Even researchers don’t have good traceability through complex reasoning models. It’s wildly different from the smartphone era, which expanded existing technology to a new form factor and made new interface paradigms ubiquitous. It did not change how information was gathered, or present often-false information as fact.

The genie isn’t going back into the bottle, but the fact that you personally understand what you’re doing, as a professional adult presumably both educated and working in your field well before the existence of this flawed (but useful, as you say) technology, is clouding your perception of the caveats.

It’s not a small problem, and there are a lot of incentives to keep things going as they are now, which is a risky path.

Apple absolutely is behind, their RTO mandates cost them good researchers, etc. But rolling out a generalized LLM, with all of those flaws you need to watch out for that are inherent in the technology, is bordering on irresponsible. Apple is smart to be conservative in this area, even if I think they are – or at least were – quite behind in starting the research to a significant degree.
 


A newly published Apple Machine Learning Research study has challenged the prevailing narrative around AI "reasoning" large language models like OpenAI's o1 and Claude's thinking variants, revealing fundamental limitations that suggest these systems aren't truly reasoning at all.


For the study, rather than using standard math benchmarks that are prone to data contamination, Apple researchers designed controllable puzzle environments including Tower of Hanoi and River Crossing. This allowed a precise analysis of both the final answers and the internal reasoning traces across varying complexity levels, according to the researchers.
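
To illustrate why such environments are attractive (a minimal sketch in Python, not the researchers' actual harness): with Tower of Hanoi, difficulty scales predictably with disk count, since the optimal solution takes 2^n - 1 moves, and any move sequence a model produces can be checked mechanically against the rules.

Code:
# Toy example of a controllable puzzle environment: generate a reference
# solution and verify an arbitrary move list. (Illustrative only, not taken
# from the study.)

def optimal_hanoi(num_disks, src=0, aux=1, dst=2):
    """Classic recursive solution: moves all disks from src to dst in 2**n - 1 steps."""
    if num_disks == 0:
        return []
    return (optimal_hanoi(num_disks - 1, src, dst, aux)
            + [(src, dst)]
            + optimal_hanoi(num_disks - 1, aux, src, dst))

def verify_hanoi(num_disks, moves):
    """True if `moves` (a list of (from_peg, to_peg) pairs, pegs 0-2) legally
    transfers every disk from peg 0 to peg 2."""
    pegs = [list(range(num_disks, 0, -1)), [], []]   # peg 0 holds disks n..1, largest at bottom
    for src, dst in moves:
        if not pegs[src]:
            return False                             # tried to move from an empty peg
        if pegs[dst] and pegs[dst][-1] < pegs[src][-1]:
            return False                             # placed a larger disk on a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs[2] == list(range(num_disks, 0, -1))

# Complexity grows exponentially: 7 disks already needs 127 moves, 10 disks 1023.
assert len(optimal_hanoi(7)) == 127 and verify_hanoi(7, optimal_hanoi(7))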

The results are striking, to say the least. All tested reasoning models – including o3-mini, DeepSeek-R1, and Claude 3.7 Sonnet – experienced complete accuracy collapse beyond certain complexity thresholds, and dropped to zero success rates despite having adequate computational resources. Counterintuitively, the models actually reduce their thinking effort as problems become more complex, suggesting fundamental scaling limitations rather than resource constraints.

Perhaps most damning, even when researchers provided complete solution algorithms, the models still failed at the same complexity points. Researchers say this indicates the limitation isn't in problem-solving strategy, but in basic logical step execution.

Models also showed puzzling inconsistencies – succeeding on problems requiring 100+ moves while failing on simpler puzzles needing only 11 moves.

The research highlights three distinct performance regimes: standard models surprisingly outperform reasoning models at low complexity, reasoning models show advantages at medium complexity, and both approaches fail completely at high complexity. The researchers' analysis of reasoning traces showed inefficient "overthinking" patterns, where models found correct solutions early but wasted computational budget exploring incorrect alternatives.

The take-home of Apple's findings is that current "reasoning" models rely on sophisticated pattern matching rather than genuine reasoning capabilities. It suggests that LLMs don't scale reasoning like humans do, overthinking easy problems and thinking less for harder ones.

The timing of the publication is notable, having emerged just days before WWDC 2025, where Apple is expected to limit its focus on AI in favor of new software designs and features, according to Bloomberg.

Article Link: Apple Research Questions AI Reasoning Models Just Days Before WWDC
Is anyone surprised Apple is downplaying AI, when their own Apple Intelligence is in the doldrums?
 
It’s pretty clear at this point that Apple execs bought into the AI hype until realizing that it was more hype than helpful. They just got ahead of themselves at last year’s WWDC, before testing whether the technology they were trying to incorporate into devices actually worked as they expected it to.
Which is strictly down to Cook not having any clue about tech. He is a glorified accountant and needs to go.
 
Not true. However, it's true that they're not as intelligent as us and they clearly lack some architectural things they'll need. But the simple statements from people in this thread that LLMs have no intelligence are wrong.

  1. Yann LeCun, Chief AI Scientist at Meta (Facebook): "There's no doubt in my mind that [large language models] are intelligent in some ways." (From a 2023 interview with The Verge)
I strongly recommend that you, and anyone reading this who cares, watch Yann’s recent presentations from 2024 and 2025. All of them are worthwhile, and he’s one of the very few people not hyping things beyond reality. With that in mind, I’d go as far as saying that quote is invalid and wildly out of context; “some ways” is doing a lot of work there, and if you follow him, that will become immediately evident.

I also personally believe Yann is responsible for Meta open-sourcing Llama because he knows there are fundamental limits, and it helps outsource improvements while they spend billions on their true world-model moonshot project that is 5-10 years away (Google is working on a similar one, fwiw).

Knowing how Apple operates, Craig probably has a grand total of 3 engineers in a small room someplace that are thinking about how they might start down a similar path so who knows when or if they’ll get there.
 
Not true. However, it's true that they're not as intelligent as us and they clearly lack some architectural things they'll need. But the simple statements from people in this thread that LLMs have no intelligence are wrong.

  1. Yann LeCun, Chief AI Scientist at Meta (Facebook): "There's no doubt in my mind that [large language models] are intelligent in some ways." (From a 2023 interview with The Verge)
  2. Geoffrey Hinton, often called the "Godfather of AI": "Maybe what we're seeing in these large language models is actually a lot closer to real intelligence than we thought." (From a 2023 interview with MIT Technology Review)
  3. Ilya Sutskever, Chief Scientist at OpenAI: "It may be that today's large neural networks are slightly conscious." (From a 2022 tweet)
  4. Demis Hassabis, CEO and co-founder of DeepMind: "I think it's quite plausible that [language models] have some form of generalised intelligence." (From a 2023 interview with The Economist)
  5. Jürgen Schmidhuber, known for his work on artificial neural networks: "GPT-3 and its ilk exhibit emergent abilities that were not explicitly programmed. In this sense, they are intelligent." (From a 2021 blog post)
This only proves that they like money. Nothing else. Intelligence implies a thought process. Contemplating one's existence. The formation of thoughts. Understanding the formation of thoughts. There is nothing like that here. We don't understand how consciousness comes into being in humans or other animals, nor do we understand how thoughts pop into our heads. It's an illusion at this stage, and not a good one in many cases. I asked ChatGPT yesterday about good places for hiding a body. Of course it said that I should seek help. Cool. Then I told it it was wrong to assume it's a human body I want to hide, and it said sorry, my bad, and gave me a very extensive list of places to hide a body and ways of destroying one. It is nothing more than glorified ML.
 
I tried their reasoning skills a few months ago by asking variations of the "get things across the river in a boat" logic puzzles. They didn't do so well, except on the variations that are well known and published; presented with a fresh puzzle, they couldn't do it. I even tested whether they could recognize when there was no possible solution, but then they would suggest a solution that clearly violated the rules.
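For reference, a toy brute-force search like the sketch below (my own illustration, not anything from Apple's study) is enough to know up front whether a wolf/goat/cabbage-style variant is solvable at all, and in how many crossings, so you can grade the model's answer objectively.

Code:
# Hypothetical helper: exhaustively search a river-crossing variant where the
# boat holds the farmer plus at most one item, and some pairs of items cannot
# be left together unattended.
from collections import deque
from itertools import combinations

ITEMS = ("wolf", "goat", "cabbage")

def solve(forbidden_pairs):
    """Return the shortest crossing sequence, or None if the variant is unsolvable.
    A state is (farmer's side, frozenset of items on the far bank)."""
    def safe(unattended):
        return not any({a, b} <= unattended for a, b in forbidden_pairs)

    start, goal = (0, frozenset()), (1, frozenset(ITEMS))
    queue, seen = deque([(start, [])]), {start}
    while queue:
        (farmer, far), path = queue.popleft()
        if (farmer, far) == goal:
            return path
        side = (set(ITEMS) - far) if farmer == 0 else set(far)
        for cargo in [None, *side]:                  # cross empty-handed or with one item
            new_far = set(far)
            if cargo is not None:
                if farmer == 0:
                    new_far.add(cargo)
                else:
                    new_far.discard(cargo)
            left_behind = (set(ITEMS) - new_far) if farmer == 0 else new_far
            if not safe(left_behind):                # forbidden pair left unattended
                continue
            state = (1 - farmer, frozenset(new_far))
            if state not in seen:
                seen.add(state)
                queue.append((state, path + [f"take {cargo or 'nothing'}"]))
    return None

# Classic rules (wolf eats goat, goat eats cabbage): solvable in 7 crossings.
print(len(solve([("wolf", "goat"), ("goat", "cabbage")])))   # 7
# Forbid every pair and there is no legal first move: the solver returns None.
print(solve(list(combinations(ITEMS, 2))))                   # None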
People function similarly when they “have” to give an answer. The chat side of LLMs has been constrained to be 'helpful' to the user and provide answers. They are, as far as I know, not able to say, "I don't know" or "No". If they could do that when programmed not to, then we'd have more conversations about their potential consciousness.

There are certain topics that models have been prompted to not provide answers about, but the LLMs will always give an answer. That's one of the major reasons why confabulation (more popularly but incorrectly called "hallucination") occurs.

While people are able to say "I don't know" or "I don't think it can be done" and can stop responding, if you constrain people the way the LLM chat interface is constrained, so that they must provide an answer or solution, then people will answer, even if what they provide clearly violates the rules.

For example, look at the old Milgram shock experiments. They were for many years used as a way to explain obedience to authority and even terrible events like the Holocaust. Whether or not we accept Milgram's results as valid, they show that, given specific constraints, people will do things they would not otherwise do. Some people refused to shock others, but think of the chat interface as being required to 'shock' and not allowed to withdraw from the study. What's it going to do? Shock, because it has no choice not to. The chat UIs could be programmed to not answer, or to refuse to answer, if some certainty threshold in the LLM is not met, but I'm not aware of any that are programmed that way.
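
To sketch what I mean by a certainty threshold (hypothetical code, not any vendor's actual behavior; `generate_with_logprobs` is a stand-in for whatever client call returns text plus per-token log probabilities, which many LLM APIs expose):

Code:
import math

CONFIDENCE_FLOOR = 0.6   # arbitrary threshold; a real system would tune this per model and task

def answer_or_abstain(prompt, generate_with_logprobs):
    """Return the model's answer, or "I don't know." when its own token
    probabilities are too low. Sequence likelihood is only a crude proxy for
    correctness, which is part of why nobody ships exactly this."""
    text, token_logprobs = generate_with_logprobs(prompt)
    # Geometric mean of token probabilities as a rough "certainty" score.
    avg_prob = math.exp(sum(token_logprobs) / max(len(token_logprobs), 1))
    return text if avg_prob >= CONFIDENCE_FLOOR else "I don't know."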

Your trials are thus not directly comparable to what people would do because the context is different for a person versus the LLM chat interface. People can refuse. The chat interface for LLMs cannot (again, except in situations deemed inappropriate with various public-facing front end interfaces for models).

Also, people regularly offer solutions to puzzles that violate rules. That's a form of defiance and/or creativity. People also regularly make things up, even if not sure.

While the LLMs might not be reasoning (at least in a generally agreed upon human way), the fact that they can violate rules makes them more human-like. It might mean they are less useful or accurate as a tool, but you also didn't demonstrate the models are not reasoning (I know there are a lot of negatives in that sentence). Also, not all reasoning has to be valid to exist. People do all sorts of invalid reasoning but we still reason.
 
As I posted in another thread earlier today about this paper and John Gruber's reaction to it:

In a post about Large Reasoning Models (LRMs):

John Gruber said:
My basic understanding after a skim is that the paper shows, or at least strongly suggests, that LRMs don’t “reason” at all. They just use vastly more complex pattern-matching than LLMs. The result is that LRMs effectively overthink on simple problems, outperform LLMs on mid-complexity puzzles, and fail in the same exact way LLMs do on high-complexity tasks and puzzles.

This aligns with my experience with the latest reasoning 4o and o3 models on ChatGPT.

 
… and therefore no AI in the next iPhone and Siri is going to suck for the foreseeable future 👀. Welcome to WWDC everyone.
 
To be clear, there are no universally accepted definitions of “consciousness” or “intelligence” for humans. So this is somewhat of a disingenuous conversation. Is ChatGPT “intelligent?”

How would you define intelligence relative to my 6 year old daughter? If you define her as intelligent (I do), then are you applying that same standard to ChatGPT? Math? Language? Novelty? ChatGPT can do all of these much better than my daughter.
 
People function similarly when they “have” to give an answer. The chat side of LLMs has been constrained to be 'helpful' to the user and provide answers. They are, as far as I know, not able to say, "I don't know." There are certain topics that models have been prompted to not provide answers about, but the LLMs will always give an answer. That's one of the major reasons why confabulation (more popularly called "hallucination") occurs.

So while people are able to say "I don't know" or "I don't think it can be done" and can stop responding, if you constrain people the way the LLM chat interface is constrained, so that they must provide an answer or solution, then people will answer, even if what they provide clearly violates the rules.

For example, look at the old Milgram shock experiments. They were for many years used as a way to explain obedience to authority and even terrible events like the Holocaust. Whether or not we accept Milgram's results as valid, they show that, given specific constraints, people will do things they would not otherwise do. Some people refused to shock others, but think of the chat interface as being required to 'shock' and not withdraw from the study. What's it going to do? Shock.

Your trials are thus not directly comparable to what people would do because the context is different for a person versus the LLM chat interface. People can refuse. The chat interface for LLMs cannot (again, except in situations deemed inappropriate with various public-facing front end interfaces for models).

Also, people regularly offer solutions to puzzles that violate rules. That's a form of defiance and/or creativity. People also regularly make things up, even if not sure.

While the LLMs might not be reasoning (at least in a generally agreed upon human way), the fact that they can violate rules makes them more human-like. It might mean they are less useful or accurate as a tool, but you also didn't demonstrate the models are not reasoning (I know there are a lot of negatives in that sentence). Also, not all reasoning has to be valid to exist. People do all sorts of invalid reasoning but we still reason.
Hey, thank you for the well-thought-out reply. I didn't mean that I had disproved anything, nor was I running a scientific experiment. I was just a dude asking it to solve some logic puzzles and laughing at the results.

Your defense of it doing the stuff a human would do isn't very reassuring.
 
To be clear, there are no universally accepted definitions of “consciousness” or “intelligence” for humans. So this is somewhat of a disingenuous conversation. Is ChatGPT “intelligent?”

How would you define intelligence relative to my 6 year old daughter? If you define her as intelligent (I do), then are you applying that same standard to ChatGPT? Math? Language? Novelty? ChatGPT can do all of these much better than my daughter.

There’s something fundamentally human and intelligent your daughter can absolutely do that no computer can: use her imagination.
 
Every time I see em dashes now I automatically assume it was AI-generated. Nice synopsis, Tim-bot 😜
 