A newly published Apple Machine Learning Research study challenges the prevailing narrative around "reasoning" large language models like OpenAI's o1 and the thinking variants of Anthropic's Claude, revealing fundamental limitations that suggest these systems aren't truly reasoning at all.

For the study, rather than using standard math benchmarks that are prone to data contamination, Apple researchers designed controllable puzzle environments including Tower of Hanoi and River Crossing. This allowed a precise analysis of both the final answers and the internal reasoning traces across varying complexity levels, according to the researchers.
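To make that concrete, here is a minimal sketch of what such a controllable puzzle environment might look like – illustrative only, as the class and function names are assumptions rather than the paper's actual code. The key property is that complexity is a single dial (the number of disks) and every intermediate move a model proposes can be checked mechanically, not just the final answer:

```python
# Minimal sketch of a controllable puzzle environment in the spirit of
# the study. Illustrative only: names and structure are assumptions,
# not taken from the paper.

class TowerOfHanoi:
    def __init__(self, n_disks: int):
        self.n = n_disks
        # Peg 0 starts with all disks, largest (n) at the bottom,
        # smallest (1) on top.
        self.pegs = [list(range(n_disks, 0, -1)), [], []]

    def is_legal(self, src: int, dst: int) -> bool:
        # Legal if the source peg has a disk and that disk is smaller
        # than the top of the destination peg (or the peg is empty).
        if not self.pegs[src]:
            return False
        return not self.pegs[dst] or self.pegs[src][-1] < self.pegs[dst][-1]

    def move(self, src: int, dst: int) -> None:
        assert self.is_legal(src, dst), f"illegal move {src}->{dst}"
        self.pegs[dst].append(self.pegs[src].pop())

    def solved(self) -> bool:
        return len(self.pegs[2]) == self.n


def grade_trace(n_disks: int, moves: list[tuple[int, int]]) -> bool:
    """Replay a model's proposed move sequence, verifying every step."""
    env = TowerOfHanoi(n_disks)
    for src, dst in moves:
        if not env.is_legal(src, dst):
            return False  # the first illegal move fails the whole trace
        env.move(src, dst)
    return env.solved()
```

Because the environment validates each move, a grader can pinpoint exactly where in a sequence a model goes wrong – which is what makes it possible to analyze reasoning traces and not just final answers.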

The results are striking, to say the least. All tested reasoning models – including o3-mini, DeepSeek-R1, and Claude 3.7 Sonnet – experienced complete accuracy collapse beyond certain complexity thresholds, with success rates dropping to zero despite adequate computational resources. Counterintuitively, the models actually reduced their thinking effort as problems became more complex, suggesting a fundamental scaling limitation rather than a resource constraint.

Perhaps most damning, even when researchers provided complete solution algorithms, the models still failed at the same complexity points. Researchers say this indicates the limitation isn't in problem-solving strategy, but in basic logical step execution.
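For a sense of how little strategy is left once the algorithm is handed over: the optimal Tower of Hanoi procedure is a three-line recursion, sketched below (the textbook version, not necessarily the exact prompt the researchers used). Executing it takes no search or insight, just carrying out the recursion step by step – and that mechanical execution is where the models broke down:

```python
def hanoi_moves(n: int, src: int = 0, aux: int = 1, dst: int = 2) -> list[tuple[int, int]]:
    """Textbook recursive solution: move n disks from peg src to peg dst."""
    if n == 0:
        return []
    # Park the n-1 smaller disks on the spare peg, move the largest disk,
    # then stack the n-1 disks back on top of it.
    return (hanoi_moves(n - 1, src, dst, aux)
            + [(src, dst)]
            + hanoi_moves(n - 1, aux, src, dst))
```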

Models also showed puzzling inconsistencies – succeeding on Tower of Hanoi instances requiring 100+ moves while failing on River Crossing puzzles whose solutions need only 11 moves.
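The gap is easy to quantify: a minimal Tower of Hanoi solution grows exponentially with the number of disks (2^n − 1 moves), while the classic three-pair River Crossing has a fixed 11-move optimum – so the puzzle the models failed on is the far shorter one. Using the hanoi_moves sketch above:

```python
# Minimum Tower of Hanoi move counts: 2^n - 1 for n disks.
for n in (3, 7, 10):
    assert len(hanoi_moves(n)) == 2 ** n - 1
    print(f"{n} disks -> {2 ** n - 1} moves")
# 3 disks -> 7 moves
# 7 disks -> 127 moves
# 10 disks -> 1023 moves
```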

The research highlights three distinct performance regimes: standard models surprisingly outperform reasoning models at low complexity, reasoning models show advantages at medium complexity, and both approaches fail completely at high complexity. The researchers' analysis of reasoning traces showed inefficient "overthinking" patterns, where models found correct solutions early but wasted computational budget exploring incorrect alternatives.
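That "overthinking" pattern can be made measurable. One simple proxy – my own illustration, not the metric the paper defines – is the fraction of a trace's solution attempts that come after the first correct one, reusing grade_trace from the environment sketch above:

```python
def overthinking_ratio(attempts: list[list[tuple[int, int]]],
                       n_disks: int) -> float | None:
    """Fraction of attempts in a trace wasted *after* the first correct
    solution; returns None if no attempt in the trace is correct."""
    for i, moves in enumerate(attempts):
        if grade_trace(n_disks, moves):
            return 1.0 - (i + 1) / len(attempts)
    return None
```

On easy problems, a high ratio means the model found the answer early and kept exploring wrong alternatives anyway – matching the pattern the researchers describe.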

The takeaway from Apple's findings is that current "reasoning" models rely on sophisticated pattern matching rather than genuine reasoning. The study suggests that LLMs don't scale reasoning the way humans do, overthinking easy problems and thinking less as problems get harder.

The timing of the publication is notable: it comes just days before WWDC 2025, where Apple is expected to limit its focus on AI in favor of new software designs and features, according to Bloomberg.

Article Link: Apple Research Questions AI Reasoning Models Just Days Before WWDC
 
Of course. “AI” is just a marketing term at this point, and not any kind of actual intelligence. These AIs are really just glorified search engines that steal people’s hard work and regurgitate that work as if the data is its own. We’re just living in an “AI bubble” that will burst sooner rather than later.
 
This isn’t a surprise to users. However, it doesn’t mean the reasoning models aren’t very helpful. They still generate huge productivity gains. There’s just a learning curve to framing and feeding the problem in a digestible way – people advanced in their fields can do this, eliminating the need for junior analysts.
 
Sour grapes? "Everybody's product actually sucks, but at least ours is honest about sucking."

I'm reminded of the story of Bill Atkinson at the Xerox PARC demo: he was mistaken about what he thought he'd seen the Xerox system doing, and because he assumed it was possible, he went on to design a Lisa windowing system that outperformed what the Xerox machine could actually do. The old Apple would try to achieve the impossible; the new Apple deflects blame for not achieving the possible.

The fact remains that AI is indeed largely a parlor trick; it's nothing at all like "intelligence." But that parlor trick can be leveraged to perform many very useful functions if you can afford the energy for the brute-force processing it takes to operate at large scale. And Apple to date has performed that parlor trick far less effectively than most of its competitors. If Apple wants to save face regarding its poor showing in AI, it could reasonably point out that while Siri does indeed suck, it doesn't suck nearly so much when viewed from a per-watt perspective rather than simply by the results it can achieve.
 
Apple is on point once again, and will eventually deliver the better experience, in my opinion.
It’s pretty clear at this point that Apple execs bought into the AI hype until realizing that it was more hype than helpful. They just got ahead of themselves at last year’s WWDC before testing that the technology they were trying to incorporate into devices actually worked as they expected it to.
 
The reality is that what we are seeing now is the worst it's gonna get; it's an uphill trajectory with notable improvements.

And of course it has flaws, but choosing to discredit it and being overly sceptical in this context is somewhat strange, especially if your company is the one being criticized for missing the AI train.
 
LLM GenAI is pretty garbage technology. The sooner people realize this, the better.

Yes, it does have some niche uses. But people are trying to push it as a solution to everything and even as far as replacing human beings, and it's just not capable of that. Not only that, but why do we want to replace human beings? Especially in the arts? I'd rather look at things made by people. It doesn't matter how visually stunning something is; art has no soul if there is no artist.
 
> Yup... Apple taking its time will result in a superior AI user experience.
>
> Sadly... most people here won't care because they said they'll be turning AI off as soon as it's on their devices.

I don't.

> It’s pretty clear at this point that Apple execs bought into the AI hype until realizing that it was more hype than helpful. They just got ahead of themselves at last year’s WWDC before testing that the technology they were trying to incorporate into devices actually worked as they expected it to.

I believe you're right, it's possible.
 
> LLM GenAI is pretty garbage technology. The sooner people realize this, the better.
>
> Yes, it does have some niche uses. But people are trying to push it as a solution to everything and even as far as replacing human beings, and it's just not capable of that. Not only that, but why do we want to replace human beings? Especially in the arts? I'd rather look at things made by people. It doesn't matter how visually stunning something is; art has no soul if there is no artist.

That’s a bingo. LLMs are incredibly unreliable for anything that requires accuracy and consistency.

Your second point is a philosophical one that I think is more important than people realize. AI is incapable of any true creativity. Its responses are just a pastiche of information originally created by human beings, effectively plagiarizing human work. Who wants to live in a world where a person does all the legwork and creativity, only for a computer to steal from that human?
 
THIS is the biggest news article of the week. Perhaps the decade. Why? Because things with AI are happening at lightning speed, and Apple has been caught with its pants down. This research shows they aren't completely asleep at the wheel. I read the summary, skimmed the full publication, and I am so glad to see the work they've done. I predict their screwup with Apple Intelligence will end up being either a) the starting point in a downturn at Apple (like so many pundits seem to think) or b) another example of Apple getting ahead of itself and then course-correcting because of said failure. Let's hope they will ultimately end up producing an AI that--like many things they've done in the past, even after screwups--surpasses everything else. "We don't have to be first. We want to be the best." We shall see.
 
This may be so, but they’re still very useful.

Someone way smarter than me said to think of LLMs as very clever interns who are fast & knowledgeable but who also sometimes make stunningly stupid mistakes.

And they’re too eager to please, when a wiser person would say that there isn’t enough data, etc.
 
Apple needs to spend less time complaining that AI isn't perfect and more time making AI do something at least vaguely useful. You don't need AGI to make Siri not completely useless. Just get on with using AI properly already.
 