A newly published Apple Machine Learning Research study challenges the prevailing narrative around "reasoning" large language models like OpenAI's o1 and the thinking variants of Anthropic's Claude, revealing fundamental limitations that suggest these systems aren't truly reasoning at all.

For the study, rather than using standard math benchmarks that are prone to data contamination, Apple researchers designed controllable puzzle environments including Tower of Hanoi and River Crossing. This allowed a precise analysis of both the final answers and the internal reasoning traces across varying complexity levels, according to the researchers.
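To make that concrete, here is a minimal sketch of what such a controllable puzzle environment might look like – illustrative only, as the class and function names are assumptions rather than the paper's actual code. The key property is that complexity is a single dial (the number of disks) and every intermediate move a model proposes can be checked mechanically, not just the final answer:

```python
# Minimal sketch of a controllable puzzle environment in the spirit of
# the study. Illustrative only: names and structure are assumptions,
# not taken from the paper.

class TowerOfHanoi:
    def __init__(self, n_disks: int):
        self.n = n_disks
        # Peg 0 starts with all disks, largest (n) at the bottom,
        # smallest (1) on top.
        self.pegs = [list(range(n_disks, 0, -1)), [], []]

    def is_legal(self, src: int, dst: int) -> bool:
        # Legal if the source peg has a disk and that disk is smaller
        # than the top of the destination peg (or the peg is empty).
        if not self.pegs[src]:
            return False
        return not self.pegs[dst] or self.pegs[src][-1] < self.pegs[dst][-1]

    def move(self, src: int, dst: int) -> None:
        assert self.is_legal(src, dst), f"illegal move {src}->{dst}"
        self.pegs[dst].append(self.pegs[src].pop())

    def solved(self) -> bool:
        return len(self.pegs[2]) == self.n


def grade_trace(n_disks: int, moves: list[tuple[int, int]]) -> bool:
    """Replay a model's proposed move sequence, verifying every step."""
    env = TowerOfHanoi(n_disks)
    for src, dst in moves:
        if not env.is_legal(src, dst):
            return False  # the first illegal move fails the whole trace
        env.move(src, dst)
    return env.solved()
```

Because the environment validates each move, a grader can pinpoint exactly where in a sequence a model goes wrong – which is what makes it possible to analyze reasoning traces and not just final answers.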

The results are striking, to say the least. All tested reasoning models – including o3-mini, DeepSeek-R1, and Claude 3.7 Sonnet – experienced complete accuracy collapse beyond certain complexity thresholds, with success rates dropping to zero despite adequate computational resources. Counterintuitively, the models actually reduced their thinking effort as problems became more complex, suggesting a fundamental scaling limitation rather than a resource constraint.

Perhaps most damning, even when researchers provided complete solution algorithms, the models still failed at the same complexity points. Researchers say this indicates the limitation isn't in problem-solving strategy, but in basic logical step execution.
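For a sense of how little strategy is left once the algorithm is handed over: the optimal Tower of Hanoi procedure is a three-line recursion, sketched below (the textbook version, not necessarily the exact prompt the researchers used). Executing it takes no search or insight, just carrying out the recursion step by step – and that mechanical execution is where the models broke down:

```python
def hanoi_moves(n: int, src: int = 0, aux: int = 1, dst: int = 2) -> list[tuple[int, int]]:
    """Textbook recursive solution: move n disks from peg src to peg dst."""
    if n == 0:
        return []
    # Park the n-1 smaller disks on the spare peg, move the largest disk,
    # then stack the n-1 disks back on top of it.
    return (hanoi_moves(n - 1, src, dst, aux)
            + [(src, dst)]
            + hanoi_moves(n - 1, aux, src, dst))
```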

Models also showed puzzling inconsistencies – succeeding on Tower of Hanoi instances requiring 100+ moves while failing on River Crossing puzzles whose solutions need only 11 moves.
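The gap is easy to quantify: a minimal Tower of Hanoi solution grows exponentially with the number of disks (2^n − 1 moves), while the classic three-pair River Crossing has a fixed 11-move optimum – so the puzzle the models failed on is the far shorter one. Using the hanoi_moves sketch above:

```python
# Minimum Tower of Hanoi move counts: 2^n - 1 for n disks.
for n in (3, 7, 10):
    assert len(hanoi_moves(n)) == 2 ** n - 1
    print(f"{n} disks -> {2 ** n - 1} moves")
# 3 disks -> 7 moves
# 7 disks -> 127 moves
# 10 disks -> 1023 moves
```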

The research highlights three distinct performance regimes: standard models surprisingly outperform reasoning models at low complexity, reasoning models show advantages at medium complexity, and both approaches fail completely at high complexity. The researchers' analysis of reasoning traces showed inefficient "overthinking" patterns, where models found correct solutions early but wasted computational budget exploring incorrect alternatives.
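That "overthinking" pattern can be made measurable. One simple proxy – my own illustration, not the metric the paper defines – is the fraction of a trace's solution attempts that come after the first correct one, reusing grade_trace from the environment sketch above:

```python
def overthinking_ratio(attempts: list[list[tuple[int, int]]],
                       n_disks: int) -> float | None:
    """Fraction of attempts in a trace wasted *after* the first correct
    solution; returns None if no attempt in the trace is correct."""
    for i, moves in enumerate(attempts):
        if grade_trace(n_disks, moves):
            return 1.0 - (i + 1) / len(attempts)
    return None
```

On easy problems, a high ratio means the model found the answer early and kept exploring wrong alternatives anyway – matching the pattern the researchers describe.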

The takeaway from Apple's findings is that current "reasoning" models rely on sophisticated pattern matching rather than genuine reasoning. The study suggests that LLMs don't scale reasoning the way humans do, overthinking easy problems and thinking less as problems get harder.

The timing of the publication is notable: it comes just days before WWDC 2025, where Apple is expected to limit its focus on AI in favor of new software designs and features, according to Bloomberg.

Article Link: Apple Research Questions AI Reasoning Models Just Days Before WWDC
 
Of course. “AI” is just a marketing term at this point, and not any kind of actual intelligence. These AIs are really just glorified search engines that steal people’s hard work and regurgitate that work as if the data is its own. We’re just living in an “AI bubble” that will burst sooner rather than later.
 
This isn’t a surprise to users. However, it doesn’t mean the reasoning models aren’t very helpful. They still generate huge productivity gains. There’s just a learning curve to framing and feeding the problem in a digestible way – people advanced in their fields can do this, eliminating the need for junior analysts.
 
Sour grapes? "Everybody's product actually sucks, but at least ours is honest about sucking."

I'm reminded of the story of Bill Atkinson at the Xerox PARC demo: he was mistaken about what he thought he'd seen the Xerox system doing, and because he assumed it was possible, he went on to design a Lisa windowing system that outperformed what the Xerox machine could actually do. The old Apple would try to achieve the impossible; the new Apple deflects blame for not achieving the possible.

The fact remains that AI is indeed largely a parlor trick; it's nothing at all like "intelligence." But that parlor trick can be leveraged to perform many very useful functions if you can afford the energy for the brute-force processing it takes to operate at large scale. And Apple to date has performed that parlor trick far less effectively than most of its competitors. If Apple wants to save face regarding its poor showing in AI, it could reasonably point out that while Siri does indeed suck, it doesn't suck nearly so much when viewed from a per-watt perspective rather than simply by the results it can achieve.
 
Apple is on point once again, and will eventually deliver the better experience, in my opinion.
It’s pretty clear at this point that Apple execs bought into the AI hype until realizing that it was more hype than helpful. They just got ahead of themselves at last year’s WWDC before testing that the technology they were trying to incorporate into devices actually worked as they expected it to.
 
The reality is that what we are seeing now is the worst it's gonna get; it's an uphill trajectory with notable improvements.

And of course it has flaws, but choosing to discredit it and being overly sceptical in this context is somewhat strange, especially if your company is the one being criticized for missing the AI train.
 
LLM GenAI is pretty garbage technology. The sooner people realize this, the better.

Yes, it does have some niche uses. But people are trying to push it as a solution to everything and even as far as replacing human beings, and it's just not capable of that. Not only that, but why do we want to replace human beings? Especially in the arts? I'd rather look at things made by people. It doesn't matter how visually stunning something is; art has no soul if there is no artist.
 
> Yup... Apple taking its time will result in a superior AI user experience.
>
> Sadly... most people here won't care because they said they'll be turning AI off as soon as it's on their devices.

I don't.

> It’s pretty clear at this point that Apple execs bought into the AI hype until realizing that it was more hype than helpful. They just got ahead of themselves at last year’s WWDC before testing that the technology they were trying to incorporate into devices actually worked as they expected it to.

I believe you're right, it's possible.
 
> LLM GenAI is pretty garbage technology. The sooner people realize this, the better.
>
> Yes, it does have some niche uses. But people are trying to push it as a solution to everything and even as far as replacing human beings, and it's just not capable of that. Not only that, but why do we want to replace human beings? Especially in the arts? I'd rather look at things made by people. It doesn't matter how visually stunning something is; art has no soul if there is no artist.

That’s a bingo. LLMs are incredibly unreliable for anything that requires accuracy and consistency.

Your second point is a philosophical one that I think is more important than people realize. AI is incapable of any true creativity. Its responses are just a pastiche of information originally created by human beings, effectively plagiarizing human work. Who wants to live in a world where a person does all the legwork and creativity, only for a computer to steal from that human?
 
THIS is the biggest news article of the week. Perhaps the decade. Why? Because things with AI are happening at lightning speed, and Apple has been caught with its pants down. This research shows they aren't completely asleep at the wheel. I read the summary, skimmed the full publication, and I am so glad to see the work they've done. I predict their screwup with Apple Intelligence will end up being either a) the starting point in a downturn at Apple (like so many pundits seem to think) or b) another example of Apple getting ahead of itself and then course-correcting because of said failure. Let's hope they will ultimately end up producing an AI that--like many things they've done in the past, even after screwups--surpasses everything else. "We don't have to be first. We want to be the best." We shall see.
 
This may be so, but they’re still very useful.

Someone way smarter than me said to think of LLMs as very clever interns who are fast & knowledgeable but who also sometimes make stunningly stupid mistakes.

And they’re too eager to please, when a wiser person would say that there isn’t enough data, etc.
 
Apple needs to spend less time complaining that AI isn't perfect and more time making AI do something at least vaguely useful. You don't need AGI to make Siri not completely useless. Just get on with using AI properly already.
 