LLMs like ChatGPT are great at certain tasks, and I now use them exclusively instead of the old method of "google searching" for answers on most things. They can summarize in seconds vast amounts of information that might have taken me hours to put together before. Examples like "what are the different kinds of magnesium supplements and their uses?" or "what does this Windows/Mac error mean and how do I fix it?", or having them write code for certain tasks - they're very good at these kinds of things.

What they can't do (because they aren't true AI), and what we really want them to be able to do, is take all of human knowledge and solve problems that are too complex for a human due to the sheer amount of data and math involved - giving us vast leaps in technology in every area by bringing the theoretical into the real world. Solving problems like curing cancer, developing new kinds of propulsion, creating free energy generation or designing a teleportation device for objects and humans. This type of true AI is still decades off, if it is possible at all.
 
There is undeniably some very clever tech out there - some in its infancy, some pure gimmicks (Apple, I'm looking at you).

'AI' is a marketing term. Nothing more, nothing less.

A lot of stuff that is branded AI is just clever coding with a huge information store to call on (arguably built largely on plagiarism).

'AI' has become a thing largely because it's been heavily pushed, and because of the lack of actual intelligence knocking about these days and the lazy mentality in people today.
 
This is important research. It will help identify areas to improve the functioning of the models.

That said, the models also sound entirely human: “The researchers' analysis of reasoning traces showed inefficient "overthinking" patterns, where models found correct solutions early but wasted computational budget exploring incorrect alternatives.”

That’s true for people as well. Having considerable experience with some of the tests used in the research, I can say that people will also "waste" computational budget and effort, even after having a correct solution.

This isn't to say the LLMs are reasoning, but we also need to be cautious in saying they are not. They might not reason exactly the way most people do, but it's also possible that theirs is simply a different form of reasoning. A dog might reason differently than a person, but that doesn't mean the dog is not reasoning (or a fish or a snail, if you don't like the mammal analogy).
 
While they're questioning AI reasoning models, I've just built a fully functional web app w/ front-end + back-end (which I can also put under a subscription model), running in Docker on a VPS, via ChatGPT, in around 100 prompts.

I don't need AI to turn me into a "zebra" while it achieves singularity, not yet. I only need it to enhance my productivity, which it does 100x, since I don't need a developer to build apps or maintain them for me ($10k-$100k's), nor do I need to spend months learning to code an app in some novel language or with a new library, debugging it and losing too much sleep over it. And this is just one use case.

Apple is boring and they're lying a lot. Breakthroughs in tech, like those from OpenAI, Anthropic and others, are a great way to level the playing field and break up trillion-dollar companies that often turn into monopolies.

 
Wait until people are saying they don't need your app or subscription model, because... you know... AI.
 
Sour grapes. 'Apple's research team'? Shouldn't they be working to improve things instead of attacking successful companies?

And they spent resources to prove something obvious?

And the timing of this. My goodness.
 
While they're questioning AI reasoning models, I've just built a fully functional web app (which I can also put under a subscription model), running in Docker on a VPS, via ChatGPT, in around 100 prompts.

I don't need AI to turn me into a "zebra" while it achieves singularity, not yet. I only need it to enhance my productivity, which it does 100x, since I don't need a developer to build apps or maintain them for me ($10k-$100k's), nor do I need to spend months learning to code an app in some novel language or with a new library, debugging it and losing too much sleep over it. And this is just one use case.

Apple is boring

This guy gets it. The current state is still incredibly powerful even if it isn’t truly “reasoning”. AI is great at stuff like front end development (which was always mucking around with a bunch of crap languages anyway).
 
This is a very odd move by Apple - "we cannot do LLMs, so let's discard them." You can check reviews of this "Apple research" paper and see how deeply it is flawed. Basically this only proves that Apple is the worst among the others in the AI race and that they do not understand a thing...
 

Good post examining one of the Apple puzzles and DeepSeek's responses.
 


A newly published Apple Machine Learning Research study has challenged the prevailing narrative around AI "reasoning" large language models like OpenAI's o1 and Claude's thinking variants, revealing fundamental limitations that suggest these systems aren't truly reasoning at all.


For the study, rather than using standard math benchmarks that are prone to data contamination, Apple researchers designed controllable puzzle environments including Tower of Hanoi and River Crossing. This allowed a precise analysis of both the final answers and the internal reasoning traces across varying complexity levels, according to the researchers.
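
To make that setup concrete, here is a minimal Python sketch of what a controllable Tower of Hanoi environment can look like (an illustration only, not the researchers' actual evaluation harness): complexity is dialed up by adding disks, and a model's answer is checked as a list of moves.

def initial_state(n):
    """Three pegs; all n disks start on peg 0, largest disk (n) at the bottom."""
    return [list(range(n, 0, -1)), [], []]

def verify_moves(n, moves):
    """Return True if a list of (from_peg, to_peg) moves legally solves the n-disk puzzle."""
    state = initial_state(n)
    for src, dst in moves:
        if not state[src]:
            return False                      # illegal: source peg is empty
        disk = state[src][-1]
        if state[dst] and state[dst][-1] < disk:
            return False                      # illegal: larger disk placed on a smaller one
        state[dst].append(state[src].pop())
    return state[2] == list(range(n, 0, -1))  # solved: all disks stacked on peg 2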

The results are striking, to say the least. All tested reasoning models – including o3-mini, DeepSeek-R1, and Claude 3.7 Sonnet – experienced complete accuracy collapse beyond certain complexity thresholds, and dropped to zero success rates despite having adequate computational resources. Counterintuitively, the models actually reduce their thinking effort as problems become more complex, suggesting fundamental scaling limitations rather than resource constraints.

Perhaps most damning, even when researchers provided complete solution algorithms, the models still failed at the same complexity points. Researchers say this indicates the limitation isn't in problem-solving strategy, but in basic logical step execution.
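
For a sense of scale, the "complete solution algorithm" for Tower of Hanoi is the short textbook recursion below (shown here as an illustration; the exact algorithm prompt used in the study may differ). Executing it is pure bookkeeping, with no search involved.

def solve_hanoi(n, src=0, aux=1, dst=2):
    """Return the optimal move list (2**n - 1 moves) for n disks."""
    if n == 0:
        return []
    return (solve_hanoi(n - 1, src, dst, aux)     # clear the top n-1 disks onto the spare peg
            + [(src, dst)]                        # move the largest disk to the target peg
            + solve_hanoi(n - 1, aux, src, dst))  # restack the n-1 disks on top of it

print(len(solve_hanoi(7)))   # 127 moves; a 10-disk instance already needs 1023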

Models also showed puzzling inconsistencies – succeeding on problems requiring 100+ moves while failing on simpler puzzles needing only 11 moves.

The research highlights three distinct performance regimes: standard models surprisingly outperform reasoning models at low complexity, reasoning models show advantages at medium complexity, and both approaches fail completely at high complexity. The researchers' analysis of reasoning traces showed inefficient "overthinking" patterns, where models found correct solutions early but wasted computational budget exploring incorrect alternatives.

The take-home of Apple's findings is that current "reasoning" models rely on sophisticated pattern matching rather than genuine reasoning capabilities. It suggests that LLMs don't scale reasoning like humans do, overthinking easy problems and thinking less for harder ones.

The timing of the publication is notable, having emerged just days before WWDC 2025, where Apple is expected to limit its focus on AI in favor of new software designs and features, according to Bloomberg.

Article Link: Apple Research Questions AI Reasoning Models Just Days Before WWDC
Amazing that the bosses are convinced that they will replace us all with this.
 
So AI is nothing more than clever programming?

Yea. I remember taking a seminar in pattern recognition many years ago; AI mostly seems like a more powerful version of what we ran on big iron.
This isn’t a surprise to users. However, it doesn’t mean the reasoning models aren’t very helpful. They still generate huge productivity gains. There is just a learning curve to framing the problem and instructing the model in a digestible way - people advanced in their fields can do this while eliminating the need for junior analysts.

I’ve used it to help with several web tools I’m developing. It’s good at suggesting and explaining code. Although at times the code is right but the logic flow is faulty, and a prompt suggesting changes elicits a ‘You’re right …’ response I find humorous. Other times it suggests deprecated functions, probably because they show up more often in its training data and thus appear correct. Google quickly tells me the function is deprecated. AI is good as a support tool but has limitations.
 
Everyone complaining should just read the paper - it isn’t very long - but also read the sources.

I’m glad this is getting traction. LLMs and LRMs have some narrow uses, but they aren’t generalized and especially aren’t suited for large, complex tasks. There is a ton of financial incentive for big tech to make everyone think they are going to get there sooner or later, which is not at all a foregone conclusion.

Their entire point is that the path to a generalized tool is unclear, and good on them for saying so. It doesn’t mean that Apple won’t leverage LLM technology.

Model collapse is real and matters, and it’s not a simple problem of additional training or a larger context window.

Yann is right: we need world models, not more and more infrastructure bolted onto this technology.

Smart people and money are already aware of this, but it will be a long time before upper management and the public understand it, due to the extraordinary marketing success and the implied usefulness and anthropomorphic perception of this current, actually very limited technology.

“Hallucinations” being socialized as the term for confident output with zero ground truth is probably the most genius marketing move of the last 5 years.

We need things like this to help get the industry back to earth and judge the technology on objective merits. It should never have left research labs, but we are where we are and there’s no putting the cat back into the bag.

Both people who think current technology has zero use (which this paper actually explicitly refutes!) and people who think LLMs or LRMs will inevitably scale to truly high complexity are sadly ignorant.
 
It always amazes me to see there are still people claiming LLMs are just a gimmick. No, they are world-changingly efficient at many tasks. ChatGPT has completely changed my workflow and eliminated my need for junior employees and for outsourcing some key elements of my work.

The question is not whether LLMs are better than experts. It’s whether they are better than the average junior employee or middle manager.
 