I came to the “AI” game a little late but have been using it the past month in my software development. It can be absolutely fantastic for simple things, but it often just flat out fails on complex questions. And, yes, the AI can give dramatically different answers with a slight change in wording. As an “assistant”, it’s pretty great most of the time, but I wouldn’t rely on an AI for anything critical.
 
I agree its consciousness level seems low, but these are also the kinds of mistakes kids make. When my kid was 4, she was super into math problems, and she'd ask me to give her problems like this all the time for fun. She'd make these exact kinds of mistakes. I'd add irrelevant information to see if she'd inadvertently incorporate it - for example, I'd ask: I went to the store and bought three apples, two sausages, and two bananas; how many fruits do I have? She'd often get questions like this wrong. So I'm not sure this proves anything other than that its reasoning isn't incredibly advanced.
 
Guess that kind of answers why AI sometimes thinks bananas has like 4 n’s
 
I wonder how many here have actually taken the time to read, absorb, grasp, and understand the study?

From some of the responses, I'm guessing one or two. Maybe.
 
I suspect these problems will become less and less relevant as the models improve. Also, nobody really understands what goes on in our brains, so it could well be that we are also just very good pattern finders and that's all our 'reasoning' is.
 
Claude solved this on first try so this news is a big nothing burger. Funny that the study says "the majority of models fail to ignore these statements". So there were models that worked fine but they only cherry-picked the worst ones? Smells of bias.

Actually ChatGPT 4o also solved it on first go, so what the hell? I actually ran this multiple times with both models and it never came out wrong. Did they run the test 10000 times until the AI tripped?
Yeah, this is exactly how significance testing works. I haven't read the paper, but presumably they tested a large number of queries, a large number of times, since that's how you approximate the true result (law of large numbers). A published, peer-reviewed study should also report statistical significance, meaning it is extremely unlikely that chance alone could explain the result. I'd imagine they also tested different contexts surrounding the question itself (though maybe not), since a model that gets it right in isolation but fails when there is additional (tangentially related, or even unrelated) content in the context window is clearly not using reason, which would remain consistent regardless of the surrounding context.
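To make that concrete, here is a minimal sketch of the kind of check involved, with entirely hypothetical numbers (not taken from the paper): if a model is claimed to fail only rarely, the binomial tail probability tells you how unlikely the observed failure count would be if that claim were true.

    import math

    def binom_pvalue(k, n, p):
        # One-sided tail: P(X >= k) for X ~ Binomial(n, p).
        return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

    # Hypothetical numbers: a model claimed to fail 5% of the time
    # fails 30 out of 200 paraphrased versions of a question.
    pvalue = binom_pvalue(30, 200, 0.05)
    print(f"P(>= 30 failures by chance alone) = {pvalue:.2e}")
    # A vanishingly small p-value means the claimed 5% rate can't explain the data.

The point being: one successful run on your own machine tells you almost nothing; the distribution over many runs and many phrasings is what the study measures.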
 
Claude solved this on first try so this news is a big nothing burger. Funny that the study says "the majority of models fail to ignore these statements". So there were models that worked fine but they only cherry-picked the worst ones? Smells of bias.

Actually ChatGPT 4o also solved it on first go, so what the hell? I actually ran this multiple times with both models and it never came out wrong. Did they run the test 10000 times until the AI tripped?

Does this question exist elsewhere on the Internet? Yes it does. The conclusion is indexed.

LLMs absolutely cannot count. They can't do basic math because that's not what the algorithm is built for. They can do excellent pattern matching.
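For contrast, here is what exact counting looks like when you compute it rather than predict it; a trivial sketch, but it illustrates the difference:

    from collections import Counter

    word = "bananas"
    print(word.count("n"))  # 2 -- deterministic counting over characters
    print(Counter(word))    # Counter({'a': 3, 'n': 2, 'b': 1, 's': 1})
    # An LLM never sees these characters directly; it sees subword tokens
    # and predicts a plausible-sounding count rather than computing one.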
 
Can't wait to see people bring this as an example of AI lacking utility. They're Large Language Models for a reason.

This is more of an expectations issue than a tech issue.
 
I tried ChatGPT with the following question, "why is 9 times 6 equals 42 a problem", and got the following result:

"Saying that 9 times 6 equals 42 is a problem because it's incorrect. The correct product of 9 and 6 is 54. Miscalculating can lead to errors in various contexts, such as math problems, financial calculations, or measurements. Accuracy is key in mathematics!"

When it says it's incorrect, that's not necessarily true; the same goes for the answer being 54. It's right that accuracy is key, but so is completeness.

It made a fundamental assumption that I had fully specified the problem and was working in decimal, which was why it unequivocally said it was incorrect.

I wasn't. I was using base 13 (where 13 is written in decimal notation).

Decimal version: 9 × 6 = 54 (decimal), i.e. 5 × base + 4 → 5 × 10 + 4

Base-13 version: 9 × 6 = 42 (base 13), i.e. 4 × base + 2 → 4 × 13 + 2
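A minimal sketch (Python, purely for illustration) shows the same quantity in both notations:

    def to_base(n, base):
        # Render a non-negative integer as a digit string in the given base.
        digits = ""
        while n:
            digits = "0123456789ABC"[n % base] + digits
            n //= base
        return digits or "0"

    print(9 * 6)               # 54 -- the product, written in decimal
    print(to_base(9 * 6, 13))  # '42' -- the same product, written in base 13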

That the ultimate question is not expressed in decimal is why the universe is a strange and wondrous place.
 
If this surprises you, you've been lied to. Next, figure out why they wanted you to think "AI" was actually thinking in a way qualitatively similar to humans. Was it just for money? Was it to scare you and make you easier to control?
AI has mainly been the always shady tech industry trying to cash in on a new big thing. It is not like it is actually hard for large software makers to convince media and politicians that something is huge and deserves massive investment. Add in the threat that other countries (China, etc...) may beat the US to the new technology and you have the makings of a true bonanza.
 
LLMs have no reasoning ability. They're literally just next word guessing algorithms fed on the stolen works of others, and worse, public forum posts. There's no intelligence at work, at all.
It is the more modern version of early (and sometimes modern) pages on Wikipedia where people "source" something that has zero factual basis.

If the source of the data is garbage, the answer to the equation will be equally useless.
 
So Apple is proving Apple Intelligence is crap before it even releases? Funny.

But in all seriousness, I've had a lot of luck with the ChatGPT o1-preview model in particular. It is definitely a step up. What I like is that it breaks things down into parts and then goes through each part to show how it arrived at its conclusion. That makes it pretty easy to follow, since something complicated becomes a series of simple steps, and I can check the work at each step to make sure it isn't doing something weird.

It also nailed a fairly complicated bit of programming for me the other day in an emergency, and it knew all of the proper API calls to make to an external service we use. That let me react several times faster by rapidly building out a tool to help resolve an issue across 800+ websites. I checked all the code, modified a couple of bits to better suit the need, tested it on a few sites, and then mass-deployed. Everything went great, but it's still a tool. It's good to understand the limitations of your tools, and some tools are more effective in the hands of one person vs. another, depending on experience and skill level. I also do woodworking, and it's even more apparent in that field. Never rely too heavily on your tools, or on any one tool.
 
If this surprises you, you've been lied to. Next, figure out why they wanted you to think "AI" was actually thinking in a way qualitatively similar to humans. Was it just for money? Was it to scare you and make you easier to control?
It seems you might be the one with the skewed perspective. Do you use it very often?
 
It's funny to me that we act like if AI isn't "sentient" then that means it's complete BS. Why not just recognize what it is good at and stop with the unnecessary hype?

Because the money pushing it wants it to be some HUGE ASTRONOMICAL PAYDAY

...not just the occasionally and contextually useful small side feature it actually is

We live in a BS Hype world the likes of which have never before been seen

I hope some modesty, honesty, hopeful vision, and truth may yet return, but right now it just feels like everyone is trying to see what they can get away with in the name of "cashing out"

I mean even just look at Apple

Is anything they are doing really expressing a coherent and hopeful vision for the future and where we want to go as people and customers?

No -- it's just a huge rent extraction operation to juice the stock price

Their one "thing" of late (AVP) is an overpriced, underwhelming, hugely isolating, anti-social helmet that, going by sales, nobody even wants.
 
After multiple years of hype, this seems like a rather glaring issue that even at-home testers would have exposed long ago. Why are we just hearing about it now?
 
The problem with showing that some other model gets the correct answer is the same one we've always had: how do we know we're getting a good answer? (Of course this isn't limited to AI.) Opinions are everywhere. Curation is not. When we believe an answer, it's because of its reasonableness, the confidence we have in the answerer, and the supporting data provided (and its provenance). Have you ever read the studies quoted at the ends of articles about a drug's or an air purifier's effectiveness? About 40% of the time, the quoted study states something irrelevant or even the opposite.

(And why should you believe me? 😉)
 