Rendering text within generated images is a known limitation of generative image creation. GenAI is terrible at this across the board, and providers use their own dedicated sub-experts just to get text spelled correctly, which can still fail sometimes. There's a long way to go there; I'd never expect these tools to produce an accurate map right now.

That said, it does get the number of states with "R" in the name correct. Because GenAI has so much emotion around it, people have a vested interest in engagement and clickbait nonsense. This is why I advise everyone to run their own tests and use all models available if they are pertinent to your work or adjacent to your interests.

Not everyone needs to use these, of course, but those who want to should be informed. There is a lot of misleading information around, and it leaves people believing the technology is still where it was two years ago, which isn't true.

No one should fully trust what I or anyone else says in here; watching the live demos and trying the models for yourself is the only way to know. Reading Simon Willison's blog for usage tips and staying current with development also helps.

There is a lot of "there" there without being reductive. But it's easy to take pot shots and snipe at each other because this technology is undeniably disruptive and that makes a lot of us understandably uncomfortable.
I gave GPT 5 the same prompt and it told me 22. Apparently there's an R hiding somewhere in New Mexico.

Setting aside the incredible waste of resources and damage to the environment, these things are inherently unreliable. It's not something these companies can fix because it's the nature of the tech they're built on. It's easy to take potshots at these things because they mess up all the time.
 
Did you use the exact prompt I did? Asking it to think is critical. These types of specific knowledge tasks work much better if you either add in web search (e.g., when asking about recent documentation on an API update, "pull from x website where the new docs live, then answer my question") or if you activate CoT / 'reasoning' via 'thinking'.

You can write it off but it gets it correct for me on multiple attempts.

Grok 4 also got it right on the first try for me just now, and activated 'thinking' on its own.

For some questions you don't need to use thinking; for some you do. This stuff requires a bit of know-how regarding how to prompt and what level of effort to tell the model to use. That may make it less useful for people who lack that know-how, but it doesn't mean the capability isn't there.
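If you're calling the models through an API rather than the chat UI, the same 'thinking' toggle exists there. A minimal sketch, assuming the Anthropic Python SDK's extended-thinking option; the model string and parameter details here are my assumptions and may differ by SDK version:

```python
# Minimal sketch: explicitly enabling "thinking" for a factual counting task.
# Assumes the Anthropic Python SDK; model string and parameters may differ by version.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-1",  # assumed alias for Opus 4.1
    max_tokens=2048,
    thinking={"type": "enabled", "budget_tokens": 1024},  # extended thinking on
    messages=[{
        "role": "user",
        "content": "Think carefully, then count how many US states have the letter R in their name.",
    }],
)

# The response interleaves "thinking" blocks with the final "text" answer.
for block in response.content:
    if block.type == "text":
        print(block.text)
```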

It does make mistakes of course, especially if you don't meta-prompt well or anchor to something solid, but there is a lot of utility in what's there. These quick tests aren't as useful as actually seeing if it integrates with your workflow in a positive or negative way.

For me, Claude Opus 4 and now 4.1 have been indispensable for my work and for research assistance, but I know what to check and am careful with my prompting. It seems like GPT-5 will be useful too, which is great because I didn't get along well with 4o, but I will miss 4.5 because I threw some really long research tasks at it and got excellent results.

I'm really most excited for the competition to heat up now and push Anthropic to get their pricing down and advance their rollout. I hope they continue to stay afloat for a long while.

edit: for clarity, you need paid accounts to use the features I'm talking about above. Free versions will not offer these tools; OpenAI was very specific about the level of model being offered to the free tier. It's a big step up in quality to pay the $20, subject to usage quotas of course.
 
I copied the exact prompt from the screenshot. But either way, does the fact that you have to instruct it to "think carefully" for such a simple task not raise a big red flag for you? Should it not be thinking carefully for every task, if that's required to make it do the job properly?

I ran the exact same prompt again in a new chat and it gave me the right answer this time, which again goes to the point of this stuff being unreliable.

It's just insanely underwhelming knowing how much money has been flushed away on this.
 

As one example, right now I'm learning some facets of a programming language I'm not intimately familiar with, and I'm using Opus as an aid to a lecture series that makes some mistakes in pedagogical ordering. I'm using it kind of like a TA: not to write code or autocomplete my work, but to explain some poorly documented APIs and packages, and to answer usage questions when something I do works when I didn't expect it to, or vice versa.

It's been easily as valuable as a real tutor or moderate-tier pair programmer would be to me, and it's far cheaper, always available, etc. I don't expect it to write programs in their entirety for me, nor would I want it to. I also don't expect to offload my own mental tasks or truly deep-thought knowledge work to these tools; I need to stay sharp to earn income and continue to excel in my profession.

But both things can be true of these tools now: they are flawed, but they can still be quite useful. Claude Code is used by a lot of people with great results. I don't have a workflow I can integrate it with at the moment, but down the road, when company policies change and the technology advances, I can see that being different. Understanding usage patterns and best practices is, to me, beneficial, because I can speak with some authority on the capabilities and on what I'm getting out of the models.

I don't think the utility is in asking it to count the number of states with "r" in the name. This sounds like a pithy reply, but I don't mean it that way. A harsher add-on would be to say something like "if you can't find utility in these tools, the problem is with you, not the tools." I don't agree with that statement in its entirety, but there is something to it, at least in my experience.

I fully expect LLMs to be mostly obviated within ~5 years; world models should replace them entirely, though the real timeline could be anywhere from 2028-2035 all the way to never if the technology fails to materialize. But for now, as of this year, they are pretty useful in broad areas, at least to me. Your mileage may vary.
 
To be clear, I'm not saying I never use LLM tools. I've made real efforts to try to use them for all sorts of tasks, with very mixed results. I'm changing careers, and as part of my initial retraining I used various LLMs to quiz me on material I had studied. Quite frequently, I would find that the LLM would pose decent questions for me, but then proceed to get the answer to its own question wrong.

I've also used them for things like spreadsheet formulas, but again they often mess these up, formatting things for the wrong programs even after I specify what I want them for.

And finally, importing timetables into calendars, something that should be simple, right? It accurately takes the text from a photo or image, formats it for an iCal file, and then messes up at the last step of creating the text file, with every attempt to correct it making the problem worse.
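For what it's worth, the target format itself is trivial; here's a rough sketch of the kind of .ics text that last step should produce (the events and filename are invented placeholders, and a strictly valid file would also want a UID and DTSTAMP per event):

```python
# Rough sketch of a minimal iCalendar (.ics) file of the sort a timetable
# import needs. Event times and titles below are invented placeholders.
events = [
    ("20250901T090000", "20250901T100000", "Intro lecture"),
    ("20250902T140000", "20250902T153000", "Lab session"),
]

lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//example//timetable//EN"]
for start, end, title in events:
    lines += [
        "BEGIN:VEVENT",
        f"DTSTART:{start}",
        f"DTEND:{end}",
        f"SUMMARY:{title}",
        "END:VEVENT",
    ]
lines.append("END:VCALENDAR")

# RFC 5545 expects CRLF line endings; newline="\r\n" handles that on write.
with open("timetable.ics", "w", newline="\r\n") as f:
    f.write("\n".join(lines) + "\n")
```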

And yeah, asking stuff like how many Rs in state names etc isn't exactly a key use, but it's such a basic task that if it can't do that I have to wonder what else it can't do. How can I trust it to do complex tasks if it can't reliably do extremely simple ones?

I'm sure if I paid for the pro versions of these tools and really, really tried, I could get them doing more stuff and save 5 mins here or there, but frankly, until I can trust them not to mess things up all the time, why would I bother? Just like with Siri, I find that in the time it takes me to ask it to do the thing (and then re-ask and re-clarify), then verify it's done it right, I could usually have just done the thing myself, knowing I'd done it right and done it the way I want it done.

It seems that coding and coding related tasks may be the only area where these things are actually somewhat good.
 
“OpenAI says that GPT-5 is its best coding model to date”

Obvious thing for OpenAI to say. Why even say it? Why would GPT-5 be worse than GPT-4? So dumb.
 
Grok is an industry joke; nobody takes it seriously because it's benchmark-maxxed junk. Claude models are not always the top benchmark performers, but everyone swears by them for coding and real-life use cases.

"everyone" I literally had grok 4 fix problems Claude Sonnet 4 couldn't.
 
Yes.


lol Electrek.
Frederic Lambert sold his stock (which he bet the house on at one point, literally) and is mad it's 1.5x the value now. He's desperately trying to bring the stock back down so he can get back in. There are plenty of provably wrong articles he's written in the past.
 
That’s just plain wrong. Please don’t say something that’s obviously not true without any facts backing it up. You don’t like Grok for whatever personal reason, fine, but it is far from an industry joke. An industry joke doesn’t get valued at 200 Billion. Come on now.

"An industry joke doesn’t get valued at 200 Billion"

You just exposed yourself. We are in the biggest tech bubble in history and you're quoting a private valuation for an Elon Musk company lol. If it didn't have a massively inflated valuation there would be something massively wrong and we should all be panicking.

Anyway, go to any SF AI event or conference and ask everyone this question: "What models are you guys using?" and you will almost NEVER hear "Grok"
 
"everyone" I literally had grok 4 fix problems Claude Sonnet 4 couldn't.

Sorry, but that means nothing. I see the occasional Grok success anecdote on Twitter, which is meaningless in an ocean of people swearing by the other models such as Claude. You just said it yourself: you were using Sonnet 4 and ran into a bug it couldn't crack... so you were using Sonnet 4 like almost everyone else. Most engineers I've spoken to in your situation switch to Opus 4, o3, or Gemini 2.5 Pro. Again, almost nobody is using Grok for coding, and I have yet to find a single example of people using it for any professional production use case.

It's nothing personal; I've actually paid for SuperGrok (or whatever it's called) in the past and thought it was mostly junk 🤷 I gave it an earnest go when Grok 4 came out and... it's just meh. It's really embarrassing how poor the benchmark-to-real-world-capability ratio is; that's why everyone thinks it was tuned for benchmarks and not practical use.

Consider that Anthropic's models are never benchmark-topping, yet they are very obviously better at agentic workflows (especially coding) than basically anything else. Everyone knows Claude is king for coding, but perhaps GPT-5 will change that; Grok 4 certainly didn't.
 
What about quantum-consciousness people like Roger Penrose saying something like intelligence equals being able to be conscious? Then I guess the question is what defines consciousness. According to him, consciousness arises from non-computational processes… meaning it can’t come from a computer.

Not sure if I agree with that, but it would be interesting to hear your definition of consciousness…
Intelligence and consciousness are different concepts; the hint is in the fact that they are two different words with different definitions. That being said, if consciousness arises from non-computational processes, then we’re not conscious either, as our brain is just a biological computer using neurons instead of copper and transistors.
 
The other day I asked ChatGPT 4o if Pink Floyd ever revealed what "No one flies around the sun" means, in the lyrics of the first half of their track "Echoes", though I already knew the answer (they haven't). Rather than just saying yes or no, ChatGPT answered by initially quoting the lines from the second half of the song, then started to pour out some of the lines from the first half of the song, but then interrupted itself before showing the line in question, and said, "So, this shows that the line 'No one flies around the sun' doesn't appear in this song." I told it that it was wrong, but it insisted it was right. I went around with it a few times until it finally admitted I was right, and then accurately said "No they haven't". Then it said it couldn't really answer questions about song lyrics since it claimed it wasn't designed to quote full lyrics from songs, due to copyright issues (I'd never seen it admit that copyright is a concern for it). But that doesn't really explain its "answers" to my question about just a single line from the song.

I've had a fair number of such "experiences" with ChatGPT about a variety of other questions. Despite its frequent successes, it's still not ready for full-time prime time. The same is almost certainly true for all the others.
 
Do people still use ChatGPT lol? Claude is king!
Can you use Claude without registering and logging in? Because that's a massive advantage of ChatGPT.

I wanted to use DeepSeek, but since it requires an account and I have to log in every time (I have my browser set to delete cookies on close), I don't bother.
 
How do you define intelligence, and why do you think these models do not qualify?
The ability to gain new insight and wisdom from a new encounter or experience. The ability to recognise, analyse, and instinctually approach a certain task. I suppose, though, that it can be argued instinct is itself a model.

If you get ChatGPT stuck, it will parrot the same reply over and over. It can't recognise that it needs to change its approach in order to solve the problem, based not just on what it's asked but on who is asking it.
 
And how can you be so sure that humans don’t do exactly the same thing these AIs do (just better, for now)? How can you tell the difference between working out the correct answer and intelligently answering a question? As long as the answer is correct and the reasoning sound, what’s the difference?
I never said humans couldn’t. Of course they can. Intelligence, in the LLM sense, is simply answering the question correctly. Artificial, in the LLM sense, is the non-human factor. It’s not ‘real’ artificial intelligence, as in self-aware and able to reason independently like (most) humans, but it is still an intelligence which one can access and make legitimate use of.
 
And how can you be so sure that humans don’t do exactly the same thing these AIs do (just better, for now)? How can you tell the difference between working out the correct answer and intelligently answering a question? As long as the answer is correct and the reasoning sound, what’s the difference?
So, the brain weighs about 3lbs and runs mostly on alcohol, coffee and porridge (just me?). As opposed to mind-bogglingly expensive, resource intensive farms of GPUs/CPUs that struggle to get close to the truly imaginative, intelligent, free thinking, independent, creative beings that we are. I can take my brain with me wherever I go, without needing a data connection or small power station to keep the lights on. I take a holistic view. We've a lonnnnnngggg way to go yet. For now, I'd settle for "answer is correct and the reasoning sound".
 
Utterly unhinged that someone can write the sentence "GPT-5 is less likely to lie to the user" without any kind of comment or concern about that. On the contrary, this whole piece reads like a corporate press release.
 
Lol, all this hype and long months of development, only to fall short of Grok 4, which was released earlier and took far less time to train.
 
Can we somehow make ChatGPT our default Siri?

The problem is that, at least in my experience, these LLM systems still fail badly at some of the basics I want from a smart assistant. I think this might be one reason why Apple and Amazon are having such difficulty bringing LLM technology to their Siri and Alexa smart assistants.

There's a standard question I ask because it's an example of the fairly common travel-related queries I put to my smart speakers. I just asked ChatGPT "What time is the next train from <my local train station> to <a major London train station>?". That major London station is on a direct line from my local station, so in all cases it's only a single train journey.

In fairness, ChatGPT did better than it has done in the past. Previously it would simply invoke a route planner to get from my current location to the London terminus and then tell me when I needed to leave my home to walk to the bus stop, catch a bus to my local station (which I never do; it's an 8-minute walk), and then catch the train. It would then go on to tell me that if I wanted the exact times of the trains, I should go to the website of my local train company and look up the timetable there.

Anyway, today ChatGPT actually recognised the precise intent of my question and gave a direct answer, that the next train was at 10:04. It even gave me the departure platform and gave times and platforms for another 3 trains after that one. Useful if I missed the first train on the list.

The problem is that it also answered my follow-up question, "What time is it now?", correctly. It gave me the right answer: 10:41. So the "next train" it gave me had already departed 37 minutes earlier, and in fact the following two trains had also already departed by the time it gave me its answer.
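The missing step isn't complicated; a toy sketch of the check a proper assistant integration would need, dropping departures that are already in the past (the timetable here is made up):

```python
# Toy sketch of the step the assistant skipped: filter out departures that have
# already left before reporting the "next train". The timetable is made up.
from datetime import datetime, time

departures = [time(10, 4), time(10, 19), time(10, 34), time(11, 4)]
now = datetime.now().time()

upcoming = [t for t in departures if t > now]
if upcoming:
    print(f"Next train: {upcoming[0].strftime('%H:%M')}")
else:
    print("No more trains today.")
```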

To get a decent, reliable user experience from a general-purpose smart assistant, there really does still seem to be a lot more work to do, because I suspect this sort of total failure to give a correct and useful answer to a travel question is by no means the only edge case where current LLMs fail badly.
 