I mostly use LLMs for coding and automation. Claude Sonnet 4 and Opus 4 have been top of the heap for a lot of it. It seems the o* models have been situationally better. The Chinese Qwen3-coder and Hunyuan have made surprising strides at parity too. GPT-5 has only been out for a few hours, but it seems to have the edge over most or all on various benchmarks. We’re on a fast train to somewhere, maybe a new level, maybe a wall, maybe a corporate retention pond. Everything points to massive changes to the knowledge worker industry, and I'm saying that from the inside.
 
If this was the best of the best, why did OpenAI/Sam desperately raise cash right before releasing ChatGPT 5?

Generally, companies raise cash *after* releasing something good. Sounds like Sam got scared of Elon's initiatives with xAI's Grok 4.

EDIT:

Yep:

Grok is an industry joke, nobody takes it seriously because it's benchmark-maxxed junk. Claude models are not always the top benchmark performers but everyone swears by them for coding and real life use cases.
 
1754615456626.png


Brings to mind this:

1754615487456.png
 
Just needs to think on it a little bit.

I miss GPT 4.5 :( I really really liked it for specific tasks, even at the 30/mo limit.

GPT 5 requires a lot of pre-prompting and meta prompts to give me similar quality. I suspect the model is a fraction of the size of 4.5, which is understandable from a cost perspective but annoying as a paying subscriber.
 

Attachments

  • blueberry.png
Grok is an industry joke, nobody takes it seriously because it's benchmark-maxxed junk. Claude models are not always the top benchmark performers but everyone swears by them for coding and real life use cases.
That’s just plain wrong. Please don’t say something that’s obviously not true without any facts backing it up. You don’t like Grok for whatever personal reason, fine, but it is far from an industry joke. An industry joke doesn’t get valued at $200 billion. Come on now.
 
Let's definitely bet society on this stuff..

/s

View attachment 2535529
Rendering text inside images is a known limitation of generative image creation. GenAI is terrible at this across the board, and the models have their own sub-experts just to get text spelled correctly, which can still fail sometimes. There's a long way to go there; I'd never expect them to get a map right at this point.

That said, it does get the number of states with "R" in the name correct. People have a vested interest in engagement and clickbait nonsense because GenAI has a lot of emotion around it. This is why I advise everyone to run their own tests and try all the available models that are pertinent to their work or adjacent to their interests.

Not everyone needs to use these, of course, but those who want to should be informed. There is a lot of misleading information around, and things like that keep people believing that the technology is where it was 2 years ago, which isn't true.

No one should fully trust what I or anyone else says in here; watching the live demos and trying the models for yourself is the only way to know. Reading Simon Willison's blog to understand usage tips and staying current with development also helps.

There is a lot of "there" there without being reductive. But it's easy to take pot shots and snipe at each other because this technology is undeniably disruptive and that makes a lot of us understandably uncomfortable.
 

Attachments

  • states.png