I really appreciated your post. Could you elaborate a bit more on the point above? I’m curious to hear how you see things taking a very bad turn.
OpenClaw is just scaffolding; I don't think the developer did anything truly novel or 'frontier-level' in the way OpenAI, Anthropic, or Google would define it.

I think it was more about optics, and possibly about hiring the developer, than about actually gaining novel technology. Agentic workflows are tricky, and 'move fast and break things' doesn't work very well when these tools have access to your raw filesystem or credit cards. Scams (likely created by actual humans, NOT AIs) were all over those agent-to-agent platforms within a day or two of them hitting the news as the hot new thing a couple weeks ago, with that whole "the AI is alive and they're scheming" energy. They're heuristic systems talking to each other, which is VERY COOL in an ant-colony way, but when these people can't even get MCP working with high security, I don't think there's any chance of agent-to-agent scaling to some kind of massive scale soon. If it was a 'hire him and build this in 2-3 years' thing, yeah, but I very much suspect it isn't, exactly because it's in the public consciousness.

I'm basing this on personal experience viewing them. It's also how I know the "AI spun up a blog to slag on a developer who wouldn't accept a pull request" story was BS: the blog itself had placeholder brackets like [TOPIC] all over the place that a human clearly set up. The media ran with it irresponsibly because it was a sensational story, and even Ars had to retract their article about it, although ironically that was because they used an AI to fabricate quotes.

Scaffolding is important when you're running multi-agent or even multi-workflow setups. I'm probably going to spend part of the middle/late part of my year building my own, but it's not research-heavy, and scaffolding plus strategy is, I think, what's missing from OpenAI versus Anthropic, who has been nailing both product placement and execution, other than the voice issue I mentioned, which IS a big deal to me.
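
To make "scaffolding" concrete, here's roughly the shape of it, a minimal Python sketch where everything (the call_llm stub, the tool table, the JSON action format) is invented for illustration. The scaffold is the loop and the plumbing around the model calls, not the model itself:

```python
import json

def call_llm(prompt: str) -> str:
    """Stand-in for a real provider call (OpenAI, Anthropic, etc.)."""
    raise NotImplementedError("wire up your provider SDK here")

TOOLS = {
    # Deliberately boring, sandboxed tools; the scaffold decides when to
    # call them and what to do with the results.
    "read_file": lambda path: open(path).read(),
    "word_count": lambda text: str(len(text.split())),
}

def run_agent(task: str, max_steps: int = 10) -> str:
    history = [f"TASK: {task}"]
    for _ in range(max_steps):
        # Ask the model for its next action as JSON:
        # {"tool": "...", "arg": "..."} or {"answer": "..."}
        reply = call_llm("\n".join(history) + "\nNext action as JSON:")
        action = json.loads(reply)
        if "answer" in action:
            return action["answer"]
        tool = TOOLS.get(action["tool"])
        if tool is None:
            history.append(f"ERROR: unknown tool {action['tool']}")
            continue
        # This is where the security part lives: validate arguments and
        # sandbox the call instead of handing the model raw filesystem access.
        history.append(f"RESULT: {tool(action['arg'])}")
    return "gave up after max_steps"
```

A real scaffold adds retries, schema validation, sandboxing, and logging, which is exactly the unglamorous, non-research-heavy work I mean.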

Anthropic spent a while quietly getting integrated with corporate America, carefully making deals and demonstrating their power. OpenAI seems to be throwing everything at the wall to see what sticks, and I don't think that strategy will pay off, since this field has very little in the way of 'moats', at least for now. I expect that to change to some degree with world models; it's why companies are spinning up warehouses full of camera-equipped robots just to run experiments and train, which is really more RL than mainstream ML/AI. My pet theory is that RL has a lot more "there there" than we give it credit for, at least in the public consciousness.

I'm also personally working on something involving what I'd call a substrate. I can't really talk about it for now, but it seems promising, at least on paper. This is also why I'm particularly frustrated that I can't read research papers easily anymore: so many of them just measure banal garbage, and usually not very well. I don't care about AI-assisted-productivity papers, because anyone with deep experience already knows there's fundamental utility, and anyone with limited experience and skill will just have their "AI sucks, haha" take validated. The worst part is that they're both correct.

These people have flooded the marketplace of ideas, and LLM-assisted writing has made it much worse. I used to read hundreds to thousands of papers a year; now I really have to pick and choose, and I feel like I'm missing a lot because the noise-to-signal ratio is insane.
 
Apple should have bought Anthropic. From my experience, Claude is the best, at least at coding and complex reasoning.
Yes, but I think Apple would have sandboxed them and slowed innovation. Many employees probably would have left if Apple bought them unless Apple let the company operate very independently.
 
How do you all determine what AI model to use for what? Is there a good cheat sheet somewhere that explains which model is best for what?
 
How do you all determine what AI model to use for what? Is there a good cheat sheet somewhere that explains which model is best for what?
Practicing over time develops an intuition. Claude seems to have more proprietary scanned written text in its training data, Gemini has the full power of Google Search behind it, and ChatGPT is very good at everything overall. Outside of niche use cases, the differences between the models are shrinking over time.
 
How do you all determine what AI model to use for what? Is there a good cheat sheet somewhere that explains which model is best for what?
One thing to be aware of that not many people know: for ChatGPT (paying customers), you can now only get extended thinking in the web app... the iOS and Mac apps don't even have a UI for it anymore, which is stupid.

Unless you're broke, I'd also suggest paying for the tooling so you can turn off training on your data/chats (which you can do for both OpenAI and Anthropic).

Like @Zwhaler said, actually using them is the only way to get a feel for it, and to learn what to look for when the models go off the rails. Opus 4.6 just screwed up a recipe for me, for example, because it only had partial context from a previous chat. I noticed it, but a lot of people might miss it and assume it worked perfectly. You always have to pay attention when working with these things, but the utility (for me, and a lot of others) is still there for task tracking, research work, some coding, and so on.

The main things I actively, strongly dislike are the art-replacement tools. There's not a lot of room for reasoning and no capacity for novel thought, so the output is just a regression to the mean. With the LLMs there are enough layers, tool use, etc. that you can augment around their limitations and get something that's a pretty reasonable facsimile of a thinking partner, to a certain degree.

For moderate to heavy users, until recently at least, I'd say Claude is only worth it on the $100 Max plan. I have multiple ongoing chats and projects running the majority of the week, so I'd be out of usage every 30 minutes on the base plan, but someone using it very casually could probably get by with the $20/mo tier.

For ChatGPT I think the $20/mo plan is fine for almost everyone unless you really need Pro-level reasoning. Codex is heavily subsidized right now, and if you're a coder you can use codex-high in something like Zed for what seems like a nearly infinite amount of time, which is just... insane and not sustainable.

I suspect Opus 4.6 has a more advanced routing model that defaults to less 'effort' unless your prompt signals that it needs more, and this probably applies to Sonnet 4.6 too. Anthropic does have uptime issues on the consumer side, and I think they tuned the routing down a notch because of it. Hopefully whenever the 5.x models come out it goes back to the way it was, but we'll see.
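
To be concrete about what I'm guessing at: something like a cheap pre-pass that scores the prompt and picks a thinking budget before the main model runs. Here's a toy Python illustration of the idea; this is purely my speculation, nothing Anthropic has documented, and the keywords and budget numbers are invented:

```python
# Toy illustration of prompt-based "effort" routing. Pure speculation;
# nothing here reflects Anthropic's actual implementation.
BUDGETS = {"low": 1_000, "medium": 8_000, "high": 32_000}  # thinking tokens

def estimate_effort(prompt: str) -> str:
    """Score crude signals of task complexity; default to the cheap path."""
    signals = 0
    if len(prompt) > 2_000:  # long prompts tend to mean harder tasks
        signals += 1
    for kw in ("prove", "refactor", "debug", "step by step", "plan"):
        if kw in prompt.lower():
            signals += 1
    if signals >= 2:
        return "high"
    return "medium" if signals == 1 else "low"

def thinking_budget(prompt: str) -> int:
    return BUDGETS[estimate_effort(prompt)]
```

If something like that is running, tightening the default budget to protect capacity would look exactly like the regression people are noticing.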

Sonnet is very good at writing code that's well defined and spec'd, but Opus is better at planning and collaborating. GPT just routes you however it wants now, and there aren't two major models to pick between, though as I said above, I see much better results using the web interface.
 
MiniMax 2.5 is a beast and delivers 99% of what Claude does... download "CC Switch" and you can swap out the Claude CLI's LLM for MiniMax and get something like 20x more usage for the same money.
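
For anyone curious what CC Switch is actually doing under the hood: Claude Code reads ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN from the environment, so pointing those at any Anthropic-compatible endpoint swaps the backing model. A minimal Python sketch of the same trick; the MiniMax endpoint URL and model id below are assumptions on my part, so check their docs for the real values:

```python
import os
import subprocess

# Point the Claude CLI at an Anthropic-compatible endpoint instead of
# Anthropic's own API. Endpoint URL and model id are assumed here;
# verify against MiniMax's documentation.
env = os.environ.copy()
env["ANTHROPIC_BASE_URL"] = "https://api.minimax.io/anthropic"  # assumed endpoint
env["ANTHROPIC_AUTH_TOKEN"] = os.environ["MINIMAX_API_KEY"]     # your MiniMax key
env["ANTHROPIC_MODEL"] = "MiniMax-M2.5"                         # assumed model id

subprocess.run(["claude"], env=env)  # launch Claude Code against MiniMax
```

CC Switch just makes flipping between profiles like this a one-click thing instead of juggling env vars yourself.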
 
The developer burnout point from macduke is honestly one of the most important things in this whole thread and it doesn't get talked about enough. Companies scaling engineers to 10x output without reducing hours aren't gaining efficiency — they're just piling on cognitive load until people snap. The tools are genuinely great, but the way organizations respond to them has been pretty consistently extractive rather than thoughtful.


What's interesting is the thread actually captures two completely separate debates that keep getting tangled together. People who use Claude daily for coding — like Keymaster's practical "little bits to chew on" approach — are mostly happy with it and for good reason. The skepticism from others is really aimed at the hype machine and the ridiculous job-replacement rhetoric, not the tools themselves. Vibe coding crashing a production system is a workflow and judgment problem, not a Claude problem. The model did exactly what it was asked to do!


On Sonnet 4.6: the Opus-level intelligence at a friendlier price point is genuinely exciting for teams running agentic workflows where inference costs add up fast. And the 1M-token context window is quietly the big deal here; that's where you really feel the difference on large, complex codebases day to day. If you're juggling multiple Claude Code sessions and agent pipelines at once, Remocode is worth checking out: it sits alongside tools like Copilot and Cursor in a split-pane terminal with Telegram remote control, so you're not glued to your screen babysitting every session.


The voice routing quietly dropping to Haiku that novagamer flagged is worth following up on, though: silently downgrading users to a weaker model without any heads-up is the kind of thing that damages trust far more than any hot-take article ever could.
 