Rabbit R1 can already do these things and more. ChatGPT is so behind on many things. I feel like it's also become much more sensitive about user inquiries. Things you could ask it in the past, it now scolds you for and says are against policy. Rabbit and Grok are pretty much the least restrictive, and Rabbit can also do all these tasks like a personal assistant.
 
This is totally wrong.

Mathematics in education is one domain I am an expert in. It is definitely not suitable for that. It generates misleading methodologies, makes horrible mistakes and loses context very rapidly when structuring processes which require multiple steps.

The last thing you want teaching you is something that doesn’t know what it’s doing.

So please stop promoting this use case. It only looks like it works because you don't know the domain well enough to know when it doesn't work. And that's dangerous: something that sounds authoritative and is wrong.

Very few areas of my existence require less than 100% correctness.

To which specific model does "It" refer? Khanmigo? MagicSchool? ChatGPT 4o? Other? Can you please cite some specific examples?

Fwiw, I've worked in K12 education for almost 20 years. I will absolutely, and cautiously, promote this use case (only as a supplement to human teaching) because I've seen it work. I refuse to make perfect the enemy of the good. An imperfect AI can still help a student make connections to what they learned in class when the teacher is unavailable at 10pm. And AI is only going to get better.

Generative AI is also a great way for teachers to tailor word problems to match the interests of each student, improving engagement. It does that very well. And with Google Gemini in Workspace for Education now meeting FERPA requirements, teachers can safely use tables of student first names and their interests without risking privacy. That helps teachers enhance their classwork without adding tedious labor.
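
To illustrate (a hypothetical sketch in Python; the names, interests, and prompt wording are invented, not any specific tool's API), a teacher could build one tailored prompt per student from such a table and send each to whatever model they use:

```python
# Hypothetical sketch: build per-student prompts for a generative model
# from a table of first names and interests. All data here is invented.
students = [
    ("Ava", "soccer"),
    ("Ben", "dinosaurs"),
    ("Chloe", "space travel"),
]

TEMPLATE = (
    "Write one two-step arithmetic word problem for a 4th grader named "
    "{name} whose favorite topic is {interest}. Include an answer key."
)

for name, interest in students:
    prompt = TEMPLATE.format(name=name, interest=interest)
    print(prompt)   # in practice, send each prompt to your chosen model
```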
 
Web/cloud-based AI is indecently scraping IP, personal data, and content like it's the Wild West, gold-rush fever. No controls, no boundaries. Probably worse than Meta.

I personally will never trust Musk or the Meta platform with any of my personal data. Call me crazy.

I am paying $20/month to OpenAI mostly because I can opt out of sharing my conversations with them to improve their models (which, btw, might be a clever, vague term that allows them to collect and sell your identity to third parties).

Apple Intelligence is totally lagging, probably because Apple is (or at least sells itself as) very good at guarding your privacy, and maybe there isn't yet an LLM partner platform that aligns with Apple's values in that matter.

I am a skeptical optimist in terms of privacy and AI. I am confident that an AI that looks after your privacy will somehow be a valuable thing in the near future.

edit: typos, grammar (sorry, I am not a native English speaker)
 
To which specific model does "It" refer? Khanmigo? MagicSchool? ChatGPT 4o? Other? Can you please cite some specific examples? […]

I teach at a higher level (undergraduate mathematics). We generally have to clean up the garbage taught beforehand.

I’ve evaluated several models against several use cases including GPT-4.5.

A good one is a simple structured problem: “if I have to get up at 06:30 and need 7 hours and 30 minutes of sleep, then what time do I need to go to bed?”

This tends to produce a variety of hilariously awful approaches to the problem: breaking it down into minutes to do the calculation, being unable to understand the concept of a clock, totally and confidently miscalculating the answer, and generally picking a suboptimal strategy. Even attempts to formalise it fail.

Handing that over to a student is pedagogically stupid.

As for the imperfection comment, hell no. Are you happy with 30% of what you teach being verifiably wrong? Because that’s what you just told me.

I'm sure I'll now get the defence that "perhaps I'm asking the wrong question". That's part of the teaching: making sure that what is being asked is understood. Which is no good if you throw at it a student who doesn't know.

A good one from a while ago: someone sent me a transcript of 4o's proof of De Moivre's theorem. It looked correct until you actually worked through it. I'd expect a student not to pick up on why it was wrong. I'd similarly expect someone who doesn't care about the inaccuracy to print it verbatim, hand it out as a worksheet, and pretend they did a day of work.

Mathematics has rigour. Some other subjects have less. That doesn't mean it's OK to get away with being wrong.
 
JFC... how lazy have we become as a nation that we're relying on "artificial intelligence" to do simple tasks like this?
They know that the people who can't do these things will signal-boost "AI" as wonderful/fantastic/amazing, when it's really just Theranos with a different scheme for money-grubbing.
 
JFC... how lazy have we become as a nation that we're relying on "artificial intelligence" to do simple tasks like this?

Yep. It's just going to create a future generation of people who barely think and rely on computers to come up with ideas. Pretty sad we're going down this path.
 
Can it do things on a Mac or an iPhone? Does it have access to things like Settings? Can it create calendar events, or access notes and emails, and so forth? Is this what Siri was always meant to be?
 
I teach at a higher level (undergraduate mathematics). […] I've evaluated several models against several use cases including GPT-4.5. […]

GPT-4.5 isn't the right choice of model. It's optimized more for creative writing than logic and reasoning. 4o would be a better choice, but reasoning models like ChatGPT o3 have much better performance on math. I just put your example into Google Gemini 2.5 Pro and ChatGPT o3 and both nailed it in one shot (see attached). Even Qwen3-30b-a3b running locally on my Mac got it right (also attached).
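
Fwiw, a minimal sketch of how one might query such a local model (assuming an Ollama-style server on its default port; the runner and model tag the poster actually used are my guess):

```python
# Hypothetical sketch: send the bedtime question to a locally served
# model via Ollama's /api/generate endpoint. The model tag is assumed;
# adjust to whatever your local install names it.
import json
import urllib.request

PROMPT = ("If I have to get up at 06:30 and need 7 hours and 30 minutes "
          "of sleep, then what time do I need to go to bed?")

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "qwen3:30b",   # assumed tag for Qwen3-30b-a3b
        "prompt": PROMPT,
        "stream": False,        # return one complete JSON response
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```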

The next thing I did was ask Gemini Pro, "What's the most pedagogically sound way to teach students how to solve this problem?" It's too long to paste here, but I recommend you try it and rate the output. I thought it was quite reasonable. Below is the summary section of the output I got.

Summary for Teachers
  1. Start with the Timeline. It's the most intuitive and builds confidence.
  2. Introduce Chunking. This promotes flexible thinking and strong mental math skills.
  3. Finish with Direct Subtraction. This connects the skill to formal algorithms and introduces powerful tools like the 24-hour clock for handling difficult subtractions.
By teaching all three methods, you cater to different learning styles and provide students with a versatile toolkit for solving any time-based problem.

"Are you happy with 30% of what you teach being verifiably wrong? Because that’s what you just told me."

That's not what I said at all. You may have interpreted it that way. It's not like I'm repeating the AI output verbatim to my kids. I read it to get a sense of what I may be missing. And then I use my own extensive math experience to fill in the gaps.

The power of AI isn't in using it as the final arbiter of truth; it's in its ability to be an assistant along the journey to solving problems. It helps me explain math concepts to my kids better. The end goal is that they end up with a better understanding of math and learn how to ask the right questions on their way to becoming better solution finders.
 

Attachments: ChatGPT o3.png, Gemini 2.5 Pro.png, Qwen3-30b-a3b.png (screenshots of each model's one-shot answers).
GPT-4.5 isn't the right choice of model. It's optimized more for creative writing than logic and reasoning. […]

And that’s the problem. Dunning-Kruger effect strikes again.

That's a bad methodology and reasoning approach. A better approach is 6.5 − 7.5 = −1, so 24 + (−1) = 23: bed at 23:00 (11 pm). Dealing with the 30 minutes separately requires two sets of state to be held in short-term memory, whereas the method above requires two steps. The above method also scales to needing 8 hours of sleep without adjusting the method.
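
Fwiw, a minimal sketch of that single-step method in Python (treating times as hours after midnight, mod 24; the function and example values are my own framing, not from the thread):

```python
# Sketch of the mod-24 method: subtract sleep needed from wake-up time,
# both expressed as hours after midnight, and let the modulus handle
# the wrap past midnight.
def bedtime(wake_h: float, sleep_h: float) -> str:
    """Return bedtime as HH:MM given wake time and hours of sleep needed."""
    t = (wake_h - sleep_h) % 24          # e.g. 6.5 - 7.5 = -1 -> 23
    hours, minutes = int(t), round((t - int(t)) * 60)
    return f"{hours:02d}:{minutes:02d}"

print(bedtime(6.5, 7.5))   # 23:00 (11 pm)
print(bedtime(6.5, 8.0))   # 22:30 - same method, no adjustment needed
```

The modulus does the wrap past midnight in one step, which is exactly the state the chunking approach makes students track by hand.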

This demonstrates my point: it’s a bad teacher who led you astray.

Regarding the teaching methodology: that's not how we teach at all. It is an incredibly high-level summary that does not cover domain-specific concerns. Consider the difference between literary criticism and structured procedural knowledge like calculus. Each specialism has a different taxonomy and tree of knowledge, with different methodologies and appropriate approaches.

I suspect there's some rote regurgitation going on here.
 
This is something I'd enjoy researching for myself. To me, this is an example of how AI can produce lazy minds devoid of deeper inquiry. Why bother finding things out for yourself when you can just have the computer tell you what to do? I guess it's all well and good -- if you trust the computer.

Human technology is an expression of mankind's values. It changes in degree what already exists. It won't produce lazy minds. It will allow lazy minds to get lazier. On the flip side, it will allow active, creative minds to expand. If you don't believe me, check out the way AI is becoming an integral part of the toolkit of professional mathematicians.

Consider the particular task, "Plan and buy ingredients to make Japanese breakfast for four", which you say you would enjoy researching for yourself. A lazy mind will just ask for the answer and follow it. An active, creative mind will ask for suggestions, then ask follow-up questions, taking interesting tangents and sidebar trips to linked information on the web.

It is always up to the user to decide what personal traits they will maximize or minimize with technology.
 
OpenAI, Grok, Gemini and others are trailblazing the new revolution while Apple focuses on Liquid Glass. 🤔

I think Liquid Glass is a step backwards and a waste of Apple's resources. That said, would you be happy if Apple decided to focus most of its resources on chasing the next tech bubble and let Macintosh, iPhone, iPad, macOS, iOS, Apple Watch, et al. founder, updated only every few years with little change? No semi-competent investor expects Apple to be riding the bleeding edge of the next tech wave.

Apple focuses on what makes Apple money in the short to medium term -- exactly as the stock market wants. OpenAI and others are on the bleeding edge, trying to create the next huge market, exactly as venture capital wants.
 
Human technology is an expression of mankind's values. It changes in degree what already exists. It won't produce lazy minds. It will allow lazy minds to get lazier. On the flip side, it will allow active, creative minds to expand. If you don't believe me, check out the way AI is becoming an integral part of the toolkit of professional mathematicians.
I guess this is another example:

https://techcrunch.com/2025/07/18/netflix-starts-using-genai-in-its-shows-and-films/
 
Which tool did you use? I just copy-pasted your request to Gemini 2.5 Pro and immediately got this:

View attachment 2529635

Put a ruler up to the screen and measure the actual width of each depiction. They're the same width to the nearest millimeter.

Now, since this is a side-by-side comparison diagram, wouldn't it be more useful if they were to scale? Right now it looks like 4ft 6in is the same as 5ft 6in.

Task was completed, but not particularly well.
 
I was going to watch the video you posted, and my eye fell on one of the top comments: 'Please make clear in the video title that you didn't actually test the thing. I don't like wasting time.'

So I find it a bit ironic that you remind everyone that 'facts are stubborn things,' and then post a video without much substance, matey :)

This is confirmation bias: you look for something that agrees with your hypothesis. Yet the fact remains that benchmarks run by reputable sources show that Grok 4 exceeds the performance of other models.
 
I’ve evaluated several models against several use cases including GPT-4.5.
Something I've noticed is that, whenever someone states a shortcoming in LLMs, the response is often a variant of "you're holding it wrong": you're using the wrong model, or you're prompting incorrectly.

It looked correct until you actually worked through it.
This is my experience every time I've tried using LLMs for even simple tasks. There will be subtle (and sometimes obvious) errors, which require more effort to review and fix than doing the work from scratch would.
 
This is confirmation bias: you look for something that agrees with your hypothesis. Yet the fact remains that benchmarks run by reputable sources show that Grok 4 exceeds the performance of other models.
Rejecting a “source” that’s not a source is not confirmation bias.
 
This is my experience every time I've tried using LLMs for even simple tasks. There will be subtle (and sometimes obvious) errors, which require more effort to review and fix than doing the work from scratch would.
Same here. It's the subtle errors that take me the most time to find and fix.

For now, I’ve given up using AI for detailed work that must be 100% correct.
 