
MacRumors

macrumors bot
Original poster


A reporter for The Washington Post has put ChatGPT's new optional Apple Health integration feature to the test by feeding it ten years of their Apple Watch data. The results were not encouraging, to say the least.

Earlier this month, OpenAI announced the launch of ChatGPT Health, a dedicated section of ChatGPT where users can ask health-related questions completely separated from their main ChatGPT experience. For more personalized responses, users can connect various health data services such as Apple Health, Function, MyFitnessPal, Weight Watchers, AllTrails, Instacart, and Peloton.

ChatGPT Health can also integrate with your medical records, allowing it to analyze your lab results and other aspects of your medical history to inform its answers to your health-related questions.

With this in mind, reporter Geoffrey Fowler gave ChatGPT Health access to 29 million steps and 6 million heartbeat measurements from his Apple Health app, and asked the bot to grade his cardiac health. It gave him an F.

Feeling understandably alarmed, Fowler asked his actual doctor, who in no uncertain terms dismissed the AI's assessment entirely. His physician said Fowler was at such low risk for heart problems that his insurance likely wouldn't even cover additional testing to disprove the chatbot's findings.

Cardiologist Eric Topol of the Scripps Research Institute was likewise unimpressed with the large language model's assessment. He called ChatGPT's analysis "baseless" and said people should ignore its medical advice, as it's clearly not ready for prime time.

Perhaps the most troubling finding, though, was ChatGPT's inconsistency. When Fowler asked the same question several times, his score swung wildly between an F and a B. ChatGPT also kept forgetting basic information about him, including his gender and age, despite it having full access to his records.

Anthropic's Claude chatbot fared slightly better – though not by much. The LLM graded Fowler's cardiac health a C, but it also failed to properly account for limitations in the Apple Watch data.

Both companies say their health tools aren't meant to replace doctors or provide diagnoses. Topol rightly argued that if these bots can't accurately assess health data, then they shouldn't be offering grades at all.

Yet nothing appears to be stopping them. The U.S. Food and Drug Administration earlier this month said the agency's job is to "get out of the way as a regulator" to promote innovation. An agency commissioner drew a red line at AI making "medical or clinical claims" without FDA review, but OpenAI and Anthropic argue their chatbots are just providing information.

"People that do this are going to get really spooked about their health," Topol said. "It could also go the other way and give people who are unhealthy a false sense that everything they're doing is great."

ChatGPT's Apple Health integration is currently limited to a group of beta users. Responding to the report, OpenAI said it was working to improve the consistency of the chatbot's responses. "Launching ChatGPT Health with waitlisted access allows us to learn and improve the experience before making it widely available," OpenAI VP Ashley Alexander told the publication in a statement.

Article Link: ChatGPT's Apple Health Integration Flaws Exposed in New Report
 
I remember how hard the FDA went after 23andMe for making predictions based on DNA tests without offering counseling and all that, but now it's fine when a chatbot just does 'spicy autocomplete'* on your data.

*term seems ever more apt with the wide score variations
 
More reasons to avoid using the health app. I see no good coming from ChatGPT gaining access to my health or personal data.
 
Sleep tracking causes more anxiety than good. I stopped using it when I had a good night's sleep and it gave me a Regular score just because I went to bed 1 hour later.
I suppose shifting your bedtime may have longer-term effects on your circadian rhythm, hence the lower score. And I think your first sentence, being a general statement, is a bit too broad.
 
You might be mad enough to trust Sam with your health data, but actually believing what it tells you, taking it seriously, and then acting on it feels like an entirely new tier of madness.
 