LONDON, Feb. 10, 2026 — Researchers at the University of Oxford said Tuesday that AI medical advice from popular chatbots did not help people make better health decisions than a standard web search in a large U.K. user test. They said the shortfall reflects a two-way communication breakdown: users can leave out critical details, and chatbots may blend accurate guidance with misleading suggestions.
In the randomized trial, published in Nature Medicine, 1,298 adults in Britain worked through one of 10 doctor-written scenarios and chose a next step on a five-point scale ranging from self-care to calling an ambulance. The trial ran online between Aug. 21 and Oct. 14, 2024, and participants were assigned to consult OpenAI’s GPT-4o, Meta’s Llama 3 or Cohere’s Command R+ (which can search the open internet), or to use their usual sources — most often a search engine or the NHS website.
Tested “alone” on the scenarios, the models identified at least one relevant condition in about 95% of responses and selected the best level of care about 56% of the time. But with real users in the loop, participants using AI medical advice identified relevant conditions in less than 34.5% of cases and chose the correct level of care less than 44.2% of the time — statistically no better than the control group, Reuters reported.
What the Oxford trial found about AI medical advice
The paper argues that the benchmarks that make chatbots look clinically strong — exam-style questions and scripted tests — can mask how tools behave when an ordinary person is describing symptoms, asking follow-ups and trying to interpret the answer.
Oxford researchers found the control group was more likely to name relevant conditions, including “red flag” possibilities linked to serious illness, than participants who used the chatbots. The authors said that matters because the bots’ responses often combined good and bad recommendations, leaving users to guess what to trust.
In a press release, the Oxford Internet Institute said small changes in wording could produce materially different answers. “Patients need to be aware that asking a large language model about their symptoms can be dangerous,” said Dr. Rebecca Payne, a general practitioner who served as the study’s lead medical practitioner.
The authors urged developers and regulators to push beyond lab-style scores and require rigorous testing with diverse users before public-facing AI medical advice tools are treated as safe shortcuts for triage.
What earlier research said about AI medical advice
The Oxford findings extend a long-running debate over automated symptom guidance. A 2015 BMJ audit of symptom checkers reported deficits in both diagnosis and triage, years before today’s generative chatbots hit the mainstream. At the same time, the appeal of conversational answers is clear: a 2023 JAMA Internal Medicine analysis found clinicians rated chatbot responses to patient questions as higher quality and more empathetic than physician responses — a reminder that the tools can feel reassuring even when accuracy is not guaranteed.
Usage is rising even as the evidence catches up. A 2024 KFF poll on AI and health information found that about 17% of U.S. adults use AI chatbots at least monthly for health information and advice.
For now, the new research offers a cautious bottom line: AI medical advice is not a reliable substitute for professional care, and “no better than search” can still be dangerously wrong when symptoms are urgent.