LONDON, Feb. 10, 2026 — Researchers at the University of Oxford said Tuesday that AI medical advice from popular chatbots did not help people make better health decisions than a standard web search in a large U.K. user test. They said the shortfall reflects a two-way communication breakdown: users can leave out critical details, and chatbots may blend accurate guidance with misleading suggestions.
In the randomized trial, published in Nature Medicine, 1,298 adults in Britain worked through one of 10 doctor-written scenarios and chose a next step on a five-point scale ranging from self-care to calling an ambulance. The trial ran online between Aug. 21 and Oct. 14, 2024, and participants were assigned to consult OpenAI’s GPT-4o, Meta’s Llama 3 or Cohere’s Command R+ (which can search the open internet), or to use their usual sources — most often a search engine or the NHS website.
Tested “alone” on the scenarios, the models identified at least one relevant condition in about 95% of responses and selected the best level of care about 56% of the time. But with real users in the loop, participants using AI medical advice identified relevant conditions in less than 34.5% of cases and chose the correct level of care less than 44.2% of the time — statistically no better than the control group, Reuters reported.
What the Oxford trial found about AI medical advice
The paper argues that the benchmarks that make chatbots look clinically strong — exam-style questions and scripted tests — can mask how tools behave when an ordinary person is describing symptoms, asking follow-ups and trying to interpret the answer.
Oxford researchers found the control group was more likely to name relevant conditions, including “red flag” possibilities linked to serious illness, than participants who used the chatbots. The authors said that matters because the bots’ responses often combined good and bad recommendations, leaving users to guess what to trust.
In a press release, the Oxford Internet Institute said small changes in wording could produce materially different answers. “Patients need to be aware that asking a large language model about their symptoms can be dangerous,” said Dr. Rebecca Payne, a general practitioner who served as the study’s lead medical practitioner.
The authors urged developers and regulators to push beyond lab-style scores and require rigorous testing with diverse users before public-facing AI medical advice tools are treated as safe shortcuts for triage.
What earlier research said about AI medical advice
The Oxford findings extend a long-running debate over automated symptom guidance. A 2015 BMJ audit of symptom checkers reported deficits in both diagnosis and triage, years before today’s generative chatbots hit the mainstream. At the same time, the appeal of conversational answers is clear: a 2023 JAMA Internal Medicine analysis found clinicians rated chatbot responses to patient questions as higher quality and more empathetic than physician responses — a reminder that the tools can feel reassuring even when accuracy is not guaranteed.
Usage is rising even as the evidence catches up. A 2024 KFF poll on AI and health information found that about 17% of U.S. adults use AI chatbots at least monthly for health information and advice.
For now, the new research offers a cautious bottom line: AI medical advice is not a reliable substitute for professional care, and “no better than search” can still be dangerously wrong when symptoms are urgent.