In a Harvard study, AI offered more accurate emergency room diagnoses than two human doctors
A new study examines how large language models perform in a variety of medical contexts, including real emergency room cases — where at least one model seemed to be more accurate than human doctors.
The researchers said they conducted a variety of experiments to measure how OpenAI’s models compared to human physicians.
Instead, the study said the findings show an “urgent need for prospective trials to evaluate these technologies in real-world patient care settings.” Adam Rodman, a Beth Israel doctor who is also one of the study’s lead authors, warned the Guardian that there is “no formal framework right now for accountability” around AI diagnoses, and that patients still “want humans to guide them through life or death decisions [and] to guide them through challenging treatment decisions.”
“I would not be surprised if an LLM could beat a dermatologist at a neurosurgery board exam, [but] that’s not a particularly helpful thing to know,” Kristen Panthagani said. She also argued, “As an ER doctor seeing a patient for the first time, my primary goal is not to guess your ultimate diagnosis. My primary goal is to determine if you have a condition that could kill you.”

This post and headline have been updated to reflect the fact that the diagnoses in the study came from internal medicine attending physicians, and to include commentary from Kristen Panthagani.