May 29, 2025
Medical professionals have been using artificial intelligence (AI) to streamline diagnoses for decades through what are called diagnostic decision support systems (DDSSs). Computer scientists at Massachusetts General Hospital (MGH), a founding member of the Mass General Brigham healthcare system, developed MGH’s own DDSS, called DXplain, in 1984. The system draws on thousands of disease profiles, clinical findings, and data points to generate and rank potential diagnoses for use by clinicians.
With the popularization and increased accessibility of generative AI and large language models (LLMs) in medicine, investigators at MGH’s Laboratory of Computer Science (LCS) sought to compare the diagnostic capabilities of DXplain, which has evolved over the past four decades, with those of popular LLMs. Their new research compares ChatGPT, Gemini, and DXplain at diagnosing patient cases, revealing that DXplain performed somewhat better, though the LLMs also performed well. The investigators envision pairing DXplain with an LLM as the optimal way forward, as doing so would improve both systems and enhance their clinical utility.
These systems can enhance and expand clinicians’ diagnoses, recalling information that physicians may forget in the heat of the moment, and they are not biased by common flaws in human reasoning. The investigators believe that combining the powerful explanatory capabilities of existing diagnostic systems with the linguistic capabilities of large language models will enable better automated diagnostic decision support and improve patient outcomes.
The investigators tested the diagnostic capabilities of DXplain, ChatGPT, and Gemini on 36 patient cases spanning racial, ethnic, age, and gender categories. For each case, the systems suggested potential diagnoses both with and without lab data. With lab data, all three systems listed the correct diagnosis most of the time: 72% for DXplain, 64% for ChatGPT, and 58% for Gemini. Without lab data, DXplain listed the correct diagnosis 56% of the time, outperforming ChatGPT (42%) and Gemini (39%), though the differences were not statistically significant.