Poster Session 3
Keren Khromchenko, MD (she/her/hers)
Resident
Hackensack Meridian Health - Jersey Shore University Medical Center
Neptune, New Jersey, United States
Joseph Canterino, MD
Physician
Hackensack Meridian Health - Jersey Shore University Medical Center
Neptune, New Jersey, United States
Jonathan D. Baum, MD
Physician
Hackensack Meridian Health - Jersey Shore University Medical Center
Neptune, New Jersey, United States
Karen Koscica, DO
Physician
Hackensack Meridian Health - Jersey Shore University Medical Center
Neptune, New Jersey, United States
Sheveta Jain, MD
Physician
Hackensack Meridian Health - Jersey Shore University Medical Center
Neptune, New Jersey, United States
Jennifer E. Powel, MD, MS
Hackensack Meridian Jersey Shore University Medical Center
Neptune, New Jersey, United States
Valeria Distefano, MD
Physician
Hackensack Meridian Health - Jersey Shore University Medical Center
Neptune, New Jersey, United States
Maria Martins, MD
Physician
Hackensack Meridian Health - Jersey Shore University Medical Center
Neptune, New Jersey, United States
Jonathan Faro, MD
Physician / Assistant Professor
Jersey Shore University Medical Center / Hackensack Meridian School of Medicine
Marlboro, New Jersey, United States
Objective: Chatbots offer a user-friendly source of medical information, but recent studies have raised concerns about their accuracy, and the World Health Organization (WHO) has warned that untested AI systems could harm patients and erode trust. The detection of soft markers for aneuploidy on ultrasound can cause significant anxiety, yet the accuracy and completeness of chatbot responses regarding soft markers are unknown. We sought to evaluate chatbot performance in answering questions about soft markers for aneuploidy and to determine whether training improves that performance.
Study Design: A qualitative analysis was performed. ChatGPT (GPT-4o) and Google Gemini 1.5 Pro were queried on 8 isolated soft markers, both before and after training with the Society for Maternal-Fetal Medicine (SMFM) Consult Series #57, “Evaluation and Management of Isolated Soft Ultrasound Markers for Aneuploidy in the Second Trimester.” Queries were conducted in July 2024. Responses were graded as “acceptable” or “not acceptable” based on accuracy and completeness; grading was performed by the MFM co-authors individually and then as a group to reach a consensus.
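The abstract does not state whether queries were entered through the chat interfaces or issued programmatically. As a hypothetical illustration only (the model name, prompt wording, example markers, and guideline-text variable are assumptions, not the authors' protocol), the two query conditions could be reproduced with the OpenAI Python SDK roughly as follows:

```python
# Hypothetical sketch of the pre- and post-training query conditions.
# Assumes the OpenAI Python SDK (pip install openai) and OPENAI_API_KEY set
# in the environment; prompt wording and model name are illustrative guesses.
from typing import Optional

from openai import OpenAI

client = OpenAI()

# Example markers only; the abstract does not enumerate the 8 queried markers.
SOFT_MARKERS = ["echogenic intracardiac focus", "choroid plexus cyst"]

def query_marker(marker: str, guideline_text: Optional[str] = None) -> str:
    """Ask about one isolated soft marker; for the post-training condition,
    supply the SMFM Consult Series #57 text as context first."""
    messages = []
    if guideline_text is not None:
        messages.append({
            "role": "system",
            "content": "Answer using this guideline:\n" + guideline_text,
        })
    messages.append({
        "role": "user",
        "content": (
            "What is the significance of an isolated "
            f"{marker} on second-trimester ultrasound, and how should it be "
            "evaluated and managed?"
        ),
    })
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    return response.choices[0].message.content

for m in SOFT_MARKERS:
    print(query_marker(m))                 # pre-training condition
    # print(query_marker(m, smfm_57_text)) # post-training condition
```

A parallel script against the Gemini API would cover the second model; the “acceptable”/“not acceptable” grading remains a manual, expert step.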
Results: Pre-training, 37.5% of ChatGPT responses and 50% of Gemini responses were graded as “not acceptable.” All of ChatGPT’s “not acceptable” responses (3/3) were incomplete and none were incorrect; all of Gemini’s “not acceptable” responses (4/4) were incomplete, and one was also incorrect. Post-training, 25% of ChatGPT responses and 37.5% of Gemini responses were graded as “not acceptable.” Again, all “not acceptable” responses were incomplete (2/2 for ChatGPT, 3/3 for Gemini) and none were incorrect.
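As a quick arithmetic check (a minimal sketch; the counts are simply the reported fractions out of the 8 queried markers):

```python
# Reported "not acceptable" counts out of 8 queried soft markers per model.
not_acceptable = {
    ("ChatGPT", "pre-training"): 3,
    ("Gemini", "pre-training"): 4,
    ("ChatGPT", "post-training"): 2,
    ("Gemini", "post-training"): 3,
}
for (model, phase), n in not_acceptable.items():
    print(f"{model}, {phase}: {n}/8 = {n / 8:.1%} not acceptable")
# e.g. ChatGPT, pre-training: 3/8 = 37.5% not acceptable
```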
Conclusion: While generative AI chatbots such as ChatGPT and Google Gemini show potential as supplementary sources of information on soft markers for aneuploidy, their responses often lack completeness, which limits their usefulness. Training with a targeted medical publication improved performance, but these chatbots should not replace physician counseling.