AI Model ChatGPT Achieves Near-Passing Scores on US Medical Licensing Exam

A study published on February 9, 2023, in PLOS Digital Health by Tiffany Kung, Victor Tseng, and colleagues at AnsibleHealth has shown that ChatGPT, the large language model created by OpenAI, can score at or near the passing threshold for the United States Medical Licensing Exam (USMLE), which sits at approximately 60 percent. The researchers found that ChatGPT’s responses showed a high level of coherence and internal consistency and frequently contained insightful content. This is a significant development at the intersection of artificial intelligence and healthcare, with far-reaching implications for the future of medical education and practice.

OpenAI’s newest AI model, ChatGPT, is garnering significant interest for its versatility across a wide range of natural language processing tasks. Unlike conventional deep learning models built for a single, narrow task, ChatGPT is a large language model that uses context to predict the probability of a given word sequence. Trained on an immense amount of text data, it can generate original word sequences that closely resemble human language. The model is built on GPT-3.5 and was trained using a combination of supervised learning and reinforcement learning. The recent study of ChatGPT’s performance on the US Medical Licensing Exam demonstrates its potential for healthcare applications, despite the current limitations of AI models in clinical care, which stem from a paucity of structured, machine-readable data and shortages of time, resources, and problem-specific training data. If general-domain models like ChatGPT can match or even outperform domain-specific models, the landscape of AI in healthcare could change substantially.
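The core idea the article describes, predicting the probability of the next word from context, can be illustrated with a deliberately tiny sketch. This toy bigram counter is purely illustrative: real large language models like ChatGPT use transformer networks trained on vastly larger corpora, not word-pair counts, and the miniature corpus below is invented for the example.

```python
from collections import Counter, defaultdict

# Invented miniature corpus for illustration only.
corpus = (
    "the patient was given aspirin . "
    "the patient was discharged home . "
    "the doctor was paged ."
).split()

# Count how often each word follows each preceding word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word_probs(context_word):
    """Return P(next word | previous word) under the toy bigram model."""
    counts = follows[context_word]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

# "the" is followed by "patient" twice and "doctor" once in the corpus,
# so the model assigns "patient" a probability of 2/3 after "the".
probs = next_word_probs("the")
```

Sampling repeatedly from such a distribution is, in spirit, how a language model generates novel word sequences; the difference in scale and architecture is what lets ChatGPT produce fluent, human-like text.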

To evaluate ChatGPT’s abilities, the researchers administered a standardized test for medical professionals, the United States Medical Licensing Examination (USMLE): a comprehensive, three-step assessment covering all topics deemed necessary for physicians and known for its rigor and high stakes. The exam’s scores and psychometric properties have remained stable over the past decade, making it a well-calibrated testing ground for AI models. USMLE questions are linguistically and conceptually complex, demanding genuine medical reasoning and management judgment, which makes them a demanding challenge for ChatGPT.

The researchers administered 350 public questions from the June 2022 release of the USMLE. ChatGPT achieved scores ranging from 52.4% to 75.0% across the exam’s three steps; the passing threshold is approximately 60%. The model also displayed 94.6% concordance in its responses and produced at least one significant insight in 88.9% of its answers, outperforming PubMedGPT, a model trained solely on biomedical literature, which had previously scored 50.8% on a dataset of USMLE-style questions. Despite the limitations imposed by the small size of the question set, the researchers believe these results demonstrate ChatGPT’s potential to aid in medical education and potentially even play a role in clinical practice.
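The headline figures reduce to simple proportions over graded answers. The sketch below shows how such metrics might be tallied; the grading records are invented for illustration, and in the actual study each response was adjudicated by physician reviewers for correctness, internal concordance (does the explanation agree with the chosen answer?), and the presence of at least one significant insight.

```python
# Invented grading records for illustration; each dict represents one
# adjudicated ChatGPT answer to a USMLE-style question.
graded = [
    {"correct": True,  "concordant": True,  "insight": True},
    {"correct": False, "concordant": True,  "insight": True},
    {"correct": True,  "concordant": True,  "insight": False},
    {"correct": True,  "concordant": False, "insight": True},
]

def proportion(records, key):
    """Fraction of records where the given label is True."""
    return sum(r[key] for r in records) / len(records)

accuracy = proportion(graded, "correct")       # exam-style score
concordance = proportion(graded, "concordant")
insight_rate = proportion(graded, "insight")
```

Under this framing, the study’s reported 94.6% concordance and 88.9% insight rate are exactly such proportions computed over the full set of adjudicated answers.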

However, the study also revealed the limitations of AI language models like ChatGPT in medical education. The authors note that the small size of the question set constrains the scope of the analysis and prevents results from being broken down by subject matter or type of proficiency. They also acknowledge that human evaluation, besides being time-consuming, is prone to error and variability, and they suggest that future studies adopt automated tools such as network analysis of words to improve accuracy and streamline grading. To truly gauge the usefulness of AI language models in medical education, the authors argue, studies must be conducted in both controlled and real-life learning environments.

The study underscores that AI is rapidly becoming a ubiquitous presence in healthcare, with applications across all medical disciplines. Investigations into the role of AI in medical practice have now entered the stage of randomized controlled trials, and a growing number of studies indicate its potential to enhance risk evaluation, data reduction, clinical decision support, operational effectiveness, and patient communication.

At AnsibleHealth, a virtual chronic pulmonary disease clinic, clinicians are already incorporating ChatGPT into their workflows. Using secure, de-identified queries, the clinic’s medical professionals use ChatGPT for tasks such as writing appeal letters to payors, simplifying medical reports so patients can understand them, and brainstorming through complex medical cases. The authors believe that AI technologies such as ChatGPT have reached a level of maturity that will soon transform how healthcare is delivered, offering personalized, compassionate, and scalable services to patients.

“Reaching the passing score for this notoriously difficult expert exam, and doing so without any human reinforcement, marks a notable milestone in clinical AI maturation,” say the authors. “ChatGPT contributed substantially to the writing of [our] manuscript… We interacted with ChatGPT much like a colleague, asking it to synthesize, simplify, and offer counterpoints to drafts in progress… All of the co-authors valued ChatGPT’s input.”