Evaluating the limits of AI in medical specialization: ChatGPT's performance on the UK neurology specialty certificate examination
File(s)
e000451.full.pdf (790.38 KB)
Published version
Author(s)
Giannos, Panagiotis
Type
Journal Article
Abstract
Background: Large language models like ChatGPT have demonstrated potential as innovative tools for medical education and practice, with studies showing their ability to perform at or near the passing threshold in general medical examinations and standardised admission tests. However, no studies have assessed their performance in the UK medical education context, particularly at a specialty level, and specifically in the field of neurology and neuroscience.
Methods: We evaluated the performance of ChatGPT in higher specialty training for neurology and neuroscience using 69 questions from the Pool - SCE Neurology Web Questions bank. The dataset primarily focused on Neurology (80%). The questions spanned sub-topics like Symptoms and Signs, Diagnosis, Interpretation and Management with some questions addressing specific patient populations. The performance of ChatGPT 3.5 Legacy, ChatGPT 3.5 Default, and ChatGPT-4 models was evaluated and compared.
Results: ChatGPT 3.5 Legacy and ChatGPT 3.5 Default displayed overall accuracies of 42% and 57%, respectively, falling short of the passing threshold of 58% for the 2022 SCE Neurology examination. ChatGPT-4, on the other hand, achieved the highest accuracy of 64%, surpassing the passing threshold and outperforming its predecessors across disciplines and sub-topics.
Conclusions: The advancements in ChatGPT-4's performance compared to its predecessors demonstrate the potential for AI models in specialized medical education and practice. However, our findings also highlight the need for ongoing development and collaboration between AI developers and medical experts to ensure the models' relevance and reliability in the rapidly evolving field of medicine.
Date Issued
2023-06-15
Date Acceptance
2023-06-06
Citation
BMJ Neurology Open, 2023, 5 (1)
ISSN
2632-6140
Publisher
BMJ Publishing Group
Journal / Book Title
BMJ Neurology Open
Volume
5
Issue
1
Copyright Statement
© Author(s) (or their employer(s)) 2023. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ. http://creativecommons.org/licenses/by-nc/4.0/ This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Publication Status
Published
Article Number
ARTN e000451
Date Publish Online
2023-06-15