TY - JOUR
T1 - Exploring the capabilities of large language models in oral and maxillofacial surgery
AU - Khan, Sulaiman
AU - Mohamed, Shahira Padinharepattel
AU - Biswas, Md Rafiul
AU - Shah, Zubair
N1 - Publisher Copyright: © The Author(s) 2025
PY - 2025/6/26
Y1 - 2025/6/26
N2 - Oral and Maxillofacial Surgery (OMFS) is a surgical specialty that serves as a bridge between medicine and dentistry, focusing on the diagnosis and treatment of diseases affecting the mouth, jaw, face, and neck. Large Language Models (LLMs), which first appeared in 2019, are trained on extensive text collections and can process natural language with high quality. Although OMFS is a hands-on surgical specialty, LLMs have been increasingly used for patient education, research, and training purposes. This study aimed to explore the capabilities of LLMs in the field of OMFS by investigating the most recent literature. Seven online repositories (PubMed, Scopus, Association for Computing Machinery (ACM), IEEE, Embase, Cumulative Index to Nursing and Allied Health Literature (CINAHL), and Google Scholar) were searched to retrieve relevant articles. Adhering to the PRISMA-ScR guidelines, we conducted a systematic search across these libraries to select articles that incorporated LLMs into OMFS. The forward and backward reference lists of the included articles were checked to retrieve missing articles. After the final screening, a total of 20 studies were selected for this review. The selected studies demonstrated applications of LLMs in OMFS such as patient education, clinical decision support, and guidance for specific procedures. The study results showed variability in LLM response accuracy and lower accuracy in citation generation, whereas open-ended questions achieved higher accuracy rates. Advanced versions of LLMs, such as ChatGPT-4, showed improved accuracy and reliability compared with older GPT versions, although some studies reported that LLM responses lacked complete details and exhibited only moderate accuracy. This variability in performance emphasizes the need for the continuous refinement of LLMs and highlights the importance of human oversight in clinical applications. Further refinement, extensive research, and verification by experts are therefore still needed.
AB - Oral and Maxillofacial Surgery (OMFS) is a surgical specialty that serves as a bridge between medicine and dentistry, focusing on the diagnosis and treatment of diseases affecting the mouth, jaw, face, and neck. Large Language Models (LLMs), which first appeared in 2019, are trained on extensive text collections and can process natural language with high quality. Although OMFS is a hands-on surgical specialty, LLMs have been increasingly used for patient education, research, and training purposes. This study aimed to explore the capabilities of LLMs in the field of OMFS by investigating the most recent literature. Seven online repositories (PubMed, Scopus, Association for Computing Machinery (ACM), IEEE, Embase, Cumulative Index to Nursing and Allied Health Literature (CINAHL), and Google Scholar) were searched to retrieve relevant articles. Adhering to the PRISMA-ScR guidelines, we conducted a systematic search across these libraries to select articles that incorporated LLMs into OMFS. The forward and backward reference lists of the included articles were checked to retrieve missing articles. After the final screening, a total of 20 studies were selected for this review. The selected studies demonstrated applications of LLMs in OMFS such as patient education, clinical decision support, and guidance for specific procedures. The study results showed variability in LLM response accuracy and lower accuracy in citation generation, whereas open-ended questions achieved higher accuracy rates. Advanced versions of LLMs, such as ChatGPT-4, showed improved accuracy and reliability compared with older GPT versions, although some studies reported that LLM responses lacked complete details and exhibited only moderate accuracy. This variability in performance emphasizes the need for the continuous refinement of LLMs and highlights the importance of human oversight in clinical applications. Further refinement, extensive research, and verification by experts are therefore still needed.
KW - Bard
KW - ChatGPT
KW - Head and neck surgery
KW - Large language model
KW - LLM
KW - Maxillofacial surgery
KW - Oral surgery
UR - https://www.scopus.com/pages/publications/105010346844
U2 - 10.1177/00202940251344491
DO - 10.1177/00202940251344491
M3 - Article
AN - SCOPUS:105010346844
SN - 0020-2940
JO - Measurement and Control (United Kingdom)
JF - Measurement and Control (United Kingdom)
M1 - 00202940251344491
ER -