TY - GEN
T1 - DR-VQA
T2 - 2025 IEEE Region 10 Conference, TENCON 2025
AU - Musleh, Saleh
AU - Al-Absi, Hamada R.H.
AU - Parvez, Md Rizwan
AU - Pai, Anant
AU - Mohamedsalih, Ghassan
AU - Alam, Tanvir
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Medical Visual Question Answering (VQA) presents challenges due to the complexity of imaging data and the need for precise, context-aware responses. Traditional VQA models often struggle in clinical settings, limiting their utility in decision-making. This study proposes DR-VQA, a fine-tuned BLIP (Bootstrapping Language-Image Pre-training) model for medical VQA, designed specifically for diabetic retinopathy (DR) based on fundus imaging and leveraging both visual and textual data to generate accurate diagnostic answers. To assess semantic relevance, BERT-based similarity evaluation is integrated. Using a diabetic retinopathy dataset, the model achieves a validation BERT similarity (BERTsim) score of 0.94 and a test score of 0.95 on 450 samples, demonstrating strong alignment with expert annotations. These results highlight the model's potential to assist clinicians by improving diagnostic accuracy and efficiency. The proposed approach can streamline medical workflows, reduce clinician workload, and enhance patient outcomes. Future work will focus on expanding datasets and refining the model for broader medical applications. We believe our approach will help enhance patient care and support the democratization of AI technology for the community.
AB - Medical Visual Question Answering (VQA) presents challenges due to the complexity of imaging data and the need for precise, context-aware responses. Traditional VQA models often struggle in clinical settings, limiting their utility in decision-making. This study proposes DR-VQA, a fine-tuned BLIP (Bootstrapping Language-Image Pre-training) model for medical VQA, designed specifically for diabetic retinopathy (DR) based on fundus imaging and leveraging both visual and textual data to generate accurate diagnostic answers. To assess semantic relevance, BERT-based similarity evaluation is integrated. Using a diabetic retinopathy dataset, the model achieves a validation BERT similarity (BERTsim) score of 0.94 and a test score of 0.95 on 450 samples, demonstrating strong alignment with expert annotations. These results highlight the model's potential to assist clinicians by improving diagnostic accuracy and efficiency. The proposed approach can streamline medical workflows, reduce clinician workload, and enhance patient outcomes. Future work will focus on expanding datasets and refining the model for broader medical applications. We believe our approach will help enhance patient care and support the democratization of AI technology for the community.
KW - BLIP
KW - Diabetic Retinopathy
KW - Large Language Model
KW - VQA
UR - https://www.scopus.com/pages/publications/105034124925
U2 - 10.1109/TENCON66050.2025.11375176
DO - 10.1109/TENCON66050.2025.11375176
M3 - Conference contribution
AN - SCOPUS:105034124925
T3 - IEEE Region 10 Annual International Conference, Proceedings/TENCON
SP - 214
EP - 218
BT - IEEE Region 10 Conference 2025
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 27 October 2025 through 30 October 2025
ER -