
DR-VQA: Large Language Model Based Vision Question Answering System on Diabetic Retinopathy

  • Hamad bin Khalifa University
  • Hamad Medical Corporation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Medical Visual Question Answering (VQA) presents challenges due to the complexity of imaging data and the need for precise, context-aware responses. Traditional VQA models often struggle in clinical settings, limiting their utility in decision-making. This study proposes DR-VQA, a fine-tuned BLIP (Bootstrapping Language-Image Pre-training) model for medical VQA, designed specifically for diabetic retinopathy (DR) based on fundus imaging and leveraging both visual and textual data to generate accurate diagnostic answers. To assess the semantic relevance of the generated answers, a BERT-based similarity evaluation is integrated. Using a diabetic retinopathy dataset, the model achieves a validation BERT similarity (BERTsim) score of 0.94 and a test score of 0.95 on 450 samples, demonstrating strong alignment with expert annotations. These results highlight the model's potential to assist clinicians by improving diagnostic accuracy and efficiency. The proposed approach can streamline medical workflows, reduce clinician workload, and enhance patient outcomes. Future work will focus on expanding datasets and refining the model for broader medical applications. We believe our approach will help enhance patient care and support the democratization of AI technology for the community.
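
The abstract does not define how BERTsim is computed. One common reading is the cosine similarity between sentence embeddings of the generated answer and the expert answer, averaged over the evaluation set. The sketch below illustrates that reading only. The function names are hypothetical, the example answers are invented for illustration, and `toy_embed` is a bag-of-words stand-in for a real BERT encoder so that the example runs without any model download.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for a BERT sentence encoder.

    Returns sparse bag-of-words counts. A real evaluation would return a
    dense embedding from a BERT model instead (assumption, not from the paper).
    """
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty answer has no direction to compare
    return dot / (norm_a * norm_b)


def mean_similarity(predictions, references, embed=toy_embed):
    """Average per-sample similarity between predicted and reference answers."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not predictions:
        raise ValueError("at least one sample is required")
    scores = [
        cosine_similarity(embed(p), embed(r))
        for p, r in zip(predictions, references)
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Invented example answers, not taken from the paper's dataset.
    preds = ["mild dr", "no dr"]
    refs = ["mild dr", "severe npdr"]
    print(mean_similarity(preds, refs))
```

Passing a different `embed` callable that wraps an actual BERT encoder would keep the rest of the computation unchanged, provided it returns vectors in the same dict form or `cosine_similarity` is adapted to dense arrays.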

Original language: English
Title of host publication: IEEE Region 10 Conference 2025
Subtitle of host publication: Unleashing Innovation: Elevating Technologies to New Horizon, TENCON 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 214-218
Number of pages: 5
ISBN (Electronic): 9798331537722
DOIs
Publication status: Published - 2025
Event: 2025 IEEE Region 10 Conference, TENCON 2025 - Kota Kinabalu, Malaysia
Duration: 27 Oct 2025 → 30 Oct 2025

Publication series

Name: IEEE Region 10 Annual International Conference, Proceedings/TENCON
ISSN (Print): 2159-3442
ISSN (Electronic): 2159-3450

Conference

Conference: 2025 IEEE Region 10 Conference, TENCON 2025
Country/Territory: Malaysia
City: Kota Kinabalu
Period: 27/10/25 → 30/10/25

Keywords

  • BLIP
  • Diabetic Retinopathy
  • Large Language Model
  • VQA
