PathVLM-Eval: Evaluation of open vision language models in histopathology

Nauman Ullah Gilal, Rachida Zegour, Khaled Al-Thelaya, Erdener Özer, Marco Agus*, Jens Schneider, Sabri Boughorbel

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

The emerging trend of vision language models (VLMs) has introduced a new paradigm in artificial intelligence (AI). However, their evaluation has predominantly focused on general-purpose datasets, providing a limited understanding of their effectiveness in specialized domains. Medical imaging, particularly digital pathology, could significantly benefit from VLMs for histological interpretation and diagnosis, giving pathologists a complementary tool for faster, more comprehensive reporting and efficient healthcare service. In this work, we are interested in benchmarking VLMs on histopathology image understanding. We present an extensive evaluation of recent VLMs on the PathMMU dataset, a domain-specific benchmark that includes subsets such as PubMed, SocialPath, and EduContent. These datasets feature diverse formats, notably multiple-choice questions (MCQs), designed to aid pathologists in diagnostic reasoning and support professional development initiatives in histopathology. Utilizing VLMEvalKit, a widely used open-source evaluation framework, we bring publicly available pathology datasets under a single evaluation umbrella, ensuring unbiased and contamination-free assessments of model performance. Our study conducts extensive zero-shot evaluations of more than 60 state-of-the-art VLMs, including the LLaVA, Qwen-VL, Qwen2-VL, InternVL, Phi3, Llama3, MOLMO, and XComposer series, significantly expanding the range of evaluated models compared to prior literature. Among the tested models, Qwen2-VL-72B-Instruct achieved superior performance with an average score of 63.97%, outperforming other models across all PathMMU subsets. We conclude that this extensive evaluation will serve as a valuable resource, fostering the development of next-generation VLMs for analyzing digital pathology images. Additionally, we have released the complete evaluation results on our leaderboard PathVLM-Eval: https://huggingface.co/spaces/gilalnauman/PathVLMs.

Original language: English
Article number: 100455
Journal: Journal of Pathology Informatics
Volume: 18
Publication status: Published - 5 Jun 2025

Keywords

  • LLMs benchmarking
  • Pathology
  • VLMs
  • Zero-shot evaluation
