
CatSeg: A holistic cataract surgical scene segmentation model

  • Muraam Abdel-Ghani
  • Tanvir Alam*

*Corresponding author for this work

Hamad bin Khalifa University

Research output: Contribution to journal › Article › peer-review

Abstract

Cataract surgery is the primary treatment for cataracts, and its outcome depends on precise surgical skill. To enhance the training of new surgeons, automated analysis of surgical videos can provide valuable feedback, particularly by identifying instruments and anatomical structures. However, current segmentation methods often fail to identify these components accurately and consistently, which limits their effectiveness as training tools. To address this, we propose CatSeg, a surgical scene segmentation model designed to improve the accuracy of both instrument and anatomical segmentation during cataract surgery. CatSeg employs a pyramidal convolution backbone and a Multi-Scale Instrument Feature Attention (MSIFA) module, which enhances edge feature preservation for better instrument identification. Evaluated on the Cataract-1k and CaDIS benchmarks, CatSeg improved mean IoU by 8.13 percentage points (0.8107 → 0.8920) and 1.35 percentage points (0.8670 → 0.8805), respectively. Operating at 6.35 to 7.66 frames per second, CatSeg is well-suited for surgical video analysis, paving the way for enhanced training and analysis in the development of new surgeons. Source code is available at: https://github.com/Muraam-Abdel-Ghani/CatSeg/.
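
For readers less familiar with the metrics quoted in the abstract, the following is a minimal sketch of how a mean IoU score is commonly computed and how the reported figures relate to one another. It is not the authors' evaluation code: the function name `mean_iou`, the toy label maps, and the choice to skip classes absent from both maps are assumptions made for illustration. Only the four mIoU values and the two frame rates are taken from the abstract.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes.

    pred and target are flat sequences of integer class ids, one per pixel.
    Classes absent from both maps are skipped (an assumption; benchmarks
    differ on how they treat absent classes).
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 6-pixel example with 3 classes (hypothetical data, not from the paper).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
toy_miou = mean_iou(pred, target, num_classes=3)

# Figures quoted in the abstract: (mIoU before, mIoU after) per benchmark.
reported = {"Cataract-1k": (0.8107, 0.8920), "CaDIS": (0.8670, 0.8805)}
for name, (before, after) in reported.items():
    absolute_gain = after - before          # difference in mIoU, i.e. percentage points / 100
    relative_gain = absolute_gain / before  # gain relative to the earlier score
    print(f"{name}: +{absolute_gain * 100:.2f} points, {relative_gain * 100:.2f}% relative")

# Per-frame processing time implied by the reported throughput.
for fps in (6.35, 7.66):
    print(f"{fps} frames/s is about {1000 / fps:.0f} ms per frame")
```

The gains quoted in the abstract match the absolute differences between the paired mIoU values, which is why they are best read as percentage points and not as relative improvements.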

Original language: English
Article number: 110079
Journal: Biomedical Signal Processing and Control
Volume: 120
Publication status: Published - 1 Jul 2026

Keywords

  • Anterior segment structures
  • Cataract surgical segmentation
  • Deep learning
  • Holistic scene segmentation
  • Intraocular lens
  • Multi-scale attention
  • Pyramidal convolution
