AI-Based Multiclass Grading of Hepatic Steatosis From B-Mode Ultrasound: Generalization Across Modalities and Clinical Comparison With Radiologists

  • Fahad Muflih Alshagathrh
  • Haider Dhia Zubaydi
  • Mahmood Alzubaidi
  • Abdulaziz Alosaimi
  • Raneem Mohammed Al Saqer
  • Abdullah Mutlaq Alzahrani
  • Mei Khalid Alfaqiri
  • Mohamed Rajab Elzahrani
  • Khalid Alswat
  • Ali Aldhebaib
  • Bushra Alahmadi
  • Meteb Alkubeyyer
  • Amani Alsadoon
  • Maram Alkhamash
  • Jawad Ahmad Alraimi
  • Jens Schneider
  • Mowafa Househ*
  • *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Non-alcoholic fatty liver disease (NAFLD) is a growing public health challenge, underscoring the need for scalable, non-invasive tools to grade hepatic steatosis. Although B-mode ultrasound is accessible and safe, its reliability is limited by operator and scanner variability. We present the Deep Domain Adaptation Neural Network (DDANN), a deep learning system for multiclass steatosis classification (Normal, Mild, Moderate, Severe) from ultrasound that emphasizes cross-device generalizability. To mitigate distribution shifts across scanners (LOGIQ, iU22, EPIQ), DDANN combines a MobileNetV2 backbone with triplet loss, entropy-based domain adaptation, and preprocessing that includes speckle suppression, percentile normalization, and LOGIQ-specific harmonization. Trained on a biopsy-confirmed, multi-institutional cohort (primarily LOGIQ and iU22), the model was externally validated on an unseen EPIQ test set of 1,083 images from 47 patients, achieving 98.71% accuracy, 0.9872 macro F1-score, and 0.9998 AUC-ROC, outperforming baselines. In a separate radiologist–AI comparison on 224 biopsy-confirmed images not used for training or validation, the AI reached 91.96% accuracy, significantly exceeding radiologists’ 19.64%–31.70% (McNemar’s test, p < 0.001), with strong agreement to ground truth (κ = 0.893) versus radiologists’ poor-to-slight agreement (κ = 0.006–0.194). The AI maintained balanced class-wise F1-scores (0.90–0.94), while radiologists struggled, particularly with Mild and Moderate cases, and exhibited substantial inter-reader variability (κ = 0.068–0.648). These results demonstrate robust cross-device performance and support integrating AI as a reliable second reader or primary screening tool to reduce subjectivity in steatosis assessment.
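
The agreement and significance statistics quoted in the abstract can be illustrated with a minimal sketch. It assumes the reported agreement statistic is unweighted Cohen's kappa and uses the exact binomial form of McNemar's test; the abstract does not state which variants the authors used. The function names and the twelve toy labels are hypothetical and are not the study's data.

```python
from collections import Counter
from math import comb


def cohen_kappa(truth, pred):
    """Unweighted Cohen's kappa between two label sequences."""
    n = len(truth)
    observed = sum(t == p for t, p in zip(truth, pred)) / n
    t_counts, p_counts = Counter(truth), Counter(pred)
    # Chance agreement: product of the marginal class frequencies.
    expected = sum(t_counts[c] * p_counts[c] for c in t_counts) / (n * n)
    return (observed - expected) / (1 - expected)


def mcnemar_exact(truth, pred_a, pred_b):
    """Two-sided exact McNemar p-value comparing two readers' correctness."""
    # Discordant pairs: exactly one of the two readers is correct.
    b = sum(a == t and r != t for t, a, r in zip(truth, pred_a, pred_b))
    c = sum(a != t and r == t for t, a, r in zip(truth, pred_a, pred_b))
    n = b + c
    if n == 0:
        return 1.0
    # Binomial(n, 0.5) tail at the smaller discordant count, doubled.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy labels (hypothetical): 12 images, 3 per grade.
truth = ["Normal", "Mild", "Moderate", "Severe"] * 3
ai = list(truth)          # toy AI reader: all correct
reader = list(truth)      # toy human reader: 5 errors, mostly Mild/Moderate
reader[1] = "Normal"
reader[2] = "Mild"
reader[5] = "Moderate"
reader[6] = "Severe"
reader[9] = "Normal"

kappa_ai = cohen_kappa(truth, ai)
kappa_reader = cohen_kappa(truth, reader)
p_value = mcnemar_exact(truth, ai, reader)
print(f"kappa (AI) = {kappa_ai:.3f}, kappa (reader) = {kappa_reader:.3f}, p = {p_value:.4f}")
```

Kappa corrects raw accuracy for agreement expected by chance, which is why a reader can score well above zero accuracy and still show near-zero kappa. McNemar's test uses only the images on which the two readers disagree in correctness, so it suits paired comparisons on the same image set.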

Original language: English
Pages (from-to): 178725-178757
Number of pages: 33
Journal: IEEE Access
Volume: 13
Publication status: Published - 2025

Keywords

  • Artificial intelligence
  • biopsy ground truth
  • deep learning
  • diagnostic accuracy
  • domain adaptation
  • hepatic steatosis
  • inter-rater reliability
  • multiclass classification
  • non-alcoholic fatty liver disease (NAFLD)
  • ultrasound imaging
