TY - JOUR
T1 - AI-Based Multiclass Grading of Hepatic Steatosis From B-Mode Ultrasound
T2 - Generalization Across Modalities and Clinical Comparison With Radiologists
AU - Alshagathrh, Fahad Muflih
AU - Zubaydi, Haider Dhia
AU - Alzubaidi, Mahmood
AU - Alosaimi, Abdulaziz
AU - Al Saqer, Raneem Mohammed
AU - Alzahrani, Abdullah Mutlaq
AU - Alfaqiri, Mei Khalid
AU - Elzahrani, Mohamed Rajab
AU - Alswat, Khalid
AU - Aldhebaib, Ali
AU - Alahmadi, Bushra
AU - Alkubeyyer, Meteb
AU - Alsadoon, Amani
AU - Alkhamash, Maram
AU - Alraimi, Jawad Ahmad
AU - Schneider, Jens
AU - Househ, Mowafa
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2025
Y1 - 2025
N2 - Non-alcoholic fatty liver disease (NAFLD) is a growing public health challenge, underscoring the need for scalable, non-invasive tools to grade hepatic steatosis. Although B-mode ultrasound is accessible and safe, its reliability is limited by operator and scanner variability. We present the Deep Domain Adaptation Neural Network (DDANN), a deep learning system for multiclass steatosis classification (Normal, Mild, Moderate, Severe) from ultrasound that emphasizes cross-device generalizability. To mitigate distribution shifts across scanners (LOGIQ, iU22, EPIQ), DDANN combines a MobileNetV2 backbone with triplet loss, entropy-based domain adaptation, and preprocessing that includes speckle suppression, percentile normalization, and LOGIQ-specific harmonization. Trained on a biopsy-confirmed, multi-institutional cohort (primarily LOGIQ and iU22), the model was externally validated on an unseen EPIQ test set of 1,083 images from 47 patients, achieving 98.71% accuracy, 0.9872 macro F-1-score, and 0.9998 AUC-ROC, outperforming baselines. In a separate radiologist-AI comparison on 224 biopsy-confirmed images not used for training or validation, the AI reached 91.96% accuracy, significantly exceeding radiologists' 19.64%-31.70% (McNemar's test, p < 0.001), with strong agreement to ground truth (kappa = 0.893) versus radiologists' poor-to-slight agreement (kappa = 0.006-0.194). The AI maintained balanced class-wise F-1-scores (0.90-0.94), while radiologists struggled, particularly with Mild and Moderate cases, and exhibited substantial inter-reader variability (kappa = 0.068-0.648). These results demonstrate robust cross-device performance and support integrating AI as a reliable second reader or primary screening tool to reduce subjectivity in steatosis assessment.
KW - Accuracy
KW - Adaptation models
KW - Artificial intelligence
KW - Benchmark testing
KW - Biomedical imaging
KW - Biopsy ground truth
KW - Deep learning
KW - Diagnostic accuracy
KW - Domain adaptation
KW - Hepatic steatosis
KW - Inter-rater reliability
KW - Liver diseases
KW - Multiclass classification
KW - Training
KW - Ultrasonic imaging
KW - Ultrasound imaging
KW - Urban areas
KW - non-alcoholic fatty liver disease (NAFLD)
UR - https://www.scopus.com/pages/publications/105018037300
U2 - 10.1109/ACCESS.2025.3617778
DO - 10.1109/ACCESS.2025.3617778
M3 - Article
AN - SCOPUS:105018037300
SN - 2169-3536
VL - 13
SP - 178725
EP - 178757
JO - IEEE Access
JF - IEEE Access
ER -