Abstract
Brain tumors are among the most aggressive and life-threatening cancers, requiring accurate and timely diagnosis for effective treatment. Convolutional neural networks (CNNs) and vision transformers (ViTs) have been widely explored for MRI-based tumor classification. However, CNNs often struggle with long-range dependency modeling and background noise, while ViTs lack strong inductive biases such as translation invariance. Although CNN-ViT hybridization can improve performance, practical screening systems on consumer edge devices must satisfy strict constraints on compute, memory, and latency. To address these challenges, we propose ViT-CNN, an explainable dual-stream cross-attention model that integrates CNN and ViT representations in an efficiency-aware manner for brain tumor screening on consumer edge devices. The CNN branch captures tumor-specific local patterns such as edges, textures, and shapes, while the ViT branch models global anatomical context. A spatial attention module is applied to the CNN features to suppress background noise, and a self-attention refinement module is applied to the ViT representation to enhance informative global cues. The two streams are then fused through bidirectional cross-attention, enabling adaptive interaction between local and global features. Experiments on two public MRI datasets show that ViT-CNN achieves 93.86±0.40% accuracy on the Kaggle multiclass dataset-v2 and 99.33% accuracy on the BR35H binary dataset. To strengthen interpretability, we perform both qualitative and quantitative explainable AI (XAI) analysis using saliency maps and Local Interpretable Model-agnostic Explanations (LIME), including fidelity and consistency evaluation. Deployment-oriented profiling further demonstrates real-time CPU inference at 29.36 ms/image (34.05 FPS) for the full model, while lightweight variants achieve up to 55.57 FPS on CPU with a modest accuracy reduction, supporting flexible deployment across consumer edge devices.
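The bidirectional cross-attention fusion described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the head count, token shapes, projection sizes, or how the two enriched streams are combined. The single-head attention, the 49 and 197 token counts, the 64-dimensional embedding, the residual connections, and the mean-pool-then-concatenate step are all assumptions chosen for illustration, as are the function names `cross_attention` and `bidirectional_fusion`.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head attention: each token in `queries` attends over `context`."""
    q = queries @ w_q                        # (n_q, d)
    k = context @ w_k                        # (n_c, d)
    v = context @ w_v                        # (n_c, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product, (n_q, n_c)
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights


def bidirectional_fusion(cnn_tokens, vit_tokens, params):
    """Hypothetical fusion: CNN tokens query ViT tokens and vice versa."""
    cnn_enriched, _ = cross_attention(cnn_tokens, vit_tokens, *params["cnn_to_vit"])
    vit_enriched, _ = cross_attention(vit_tokens, cnn_tokens, *params["vit_to_cnn"])
    # Assumed combination: residual add, mean-pool each stream, then concatenate.
    cnn_vec = (cnn_tokens + cnn_enriched).mean(axis=0)
    vit_vec = (vit_tokens + vit_enriched).mean(axis=0)
    return np.concatenate([cnn_vec, vit_vec])


rng = np.random.default_rng(0)
d = 64                                       # assumed shared embedding size
cnn_tokens = rng.standard_normal((49, d))    # e.g. a flattened 7x7 feature map
vit_tokens = rng.standard_normal((197, d))   # e.g. 196 patches plus a class token
params = {
    name: [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3)]
    for name in ("cnn_to_vit", "vit_to_cnn")
}

fused = bidirectional_fusion(cnn_tokens, vit_tokens, params)
print(fused.shape)  # (128,)
```

Under these assumptions, each direction lets one stream reweight the other's tokens: local CNN features are enriched with global context, and global ViT tokens are enriched with local detail. The fused vector would then feed a classification head.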
| Original language | English |
|---|---|
| Journal | IEEE Transactions on Consumer Electronics |
| DOIs | |
| Publication status | Accepted/In press - 2026 |
| Externally published | Yes |
Keywords
- Brain Tumor Classification
- Cross-Attention Fusion
- Feature Fusion
- LIME
- MRI
- Saliency Maps
- ViT-tiny