Adaptive Inter-Modality Attention for Enhanced Cross-Domain Deepfake Detection Transferability

  • Naseem Khan*
  • Nguyen Vu Tuan
  • Issa Khalil
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Cross-domain generalization remains a critical challenge in deepfake detection, with existing methods exhibiting severe performance degradation on unseen generative architectures. We propose CAMME (Cross-domain Adaptive Multi-Modal Embeddings), which dynamically integrates visual, textual, and frequency-domain features via embedding-level multi-modal self-attention. Treating each modality as a distinct sequence element enables cross-modal interactions that adaptively weight discriminative features based on input characteristics. Unlike static fusion approaches, CAMME learns input-specific contributions, dynamically emphasizing the most informative signals among visual semantics, textual consistency, and spectral artifacts. Evaluation across twelve generative architectures demonstrates superior cross-domain performance: a 77.34% average F1-score on natural scenes (a 7.30% improvement) and 66.46% on facial datasets (a 13.25% improvement). CAMME is also exceptionally robust, holding the Attack Success Rate to 14.7% across seven prominent adversarial attacks (a 4-6× improvement) and maintaining 96.63% accuracy under natural perturbations. Ablation results confirm the contribution of each modality and the advantage of our inter-modal attention over standard fusion methods.
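
Illustrative sketch. The abstract describes the core mechanism, treating each modality's embedding as one element of a short sequence and letting self-attention re-weight the modalities per input, in enough detail to sketch in code. The PyTorch snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the 512-dimensional embeddings, the single attention layer with residual normalization, the mean pooling, and the binary real/fake head are all hypothetical choices made for the example.

```python
import torch
import torch.nn as nn

class InterModalAttention(nn.Module):
    """Embedding-level multi-modal self-attention (illustrative sketch)."""

    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        # One standard self-attention layer over a length-3 "sequence"
        # whose tokens are the three modality embeddings.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, 2)  # hypothetical real-vs-fake classifier

    def forward(self, visual, textual, frequency):
        # Stack the modality embeddings into (batch, 3, dim). Attention then
        # computes input-specific weights over the modalities, unlike a
        # static fusion rule such as concatenation or fixed averaging.
        tokens = torch.stack([visual, textual, frequency], dim=1)
        fused, _ = self.attn(tokens, tokens, tokens)
        fused = self.norm(fused + tokens)       # residual connection
        return self.head(fused.mean(dim=1))     # pool over modalities

# Usage with dummy embeddings standing in for the three encoders:
model = InterModalAttention()
v = torch.randn(4, 512)  # e.g., visual-encoder output
t = torch.randn(4, 512)  # e.g., textual-consistency features
f = torch.randn(4, 512)  # e.g., frequency-domain (spectral) features
logits = model(v, t, f)  # shape (4, 2)
```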

Original language: English
Title of host publication: Proceedings of the 7th ACM International Conference on Multimedia in Asia, MMAsia 2025
Editors: Tat-Seng Chua, Lai-Kuan Wong, Chee Seng Chan, Jinhui Tang, Chong-Wah Ngo, Klaus Schoeffmann, Jiaying Liu, Yo-Sung Ho
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9798400720055
DOIs
Publication status: Published - 6 Dec 2025
Event: 7th ACM International Conference on Multimedia in Asia, MMAsia 2025 - Kuala Lumpur, Malaysia
Duration: 9 Dec 2025 - 12 Dec 2025

Publication series

Name: Proceedings of the 7th ACM International Conference on Multimedia in Asia, MMAsia 2025

Conference

Conference: 7th ACM International Conference on Multimedia in Asia, MMAsia 2025
Country/Territory: Malaysia
City: Kuala Lumpur
Period: 9/12/25 - 12/12/25

Keywords

  • Adversarial robustness
  • Cross-domain transferability
  • Deepfake detection
  • Inter-modal attention
  • Multi-modal learning
