Dynamic Routing between Multimodal Capsules for Deepfake Image Editing Detection

  • Tuan Nguyen*
  • Naseem Khan
  • Issa Khalil

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

State-of-the-art methods have demonstrated exceptional performance in detecting deepfakes, particularly when the entire image content is generated by text-to-image generation models. However, significant challenges remain in cases of instructional image editing, where AI models conditionally generate content based on both a real image and an edit prompt. In such scenarios, the generated content closely resembles the original image, making it more difficult for detection systems to identify deepfakes. In this paper, we propose a novel multimodal capsule network designed to address the detection of deepfake image editing. Specifically, low-level capsules from multiple modalities are integrated to predict capsules in subsequent layers. High-level capsules compete to select relevant low-level capsules, effectively aggregating local features to detect manipulated entities. The proposed approach is evaluated on diverse datasets, including natural images from real-world scenarios. Experimental results demonstrate that our model significantly outperforms state-of-the-art methods, achieving a substantial 10% improvement on Three-turn Edits and a 20% improvement on Open Images Edits. Ablation studies further validate the robustness of the network, which achieves detection rates exceeding 94% against natural perturbations and over 91% recall against white-box and black-box attacks. Additionally, the model effectively adapts to unseen image editing datasets, highlighting its ability to generalize across diverse and previously unencountered editing scenarios.
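The abstract describes low-level capsules predicting high-level capsules, with high-level capsules competing for agreeing inputs. The paper's exact multimodal formulation is not given here; as a point of reference, the following is a minimal NumPy sketch of standard dynamic routing-by-agreement (in the style of Sabour et al.), where the prediction tensor `u_hat`, the capsule counts, and the pose dimension are all illustrative assumptions:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Squash non-linearity: shrinks vector length into [0, 1) while keeping direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iters=3):
    """Routing-by-agreement over prediction vectors.

    u_hat: (num_low, num_high, dim) -- the prediction each low-level capsule
           makes for every high-level capsule (hypothetical shapes).
    Returns high-level capsule outputs of shape (num_high, dim).
    """
    num_low, num_high, _ = u_hat.shape
    b = np.zeros((num_low, num_high))  # routing logits, updated each iteration
    for _ in range(num_iters):
        # Each low-level capsule distributes its vote across high-level capsules.
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over high-level axis
        s = np.einsum('ij,ijk->jk', c, u_hat)                 # coupling-weighted sum of predictions
        v = squash(s)                                         # high-level capsule outputs
        # Raise logits where a prediction agrees with the output (dot-product agreement).
        b = b + np.einsum('ijk,jk->ij', u_hat, v)
    return v

# Toy multimodal example: 6 low-level capsules (say, 3 per modality),
# 2 high-level capsules, 8-dimensional poses -- all assumed numbers.
rng = np.random.default_rng(0)
u_hat = rng.normal(size=(6, 2, 8))
v = dynamic_routing(u_hat)
print(v.shape)  # (2, 8)
```

In this scheme the competition the abstract mentions arises from the softmax over routing logits: a low-level capsule's vote is pulled toward whichever high-level capsule its prediction agrees with, which is one plausible reading of how local multimodal features would be aggregated into manipulated-entity detectors.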

Original language: English
Title of host publication: Proceedings of the 7th ACM International Conference on Multimedia in Asia, MMAsia 2025
Editors: Tat-Seng Chua, Lai-Kuan Wong, Chee Seng Chan, Jinhui Tang, Chong-Wah Ngo, Klaus Schoeffmann, Jiaying Liu, Yo-Sung Ho
Publisher: Association for Computing Machinery, Inc.
ISBN (Electronic): 9798400720055
Publication status: Published - 6 Dec 2025
Event: 7th ACM International Conference on Multimedia in Asia, MMAsia 2025 - Kuala Lumpur, Malaysia
Duration: 9 Dec 2025 – 12 Dec 2025

Publication series

Name: Proceedings of the 7th ACM International Conference on Multimedia in Asia, MMAsia 2025

Conference

Conference: 7th ACM International Conference on Multimedia in Asia, MMAsia 2025
Country/Territory: Malaysia
City: Kuala Lumpur
Period: 9/12/25 – 12/12/25

Keywords

  • Capsule Network
  • Deepfake Image Editing
  • Multimodal Learning
