TY - GEN
T1 - Dynamic Routing between Multimodal Capsules for Deepfake Image Editing Detection
AU - Nguyen, Tuan
AU - Khan, Naseem
AU - Khalil, Issa
N1 - Publisher Copyright:
© 2025 Copyright held by the owner/author(s).
PY - 2025/12/6
Y1 - 2025/12/6
N2 - State-of-the-art methods have demonstrated exceptional performance in detecting deepfakes, particularly when the entire image content is generated by text-to-image generation models. However, significant challenges remain in the case of instructional image editing, where AI models conditionally generate content based on both a real image and an edit prompt. In such scenarios, the generated content closely resembles the original image, making it more difficult for detection systems to identify deepfakes. In this paper, we propose a novel multimodal capsule network designed to address the detection of deepfake image editing. Specifically, low-level capsules from multiple modalities are integrated to predict capsules in subsequent layers. High-level capsules compete to select relevant low-level capsules, effectively aggregating local features to detect manipulated entities. The proposed approach is evaluated on diverse datasets, including natural images from real-world scenarios. Experimental results demonstrate that our model significantly outperforms state-of-the-art methods, achieving a substantial 10% improvement on Three-turn Edits and a 20% improvement on Open Images Edits. Ablation studies further validate the robustness of the network, achieving detection rates exceeding 94% against natural perturbations and over 91% recall against white-box and black-box attacks. Additionally, the model effectively adapts to unseen image editing datasets, highlighting its ability to generalize across diverse and previously unencountered editing scenarios.
AB - State-of-the-art methods have demonstrated exceptional performance in detecting deepfakes, particularly when the entire image content is generated by text-to-image generation models. However, significant challenges remain in the case of instructional image editing, where AI models conditionally generate content based on both a real image and an edit prompt. In such scenarios, the generated content closely resembles the original image, making it more difficult for detection systems to identify deepfakes. In this paper, we propose a novel multimodal capsule network designed to address the detection of deepfake image editing. Specifically, low-level capsules from multiple modalities are integrated to predict capsules in subsequent layers. High-level capsules compete to select relevant low-level capsules, effectively aggregating local features to detect manipulated entities. The proposed approach is evaluated on diverse datasets, including natural images from real-world scenarios. Experimental results demonstrate that our model significantly outperforms state-of-the-art methods, achieving a substantial 10% improvement on Three-turn Edits and a 20% improvement on Open Images Edits. Ablation studies further validate the robustness of the network, achieving detection rates exceeding 94% against natural perturbations and over 91% recall against white-box and black-box attacks. Additionally, the model effectively adapts to unseen image editing datasets, highlighting its ability to generalize across diverse and previously unencountered editing scenarios.
KW - Capsule Network
KW - Deepfake Image Editing
KW - Multimodal Learning
UR - https://www.scopus.com/pages/publications/105025153777
U2 - 10.1145/3743093.3771021
DO - 10.1145/3743093.3771021
M3 - Conference contribution
AN - SCOPUS:105025153777
T3 - Proceedings of the 7th ACM International Conference on Multimedia in Asia, MMAsia 2025
BT - Proceedings of the 7th ACM International Conference on Multimedia in Asia, MMAsia 2025
A2 - Chua, Tat-Seng
A2 - Wong, Lai-Kuan
A2 - Chan, Chee Seng
A2 - Tang, Jinhui
A2 - Ngo, Chong-Wah
A2 - Schoeffmann, Klaus
A2 - Liu, Jiaying
A2 - Ho, Yo-Sung
PB - Association for Computing Machinery, Inc
T2 - 7th ACM International Conference on Multimedia in Asia, MMAsia 2025
Y2 - 9 December 2025 through 12 December 2025
ER -