Explaining the role of Intrinsic Dimensionality in Adversarial Training

Research output: Contribution to conference › Poster › peer-review

Abstract

Adversarial Training (AT) impacts different architectures in distinct ways: vision models gain robustness but face reduced generalization, encoder-based models exhibit limited robustness improvements with minimal generalization loss, and recent work in latent-space adversarial training (LAT) demonstrates that decoder-based models achieve improved robustness by applying AT across multiple layers. We provide the first explanation for these trends by leveraging the manifold conjecture: off-manifold adversarial examples (AEs) enhance robustness, while on-manifold AEs improve generalization. We show that vision and decoder-based models exhibit low intrinsic dimensionality in earlier layers (favoring off-manifold AEs), whereas encoder-based models do so in later layers (favoring on-manifold AEs). Exploiting this property, we introduce SMAAT, which improves the scalability of AT for encoder-based models by perturbing the layer with the lowest intrinsic dimensionality. This reduces the projected gradient descent (PGD) chain length required for AE generation, cutting GPU time by 25–33% while significantly boosting robustness. We validate SMAAT across multiple tasks, including text generation, sentiment classification, safety filtering, and retrieval-augmented generation setups, demonstrating superior robustness with comparable generalization to standard training.
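
The layer-wise perturbation idea can be illustrated with a short sketch (a minimal PyTorch example, assuming a model split into a prefix up to the chosen low intrinsic-dimensionality layer and a head for the remaining layers; the function name, model split, and hyperparameters are illustrative assumptions, not the paper's actual implementation):

import torch
import torch.nn.functional as F

def latent_pgd(prefix, head, x, y, eps=0.1, alpha=0.02, n_steps=3):
    # Perturb the hidden representation at the chosen layer instead of
    # the raw input; a short PGD chain (n_steps) there is the source of
    # the compute savings described in the abstract.
    with torch.no_grad():
        h = prefix(x)                          # clean hidden representation
    delta = torch.zeros_like(h).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(n_steps):
        loss = F.cross_entropy(head(h + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += alpha * grad.sign()       # gradient-ascent step
            delta.clamp_(-eps, eps)            # project onto the L-inf ball
    return (h + delta).detach()                # adversarial latent for AT

During training, the returned adversarial latent would be passed through the head in place of the clean representation, so only the layers above the perturbed one see adversarial inputs; the shorter PGD chain is consistent with the 25–33% GPU-time reduction reported in the abstract.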
Original language: English
Pages: 1-16
Number of pages: 16
Publication status: Published - 17 Jul 2025
Event: International Conference on Machine Learning 2025 - Vancouver Convention Center, Vancouver, Canada
Duration: 13 Jul 2025 - 19 Jul 2025

Conference

Conference: International Conference on Machine Learning 2025
Country/Territory: Canada
City: Vancouver
Period: 13/07/25 - 19/07/25
