TY - GEN
T1 - StructTransform
T2 - 30th European Symposium on Research in Computer Security, ESORICS 2025
AU - Yoosuf, Shehel
AU - Ali, Temoor
AU - Lekssays, Ahmed
AU - AlSabah, Mashael
AU - Khalil, Issa
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
PY - 2026
Y1 - 2026
N2 - Safety alignment and adversarial attack research for Large Language Models (LLMs) predominantly focuses on natural language inputs and outputs. This work introduces StructTransform, a black-box attack against alignment in which malicious prompts are encoded through diverse structure transformations. These range from standard formats (e.g., SQL, JSON) to novel syntaxes generated entirely by LLMs. By shifting harmful prompts out-of-distribution (OOD) relative to typical natural language, these transformations effectively circumvent existing safety alignment mechanisms. Our extensive evaluations show that simple StructTransform attacks achieve high Attack Success Rates (ASR), nearing 90% even against state-of-the-art models such as Claude 3.5 Sonnet. Combining structural and content transformations further increases ASR to over 96% without any refusals. We demonstrate the ease with which LLMs can generate novel syntaxes and their effectiveness in bypassing defenses, creating a vast attack surface. Using a new benchmark, we show that current alignment techniques and defenses largely fail against these structure-based attacks. This failure strongly suggests a reliance on token-level patterns within natural language rather than a robust, structure-aware conceptual understanding of harmful requests, exposing a critical need for generalized safety mechanisms that remain robust to variations in input structure.
AB - Safety alignment and adversarial attack research for Large Language Models (LLMs) predominantly focuses on natural language inputs and outputs. This work introduces StructTransform, a black-box attack against alignment in which malicious prompts are encoded through diverse structure transformations. These range from standard formats (e.g., SQL, JSON) to novel syntaxes generated entirely by LLMs. By shifting harmful prompts out-of-distribution (OOD) relative to typical natural language, these transformations effectively circumvent existing safety alignment mechanisms. Our extensive evaluations show that simple StructTransform attacks achieve high Attack Success Rates (ASR), nearing 90% even against state-of-the-art models such as Claude 3.5 Sonnet. Combining structural and content transformations further increases ASR to over 96% without any refusals. We demonstrate the ease with which LLMs can generate novel syntaxes and their effectiveness in bypassing defenses, creating a vast attack surface. Using a new benchmark, we show that current alignment techniques and defenses largely fail against these structure-based attacks. This failure strongly suggests a reliance on token-level patterns within natural language rather than a robust, structure-aware conceptual understanding of harmful requests, exposing a critical need for generalized safety mechanisms that remain robust to variations in input structure.
KW - Adversarial Prompts
KW - LLM Security
KW - Large Language Model
UR - https://www.scopus.com/pages/publications/105020264008
U2 - 10.1007/978-3-032-07884-1_25
DO - 10.1007/978-3-032-07884-1_25
M3 - Conference contribution
AN - SCOPUS:105020264008
SN - 9783032078834
T3 - Lecture Notes in Computer Science
SP - 488
EP - 507
BT - Computer Security – ESORICS 2025 – 30th European Symposium on Research in Computer Security, Proceedings
A2 - Nicomette, Vincent
A2 - Benzekri, Abdelmalek
A2 - Boulahia-Cuppens, Nora
A2 - Vaidya, Jaideep
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 22 September 2025 through 24 September 2025
ER -