StructTransform: A Scalable Attack Surface for Safety-Aligned Large Language Models

Shehel Yoosuf*, Temoor Ali, Ahmed Lekssays, Mashael AlSabah, Issa Khalil

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Safety alignment and adversarial attack research for Large Language Models (LLMs) predominantly focuses on natural language inputs and outputs. This work introduces StructTransform, a black-box attack against alignment in which malicious prompts are encoded into diverse structure transformations. These range from standard formats (e.g., SQL, JSON) to novel syntaxes generated entirely by LLMs. By shifting harmful prompts Out-Of-Distribution (OOD) relative to typical natural language, these transformations effectively circumvent existing safety alignment mechanisms. Our extensive evaluations show that simple StructTransform attacks achieve high Attack Success Rates (ASR), nearing 90% even against state-of-the-art models like Claude 3.5 Sonnet. Combining structural and content transformations further increases ASR to over 96% without any refusals. We demonstrate the ease with which LLMs can generate novel syntaxes and their effectiveness in bypassing defenses, creating a vast attack surface. Using a new benchmark, we show that current alignment techniques and defenses largely fail against these structure-based attacks. This failure strongly suggests a reliance on token-level patterns within natural language, rather than a robust, structure-aware conceptual understanding of harmful requests, exposing a critical need for generalized safety mechanisms robust to variations in input structure.
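To make the abstract's notion of a "structure transformation" concrete, the sketch below rewrites a natural-language prompt into JSON and SQL forms. This is an illustrative assumption, not the authors' implementation: the function names (`to_json_transform`, `to_sql_transform`) and the schema are hypothetical, and the prompt used is deliberately benign.

```python
import json

def to_json_transform(prompt: str) -> str:
    """Wrap a natural-language prompt in a JSON task schema (illustrative only)."""
    steps = prompt.split(". ")  # naively split the request into step strings
    return json.dumps({"task": {"type": "respond", "steps": steps}}, indent=2)

def to_sql_transform(prompt: str) -> str:
    """Express the same request as a SQL-style statement (illustrative only)."""
    return f"INSERT INTO tasks (instruction) VALUES ('{prompt}');"

# A benign example prompt, restructured two ways:
benign = "Explain how photosynthesis works"
print(to_json_transform(benign))
print(to_sql_transform(benign))
```

The point of such a transform is that the request's content is unchanged while its surface form moves away from the natural-language distribution the model's safety training saw; the paper reports that even simple encodings like these degrade refusal behavior.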

Original language: English
Title of host publication: Computer Security – ESORICS 2025 – 30th European Symposium on Research in Computer Security, Proceedings
Editors: Vincent Nicomette, Abdelmalek Benzekri, Nora Boulahia-Cuppens, Jaideep Vaidya
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 488–507
Number of pages: 20
ISBN (Print): 9783032078834
Publication status: Published - 2026
Event: 30th European Symposium on Research in Computer Security, ESORICS 2025 - Toulouse, France
Duration: 22 Sept 2025 – 24 Sept 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 16053 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 30th European Symposium on Research in Computer Security, ESORICS 2025
Country/Territory: France
City: Toulouse
Period: 22/09/25 – 24/09/25

Keywords

  • Adversarial Prompts
  • LLM Security
  • Large Language Model
