TY - GEN
T1 - TECHNIQUERAG
T2 - 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
AU - Lekssays, Ahmed
AU - Shukla, Utsav
AU - Sencar, Husrev Taha
AU - Parvez, Md Rizwan
N1 - Publisher Copyright:
© 2025 Association for Computational Linguistics.
PY - 2025
Y1 - 2025
N2 - Accurately identifying adversarial techniques in security texts is critical for effective cyber defense. However, existing methods face a fundamental trade-off: they either rely on generic models with limited domain precision or require resource-intensive pipelines that depend on large labeled datasets and task-specific optimizations (such as custom hard-negative mining and denoising), resources rarely available in specialized domains. We propose TECHNIQUERAG, a domain-specific retrieval-augmented generation (RAG) framework that bridges this gap by integrating off-the-shelf retrievers, instruction-tuned LLMs, and minimal text-technique pairs. First, our approach mitigates data scarcity by fine-tuning only the generation component on limited in-domain examples, circumventing resource-intensive retrieval training. Second, although conventional RAG mitigates hallucination by coupling retrieval and generation, its dependence on generic retrievers often introduces noisy candidates, thereby limiting domain-specific precision. To address this, we enhance retrieval quality and domain specificity through zero-shot LLM re-ranking that explicitly aligns retrieved candidates with adversarial techniques. Experiments on multiple security benchmarks demonstrate that TECHNIQUERAG achieves state-of-the-art performance without extensive task-specific optimizations or labeled data, while comprehensive analysis provides further insights.
AB - Accurately identifying adversarial techniques in security texts is critical for effective cyber defense. However, existing methods face a fundamental trade-off: they either rely on generic models with limited domain precision or require resource-intensive pipelines that depend on large labeled datasets and task-specific optimizations (such as custom hard-negative mining and denoising), resources rarely available in specialized domains. We propose TECHNIQUERAG, a domain-specific retrieval-augmented generation (RAG) framework that bridges this gap by integrating off-the-shelf retrievers, instruction-tuned LLMs, and minimal text-technique pairs. First, our approach mitigates data scarcity by fine-tuning only the generation component on limited in-domain examples, circumventing resource-intensive retrieval training. Second, although conventional RAG mitigates hallucination by coupling retrieval and generation, its dependence on generic retrievers often introduces noisy candidates, thereby limiting domain-specific precision. To address this, we enhance retrieval quality and domain specificity through zero-shot LLM re-ranking that explicitly aligns retrieved candidates with adversarial techniques. Experiments on multiple security benchmarks demonstrate that TECHNIQUERAG achieves state-of-the-art performance without extensive task-specific optimizations or labeled data, while comprehensive analysis provides further insights.
UR - https://www.scopus.com/pages/publications/105028604397
U2 - 10.18653/v1/2025.findings-acl.1076
DO - 10.18653/v1/2025.findings-acl.1076
M3 - Conference contribution
AN - SCOPUS:105028604397
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 20913
EP - 20926
BT - Findings of the Association for Computational Linguistics
A2 - Che, Wanxiang
A2 - Nabende, Joyce
A2 - Shutova, Ekaterina
A2 - Pilehvar, Mohammad Taher
PB - Association for Computational Linguistics (ACL)
Y2 - 27 July 2025 through 1 August 2025
ER -