Evaluating the Capabilities and Limitations of Large Language Models in Code Vulnerability Detection

  • Euisuh Jeong

Student thesis: Master's Dissertation

Abstract

The detection of software vulnerabilities is critical for maintaining secure systems, and recent advances in large language models (LLMs) have shown promise in various code-related tasks. This study evaluates the effectiveness of LLMs in vulnerability detection using the PRIMEVUL dataset, which addresses limitations in previous datasets by offering rigorous labeling and reduced duplication. We conducted extensive experiments with various LLMs, including code-specific models and instruction-tuned variants, assessing their ability to detect vulnerabilities associated with specific Common Weakness Enumerations (CWEs). Our findings indicate that, even after fine-tuning, LLMs did not achieve the level of precision and recall necessary for reliable vulnerability detection. The models exhibited varying performance across different types of vulnerabilities, and their detection capabilities plateaued after initial fine-tuning epochs. We also identified limitations within the dataset, such as potential data leakage due to its public availability and inaccuracies in vulnerability labeling, which may have affected the models’ performance. These results suggest that treating LLMs as black boxes for vulnerability detection has significant limitations, regardless of their sophistication. We propose that future research should explore integrating LLMs with dynamic or static code analysis tools to enhance detection capabilities. Additionally, we emphasize the need for better dataset curation and human annotation to improve label accuracy and, consequently, the effectiveness of vulnerability detection models.
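To make the evaluation setup concrete, the following is a minimal, hypothetical sketch of the kind of binary vulnerability-classification evaluation the abstract describes: a fine-tuned code model labels functions as vulnerable or benign, and precision and recall are computed over a PrimeVul-style test split. The checkpoint name, file name, and JSON field names are placeholders, not the thesis's actual artifacts.

```python
# Hypothetical sketch: binary vulnerability classification with a fine-tuned
# code model, scored with precision/recall. Model name and data fields are
# assumptions for illustration only.
import json

import torch
from sklearn.metrics import precision_score, recall_score
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "your-finetuned-code-model"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()


def predict(func_source: str) -> int:
    """Return 1 if the function is predicted vulnerable, else 0."""
    inputs = tokenizer(func_source, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1).item())


# PrimeVul-style records: one JSON object per line containing a function's
# source code and a binary vulnerability label (field names assumed here).
preds, labels = [], []
with open("primevul_test.jsonl") as fh:
    for line in fh:
        record = json.loads(line)
        preds.append(predict(record["func"]))
        labels.append(int(record["target"]))

print("precision:", precision_score(labels, preds))
print("recall:   ", recall_score(labels, preds))
```

Because vulnerable functions are rare relative to benign ones, precision and recall (rather than accuracy) are the meaningful measures here, which is why the abstract frames the models' shortfall in those terms.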
Date of Award: 2024
Original language: American English
Awarding Institution
  • HBKU College of Science and Engineering

Keywords

  • None
