Using LLMs to Improve the Accuracy of SBOM-Based Vulnerability Assessment
Nguyen, Van Huy (2025)
All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:amk-2025121737614
Abstract
Software supply chain security has become a critical challenge as modern software increasingly depends on open-source components and third-party libraries. To improve transparency and risk management, organizations have adopted Software Bills of Materials (SBOMs), which list all components and enable vulnerability tracking across the supply chain. However, current Software Composition Analysis (SCA) and SBOM scanners generate findings that rely primarily on component version matching and lack contextual understanding of how components are used within an application. As a result, they often report vulnerabilities that do not affect the software, leading to alert fatigue and inefficient use of security resources. Developers and security teams must then manually investigate these findings to determine whether they pose a real risk, which consumes significant time and slows the overall remediation process.
This thesis investigates the use of Large Language Models (LLMs) to enhance SBOM workflows by analyzing whether the vulnerable functions or dangerous syntax referenced in CVEs are actually present and used in the application source code. The goal is to improve the accuracy of vulnerability triage, reduce unnecessary investigation effort, and support more efficient and effective secure DevOps practices. The study is motivated by industry needs and contributes to ongoing innovation in cybersecurity automation. An experimental evaluation across a diverse set of real-world projects showed that this approach substantially reduces the number of findings that do not apply to the analyzed software. This reduction in noise translates directly into operational benefits: time savings, greater developer confidence in security tools, and stronger compliance with emerging software transparency regulations.
