From requirements to test cases: Evaluation of AI-generated test case descriptions
Mustalahti, Henriikka (2025)
All rights reserved. This publication is copyrighted. You may download, display, and print it for your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2025060520730
Abstract
This thesis investigates the potential of generative AI tools to enhance test case planning for security requirements in the defence industry. Focusing on requirements concerning transport layer security (TLS) certificates, mutual authentication, and public key infrastructure (PKI), the research evaluates the tool's performance against more traditional methods, with an emphasis on accuracy and efficiency. The tool used in the thesis is an internal, on-premise generative AI tool still in early development.
The findings suggest that generative AI could increase the efficiency of test case planning, particularly when provided with well-structured requirements. However, the study also shows the importance of careful prompt engineering, as generative AI performance can be sensitive to the way questions are phrased. While generative AI demonstrates promise in identifying potential test cases that might be overlooked by human testers, its vulnerability to hallucinations and its limitations in handling complex or ambiguous requirements highlight the ongoing need for human evaluation of the produced output.
In addition, the tool’s on-premise deployment offers advantages in data security and confidentiality compared to cloud-based solutions.
Future research could explore integrating the tool into automated testing frameworks, evaluating its performance with more diverse and complex requirements, and experimenting with different prompting techniques to identify those that yield the best results. As the tool is continuously updated, further research is likely to show substantial improvements in the reliability of its output.
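To illustrate what the suggested integration into automated testing frameworks could look like, the following minimal sketch (not part of the thesis) turns a generated test case for mutual TLS authentication into an executable check using Python's standard ssl module; the host name, port, and certificate paths are placeholders.

```python
# Hypothetical sketch: automating an AI-generated test case description for
# mutually authenticated TLS. Paths, host names, and ports are placeholders.

import socket
import ssl

CA_CERT = "ca.pem"          # organisation's PKI root certificate (placeholder)
CLIENT_CERT = "client.pem"  # client certificate (placeholder)
CLIENT_KEY = "client.key"   # client private key (placeholder)

def can_connect_with_mutual_tls(host: str, port: int) -> bool:
    """Return True if a mutually authenticated TLS handshake succeeds."""
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=CA_CERT)
    context.load_cert_chain(certfile=CLIENT_CERT, keyfile=CLIENT_KEY)
    try:
        with socket.create_connection((host, port), timeout=5) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except (ssl.SSLError, OSError):
        return False

def test_mutual_tls_accepts_trusted_client():
    # Expected result from the generated test case: the handshake succeeds
    # when the client presents a certificate issued by the trusted PKI.
    assert can_connect_with_mutual_tls("server.example.internal", 8443)
```

A test runner such as pytest could collect checks like this one, so that each AI-generated test case description maps to a single automated test function.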