Automated LLM Validation for Enterprise SaaS: A Hybrid Validation Framework for QA
Shenvi Kakodkar, Sweta (2025)
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2025120231641
Abstract
The emergence of Large Language Models (LLMs) such as OpenAI’s GPT series has redefined user interaction paradigms within enterprise software. While their capabilities offer significant value in automating tasks such as summarization, insight generation, and natural language querying, their integration into business-critical Software-as-a-Service (SaaS) platforms introduces new risks: hallucinations, inconsistency, and a lack of explainability. This thesis investigates the development and application of an automated response validation framework designed to ensure the reliability, accuracy, and compliance-readiness of LLM outputs in a B2B SaaS environment.
Focusing on a case study at The Case Company, a leading M&A software provider, the research explores a hybrid validation approach that combines semantic similarity metrics, structural checks, factual grounding via retrieval-augmented generation (RAG), and customizable logic for business-specific quality gates. A set of task-specific validators was implemented to evaluate LLM outputs for summarization, comparison, and question-answering use cases. The framework was evaluated for usability, effectiveness, and integration readiness across agile product teams.
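To make the hybrid approach concrete, the sketch below combines a semantic similarity check against a RAG-retrieved reference with a structural check behind a single quality gate. It is a minimal, hypothetical illustration, not the thesis's actual implementation: the class and method names, the use of the sentence-transformers library, the all-MiniLM-L6-v2 embedding model, and the 0.75 threshold are all assumptions made for this example.

# Minimal sketch of a hybrid validator for LLM outputs.
# All names and thresholds are hypothetical illustrations.
import json
from dataclasses import dataclass, field

from sentence_transformers import SentenceTransformer, util


@dataclass
class ValidationResult:
    passed: bool
    checks: dict = field(default_factory=dict)


class HybridValidator:
    """Combines a semantic similarity check with a structural check."""

    def __init__(self, similarity_threshold: float = 0.75):
        # A small general-purpose embedding model; a domain-tuned model
        # could be substituted for M&A-specific vocabulary.
        self.model = SentenceTransformer("all-MiniLM-L6-v2")
        self.similarity_threshold = similarity_threshold

    def semantic_check(self, response: str, reference: str) -> bool:
        # Cosine similarity between the response and a grounding
        # reference, e.g. the passage retrieved by the RAG pipeline.
        embeddings = self.model.encode([response, reference])
        score = util.cos_sim(embeddings[0], embeddings[1]).item()
        return score >= self.similarity_threshold

    def structural_check(self, response: str) -> bool:
        # Example structural gate: the output must be valid JSON
        # containing a "summary" key.
        try:
            return "summary" in json.loads(response)
        except (json.JSONDecodeError, TypeError):
            return False

    def validate(self, response: str, reference: str) -> ValidationResult:
        # The quality gate passes only if every individual check passes.
        checks = {
            "semantic": self.semantic_check(response, reference),
            "structural": self.structural_check(response),
        }
        return ValidationResult(passed=all(checks.values()), checks=checks)

In practice, task-specific subclasses for summarization, comparison, and question answering would extend this gate with their own checks, and the validator would run in CI pipelines or as a runtime guard before responses reach end users.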
Results indicate that domain-specific, automated validators can significantly improve the trustworthiness of LLM responses and enhance developer productivity. Moreover, the proposed system aligns with emerging AI governance requirements such as the EU AI Act. This thesis contributes a novel, practical, and extensible quality assurance methodology for LLM-infused software systems, bridging the gap between experimental AI capabilities and enterprise-grade product requirements.
