Beyond Native LLM Outputs: Exploring Retrieval-Augmented Generation and Evaluation Methods
Alakulju Dkhili, Moufida (2025)
All rights reserved. This publication is copyrighted. You may download, display, and print it for your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-202505028631
Abstract
Large Language Models (LLMs) have revolutionized natural language processing, demonstrating remarkable capabilities in various linguistic tasks. However, their limitations in contextual understanding, potential for bias, and lack of reliable validation methods pose significant challenges, particularly in specialized applications.
This thesis explores techniques to enhance the reliability of LLM-based systems and proposes a framework for output validation. The study begins with a comprehensive overview of LLMs, tracing their evolution, examining their architecture, and identifying key limitations. Building on this foundation, it investigates Retrieval-Augmented Generation (RAG) systems as a mechanism to improve the contextual accuracy of LLMs. Through a comparative study, it demonstrates the practical benefits of RAG in overcoming some of the inherent limitations of traditional LLMs.
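To make the retrieve-then-generate mechanism concrete, the minimal sketch below illustrates the core RAG step: rank stored passages against a query and prepend the best match to the prompt before generation. It is a sketch under stated assumptions, not the system built in the thesis; the bag-of-words cosine similarity stands in for a dense embedding model and vector store, and the corpus and query strings are hypothetical.

```python
# Minimal RAG sketch in pure Python: retrieve the passage most similar
# to the query, then prepend it to the prompt. The bag-of-words cosine
# similarity is a toy stand-in for a dense embedding model and vector
# store; the corpus and query below are illustrative assumptions.
from collections import Counter
from math import sqrt


def embed(text: str) -> Counter:
    """Toy embedding: lowercased word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_prompt(query: str, corpus: list[str], k: int = 1) -> str:
    """Augment the query with the k most similar passages before generation."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    context = "\n".join(ranked[:k])
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


corpus = [
    "The warranty period for model X is 24 months from purchase.",
    "Model X ships with a USB-C charger and a quick-start guide.",
]
print(build_prompt("How long is the warranty on model X?", corpus))
```

The augmented prompt grounds the model's answer in retrieved text rather than relying solely on its parametric knowledge, which is the contextual-accuracy benefit the comparative study examines.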
A key contribution of this study is the development of a validation framework for assessing the reliability and expected value of LLM outputs. The framework is designed to be quantitative, aligned with human judgment, and implementable in code, addressing a gap in current evaluation methodologies. The thesis compares this approach with traditional NLP evaluation techniques, highlighting its advantages in the context of modern LLM applications.
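As one illustration of what "quantitative and implementable in code" can look like, the sketch below scores an answer for grounding in the retrieved context and relevance to the question, then combines the two into a single number. The metrics, weights, and example strings are illustrative assumptions, not the framework developed in the thesis.

```python
# Minimal sketch of a quantitative output check: score an answer for
# grounding (token overlap with the retrieved context) and relevance
# (token overlap with the question), then combine the two scores.
# Metrics, weights, and example strings are illustrative assumptions.
import re


def tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def overlap(a: str, b: str) -> float:
    """Fraction of a's tokens that also occur in b."""
    ta = tokens(a)
    return len(ta & tokens(b)) / len(ta) if ta else 0.0


def validate(answer: str, context: str, question: str) -> dict[str, float]:
    grounding = overlap(answer, context)   # crude faithfulness proxy
    relevance = overlap(answer, question)  # crude relevance proxy
    return {
        "grounding": grounding,
        "relevance": relevance,
        "score": 0.5 * grounding + 0.5 * relevance,
    }


print(validate(
    answer="The warranty is 24 months.",
    context="The warranty period for model X is 24 months from purchase.",
    question="How long is the warranty on model X?",
))
```

Because each component is a number computed directly from text, such a check can run automatically in a pipeline, and its thresholds can be calibrated against human ratings, the alignment property the framework targets.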
The findings suggest that combining RAG systems with the proposed validation framework can significantly enhance the reliability of LLM-based applications in specialized use cases. The work contributes to the growing body of knowledge on LLM optimization and evaluation, offering practical solutions for measuring the trustworthiness of AI-generated language outputs.