Developing an evaluation framework for AI chatbot responses with source material alignment metrics

Kalburgi, Prithviraj (2025)

The permanent address of this publication is
https://urn.fi/URN:NBN:fi:amk-2025060319568

Abstract

This thesis presents the development of an evaluation framework designed to assess the quality of AI-generated responses within a domain-specific chatbot system. The goal of this evaluation is to measure how effectively the chatbot retrieves and utilizes relevant information from a vector database to respond to user queries.

The methodology involves constructing a modular evaluation pipeline that analyses responses along three key aspects: accuracy, relevance, and factual consistency (hallucination detection). The framework uses Sentence Transformers for embedding generation and applies widely used evaluation metrics, namely cosine similarity, BLEU, ROUGE, and BERTScore, for deeper linguistic analysis. MongoDB stores and manages user prompts, AI responses, and retrieved data, and the resulting metric scores are used to evaluate each chatbot response. Sample evaluations were conducted on machine manual data from the “Avant-e527” dataset.
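The metric stage described above can be illustrated with a minimal, standard-library sketch. It is not the thesis implementation: bag-of-words count vectors stand in for Sentence Transformers embeddings, clipped unigram precision stands in for BLEU (no higher-order n-grams or brevity penalty), and unigram recall stands in for ROUGE-1. BERTScore and the MongoDB storage layer are omitted. All function names and the example strings are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors.

    Assumption: token counts replace the dense sentence embeddings
    that the thesis obtains from Sentence Transformers.
    """
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def evaluate(response: str, reference: str) -> dict:
    """Score one chatbot response against the retrieved reference text."""
    resp = Counter(tokenize(response))
    ref = Counter(tokenize(reference))
    # Counter intersection keeps the minimum count per token (clipping).
    overlap = sum((resp & ref).values())
    precision = overlap / sum(resp.values()) if resp else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    return {
        "cosine": cosine_similarity(resp, ref),
        "unigram_precision": precision,  # BLEU-1-style, no brevity penalty
        "unigram_recall": recall,        # ROUGE-1-style recall
    }


if __name__ == "__main__":
    # Made-up strings for illustration, not taken from the dataset.
    print(evaluate("check the oil", "check the oil level daily"))
```

In this sketch, a response that only repeats part of the reference scores high on precision but lower on recall, which is one simple way to see where a response omits reference content. A response containing tokens absent from the reference lowers precision, a coarse signal that it may include unsupported material.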

The findings from the evaluation highlight both strengths and weaknesses in AI-generated responses, providing insight into where responses may diverge from reference data or lack contextual relevance. While the framework remains lightweight compared to more advanced techniques such as evaluation based on fine-tuned large language models, it provides a practical and understandable foundation for measuring chatbot performance and points to potential future improvements.

Collections

Theses (Open collection)