Domain Specific Data Quality Framework
Raja, Komal (2024)
All rights reserved. This publication is copyrighted. You may download, display, and print it for your own personal use. Commercial use is prohibited.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:amk-2024061122771
Abstract
Organizations generate large volumes of structured and unstructured data from many sources. Financial institutions, which rely heavily on accurate data, face unique challenges that call for domain-specific knowledge. This thesis develops a framework that uses a Retrieval-Augmented Generation (RAG) model, trained on financial data quality issues and their solutions, to provide context-aware data validation.
The research aims to bridge the gap between generic data quality checks and a context-aware approach. The study compares the RAG model's performance with traditional data quality management tools and evaluates its effectiveness on metrics such as timeliness, consistency, and completeness.
The findings suggest that integrating domain-specific knowledge into data validation processes yields relevant information that can be used in financial data quality management. Future work will focus on improving the framework’s scalability and performance by using more relevant data and real-world testing.
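To make the three evaluation metrics named in the abstract concrete, the following is a minimal sketch of how completeness, consistency, and timeliness might be scored with generic rule-based checks. It illustrates the kind of baseline the abstract contrasts with, not the thesis's RAG framework. The record fields, the allowed currency set, and the seven-day freshness window are all hypothetical assumptions chosen for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical records; field names and values are illustrative assumptions,
# not data from the thesis.
records = [
    {"account_id": "A1", "amount": 100.0, "currency": "EUR", "updated": datetime(2024, 6, 1)},
    {"account_id": "A2", "amount": None, "currency": "EUR", "updated": datetime(2024, 5, 1)},
    {"account_id": "A3", "amount": 50.0, "currency": "eur", "updated": datetime(2024, 6, 2)},
    {"account_id": None, "amount": 75.0, "currency": "USD", "updated": datetime(2024, 6, 3)},
]

REQUIRED = ("account_id", "amount", "currency")  # assumed mandatory fields
ALLOWED_CURRENCIES = {"EUR", "USD"}              # assumed reference set


def completeness(rows):
    """Share of required field values that are present (not None)."""
    total = len(rows) * len(REQUIRED)
    filled = sum(1 for r in rows for f in REQUIRED if r.get(f) is not None)
    return filled / total if total else 1.0


def consistency(rows):
    """Share of rows whose currency code exactly matches the allowed set."""
    ok = sum(1 for r in rows if r.get("currency") in ALLOWED_CURRENCIES)
    return ok / len(rows) if rows else 1.0


def timeliness(rows, as_of, max_age):
    """Share of rows updated within max_age of the reference time."""
    fresh = sum(1 for r in rows if as_of - r["updated"] <= max_age)
    return fresh / len(rows) if rows else 1.0


as_of = datetime(2024, 6, 4)
scores = {
    "completeness": completeness(records),
    "consistency": consistency(records),
    "timeliness": timeliness(records, as_of, timedelta(days=7)),
}
print(scores)
```

A generic check like `consistency` flags the lowercase `eur` only because a rule for it was written in advance; a context-aware approach, as the abstract describes it, would aim to supply that kind of domain knowledge rather than rely on hand-coded rules.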