Unlocking unstructured data: integrating Large Language Models into AI-Powered ETL
Li, Yinan; Virtanen, Ilpo (2025)
Oulun ammattikorkeakoulu
2025
The permanent address of this publication is
https://urn.fi/URN:NBN:fi-fe2025061770809
Abstract
Collecting information from unstructured sources is becoming more important, since decisions are expected to be data-driven and a significant amount of the necessary information is stored in unorganized text documents. However, automatically extracting information from professional free-text documents is challenging: it requires a professional's ability to flexibly identify key information, which automated systems could not do before the emergence of LLMs. Recently, a growing number of ETL systems have been designed around LLM technology, providing solutions to this problem in many areas. LLM technology has proven to be an efficient method that can be integrated into the ETL process to improve the accuracy and completeness of data extraction and data cleaning without raising cost or sacrificing accuracy.