AI tool for scientific literature data extraction
Ronkainen, Justiina (2025)
Ronkainen, Justiina
2025
All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2025060420152
Abstract
The aim of this project was to explore artificial intelligence (AI) by developing a tool that leverages large language models (LLMs) to extract structured information from scientific articles. Systematic literature reviews and meta-analyses depend on accurate data extraction, a process that is traditionally manual, time-consuming and error-prone. This project investigated the potential of LLMs to automate and streamline this task, focusing on extracting key study elements such as article metadata, study design, statistical methods and results.
The project was implemented using Python’s LangChain framework and the OpenAI API. Key techniques included prompt engineering, text processing and chunking to adapt article content for the LLM. The GPT-3.5 and GPT-4.1 models were tested and evaluated against each other and against a human-extracted gold standard to assess performance. The models demonstrated potential in extracting some information, such as article metadata and study design, but struggled to reliably extract all relevant results from the articles.
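To make the described pipeline concrete, the sketch below shows a minimal chunk-and-extract step of the kind the abstract outlines, assuming the langchain-openai and langchain-text-splitters packages and an OPENAI_API_KEY in the environment. The prompt wording, chunk sizes and field list are illustrative assumptions, not the project’s exact configuration.

```python
from langchain_openai import ChatOpenAI
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Hypothetical extraction prompt; the project's actual prompts are not
# reproduced here.
PROMPT = (
    "You are extracting data for a systematic literature review.\n"
    "From the article text below, return JSON with these fields:\n"
    "title, authors, study_design, statistical_methods, key_results.\n"
    "Use null for any field not present in this excerpt.\n\n"
    "Article text:\n{chunk}"
)

def extract_from_article(article_text: str, model: str = "gpt-4.1") -> list[str]:
    # Split the full text into overlapping chunks that fit the model's
    # context window; the sizes here are placeholder values.
    splitter = RecursiveCharacterTextSplitter(chunk_size=4000, chunk_overlap=200)
    chunks = splitter.split_text(article_text)

    llm = ChatOpenAI(model=model, temperature=0)  # deterministic extraction
    # Query the model once per chunk; the per-chunk JSON answers would be
    # merged and de-duplicated downstream.
    return [llm.invoke(PROMPT.format(chunk=c)).content for c in chunks]
```

Swapping the model argument between "gpt-3.5-turbo" and "gpt-4.1" is one way the two models could be run against the same gold standard for comparison.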
Despite current limitations, LLMs hold promise for automating aspects of scientific data extraction. Improvements such as section-specific extraction and fine-tuning may enhance performance and offer a scalable solution to one of the most labour-intensive steps in systematic literature reviews.
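As a hedged illustration of the section-specific extraction idea mentioned above, articles could be pre-split on common IMRaD headings so that, for example, only the Results section is passed to the extraction prompt. The heading list and regular expression below are assumptions, since real article layouts vary.

```python
import re

# Assumed section headings; real articles use many variants of these.
SECTION_HEADINGS = ["Abstract", "Introduction", "Methods", "Results", "Discussion"]

def split_by_section(article_text: str) -> dict[str, str]:
    """Split plain article text on lines consisting of an IMRaD heading."""
    pattern = r"^({})\s*$".format("|".join(SECTION_HEADINGS))
    parts = re.split(pattern, article_text, flags=re.MULTILINE | re.IGNORECASE)
    sections: dict[str, str] = {}
    current = None
    for part in parts:
        name = part.strip().title()
        if name in SECTION_HEADINGS:
            current = name  # a heading line starts a new section
        elif current is not None:
            sections[current] = sections.get(current, "") + part
    return sections

# Example: send only the Results section to the extraction step.
# results_text = split_by_section(full_text).get("Results", "")
```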
