Using Open Source LLM Model for Medical Transcription
Chowdhury, Mohammed Nowshad Ruhani (2025)
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2025052716517
Abstract
In modern healthcare, clinical documentation is paramount for patient safety, accurate diagnoses, and continuity of care. However, the increasing overhead of electronic health record (EHR) systems has led to physician burnout and leaves less time for real human interaction. The challenge is even greater for lower-resourced languages such as Finnish, for which natural language processing (NLP) tools are only beginning to emerge. This thesis investigates the fine-tuning of the open-source LLaMA 3.1–8B language model on simulated Finnish clinical conversations, that is, transcribed clinical dialogues created by Metropolia UAS students. The aim is to verify whether a domain-aligned large language model (LLM) can reliably convert spoken Finnish medical discourse into formal clinical reports. With 7-fold cross-validation, the fine-tuned model achieved a BLEU score of 0.1242, a ROUGE-L score of 0.4982, and a BERTScore F1 of 0.8373. These results indicate satisfactory semantic performance despite the small dataset and suggest that privacy-oriented NLP tools can scale to Finnish medical environments.