Speech-to-Text Implementation in a Mobile Application for Finnish Language
Njoku, Chijoke Nelson (2025)
All rights reserved. This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2025121636948
Abstract
This thesis details the development, implementation and assessment of a speech-to-text (STT) system created specifically for the Finnish language in a mobile application context. The study addresses the technical hurdles that Finnish poses for speech recognition: its word formation and the relatively scarce training data available compared with languages such as English.
The task entailed developing a working application with Flutter, linked to a backend capable of processing audio via various speech-to-text models. This thesis evaluates both cloud platforms, such as Google, Azure and OpenAI, and local offline models to assess their performance with Finnish speech covering words with numerous suffixes, common abbreviations and multiple accents.
The cloud platforms produced accurate transcriptions and required a few seconds to convert brief audio clips into text. The offline models were functional, though they ran slowly unless supported by powerful hardware. Despite these differences, one finding was notable: contemporary STT models handle Finnish well, even when speakers alternate between Finnish and another language, blending bilingual expressions within a single sentence.
The project produced a prototype demonstrating that Finnish speech recognition can be achieved on mobile devices. It also illustrates the potential for this technology to expand: it can support hands-on applications in fields such as agriculture, allowing users to issue straightforward voice commands in Finnish during their tasks.
