  • Universities of applied sciences
  • Jyväskylän ammattikorkeakoulu
  • Theses (Open collection)

Influence of input design and preprocessing on Finnish language AI summarisation

Roivas, Onni (2026)

 


All rights reserved. This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:amk-202601231701
Abstract
Finnish-language news summarisation is increasingly carried out with large language models, but summary quality was observed to depend on how source articles are presented as model input. In particular, different levels of preprocessing and task framing were expected to affect readability and content selection under strict length constraints. The objective was to evaluate how input variants influence the clarity and coverage of Finnish summaries and to compare model behaviour under identical instructions.

An experimental summarisation setup was implemented using five Finnish news articles. Three input representations were compared: raw text, cleaned text produced with a preprocessing pipeline, and a structured input format designed to separate the source text into distinct parts. Three models were evaluated: Gemma-3-27B-IT, Llama-Poro-2-70B-Instruct and Viking-7B. The same fixed summarisation instruction and a target length of two sentences were applied in all conditions. Summary quality was assessed with manual ratings of clarity and coverage on a five-point scale, supported by qualitative inspection of representative outputs and error cases.
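The experimental grid described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the instruction wording, the cleaning steps or the structured layout, so `INSTRUCTION`, `clean_text`, the `TITLE`/`LEAD`/`BODY` labels and `build_conditions` are all hypothetical stand-ins rather than the thesis's actual pipeline.

```python
import re
from itertools import product

# Model names and input variants as listed in the abstract.
MODELS = ["Gemma-3-27B-IT", "Llama-Poro-2-70B-Instruct", "Viking-7B"]
INPUT_VARIANTS = ["raw", "cleaned", "structured"]

# Hypothetical wording: the fixed instruction used in the thesis is not quoted.
INSTRUCTION = "Summarise the following Finnish news article in exactly two sentences."


def clean_text(text: str) -> str:
    """Assumed cleaning step: collapse all whitespace runs into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def build_input(variant: str, title: str, body: str) -> str:
    """Build one of the three input representations for a single article."""
    if variant == "raw":
        return f"{title}\n\n{body}"
    if variant == "cleaned":
        return clean_text(f"{title} {body}")
    if variant == "structured":
        # Assumed layout: title, first paragraph and remaining text as labelled parts.
        paragraphs = [clean_text(p) for p in body.split("\n\n") if p.strip()]
        lead = paragraphs[0] if paragraphs else ""
        rest = " ".join(paragraphs[1:])
        return f"TITLE: {clean_text(title)}\nLEAD: {lead}\nBODY: {rest}"
    raise ValueError(f"unknown input variant: {variant}")


def build_conditions(articles: dict) -> list:
    """Cross every article with every input variant and every model."""
    conditions = []
    for (article_id, (title, body)), variant, model in product(
        articles.items(), INPUT_VARIANTS, MODELS
    ):
        prompt = f"{INSTRUCTION}\n\n{build_input(variant, title, body)}"
        conditions.append(
            {"article": article_id, "variant": variant, "model": model, "prompt": prompt}
        )
    return conditions
```

With five articles, three input variants and three models, `build_conditions` yields 45 prompts, each sharing the same instruction so that only the input representation and the model vary.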

A consistent trade-off between readability and informativeness was observed. Raw input was associated with higher clarity, whereas structured input was associated with higher coverage. The cleaned input variant performed weakest on both criteria in the evaluated setting. At the model level, Gemma-3-27B-IT produced the most coherent and natural Finnish summaries. Llama-Poro-2-70B-Instruct also produced production-usable outputs, although denser sentence packing was sometimes observed. Viking-7B behaved unstably, with repetition, truncation and factual errors. The structured input variant was therefore indicated as a practical method for improving content retention in Finnish news summarisation, while aggressive text normalisation was indicated as potentially counterproductive under a strict two-sentence constraint.
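A comparison of this kind rests on averaging the manual clarity and coverage ratings per input variant or per model. The following is a minimal sketch of such an aggregation, assuming each rating is stored as a dictionary; the record layout and the function name `mean_scores` are hypothetical, and the values in the usage note are arbitrary placeholders, not ratings from the thesis.

```python
from collections import defaultdict
from statistics import mean

CRITERIA = ("clarity", "coverage")


def mean_scores(ratings: list, key: str) -> dict:
    """Average five-point clarity and coverage ratings, grouped by `key`.

    `key` names the field to group on, e.g. "variant" or "model".
    """
    grouped = defaultdict(lambda: {c: [] for c in CRITERIA})
    for rating in ratings:
        for criterion in CRITERIA:
            score = rating[criterion]
            # Reject anything outside the five-point scale.
            if not 1 <= score <= 5:
                raise ValueError(f"{criterion} score out of range: {score}")
            grouped[rating[key]][criterion].append(score)
    return {
        group: {criterion: mean(scores) for criterion, scores in by_criterion.items()}
        for group, by_criterion in grouped.items()
    }
```

For example, with placeholder records such as `{"variant": "variant_a", "model": "model_x", "clarity": 4, "coverage": 2}`, calling `mean_scores(ratings, "variant")` returns one clarity mean and one coverage mean per variant, and `mean_scores(ratings, "model")` does the same per model.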
Collections
  • Theses (Open collection)