Comparing language models for summarizing intellectual property (IP) texts
Ylikojola, Lassi (2025)
All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2025121134935
Abstract
With the growing availability of language models (LMs), organizations have an increasing interest in systematically evaluating model performance and resource consumption for specific tasks. One critical area is the summarization of intellectual property (IP) texts, including patents, copyrights, and similar documentation.
This thesis investigates the evaluation requirements for employing LMs in IP text summarization tasks and the necessity of benchmarking generic models against models specifically tuned for summarizing domain texts.
In this thesis, we concentrate on baseline quantitative metrics.
We perform initial assessments of LM suitability, detailing metrics such as computational resource usage (CPU, GPU, and memory requirements), the degree of hallucination in outputs, and language support and proficiency across both tuned and untuned languages.
These basic metrics help organizations identify initial investment levels required, understand the scalability and reliability of various models, and guide further model tuning and customization.
The outcome demonstrates the ability of quantitative metrics to enhance decision making on further investment in model usage.
We quantitatively evaluate three models: one generic and two tuned. This data is used to determine whether a model is fundamentally usable for the defined text summarization task before the organization invests further.
We also survey the available metrics and model tuning possibilities.
