Universities of Applied Sciences › Metropolia University of Applied Sciences › Theses

Comparing Natural Language Models for Software Category Classification

Turbin, Ivan (2023)

Full text: Turbin_Ivan.pdf (1.247 MB)


All rights reserved. This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Permanent address of the publication:
https://urn.fi/URN:NBN:fi:amk-2023112732000
Abstract
The purpose of this thesis is to compare natural language machine learning models and identify how their classification results differ in the field of software category classification. Software category classification is a text classification task in which the appropriate category for a piece of software is determined from its description. A further objective of this thesis is to explain fundamental machine learning principles such as data augmentation, normalization and performance analysis, and to describe common natural language models.
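The abstract does not specify what normalization involved. As a minimal sketch, assuming it means cleaning the scraped description text before tokenization, a Python step might fold Unicode, lowercase, replace punctuation with spaces and collapse whitespace, then drop empty or duplicate descriptions. The function names `normalize_description` and `clean_dataset` are hypothetical and not taken from the thesis.

```python
import re
import unicodedata


def normalize_description(text: str) -> str:
    """Normalize one raw software description (an assumed pipeline, not the thesis's own)."""
    text = unicodedata.normalize("NFKC", text)  # fold compatibility characters
    text = text.lower()
    # Replace anything that is neither a word character nor whitespace with a space.
    text = re.sub(r"[^\w\s]", " ", text)
    # Collapse runs of spaces, tabs and newlines into a single space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def clean_dataset(rows):
    """Normalize (description, category) pairs; drop empty or duplicate descriptions."""
    seen = set()
    cleaned = []
    for description, category in rows:
        norm = normalize_description(description)
        if norm and norm not in seen:  # keep only the first copy of each description
            seen.add(norm)
            cleaned.append((norm, category))
    return cleaned
```

For example, `normalize_description("Photo Editor PRO!  Edit,\ncrop & share.")` returns `"photo editor pro edit crop share"`. Deduplicating after normalization means descriptions that differ only in case or punctuation are treated as the same sample.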

To achieve these goals, it is necessary to obtain trainable data, normalize the gathered data and build a model suitable for text classification. In the present study, the Microsoft and Cnet software stores are used as data sources. The categories and descriptions are gathered with a Python scraper, built on the Beautiful Soup library, that is run against the software stores.
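The thesis used Beautiful Soup for scraping; the sketch below swaps in the standard library's `html.parser.HTMLParser` so it runs without third-party packages. The markup it expects (a `div` with class `product` containing elements with classes `category` and `description`) is a hypothetical layout, not the real structure of the Microsoft or Cnet store pages, and fetching the pages is left out. `ProductParser` and `parse_products` are illustrative names.

```python
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Collect (description, category) pairs from HYPOTHETICAL store markup:

    <div class="product">
      <span class="category">...</span>
      <p class="description">...</p>
    </div>
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities such as &amp;
        self.records = []
        self._current = None    # fields of the product currently being read
        self._field = None      # "category" or "description" while inside one
        self._field_tag = None  # tag name that will close the current field

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "product" in classes:
            self._current = {"category": "", "description": ""}
        elif self._current is not None:
            for field in ("category", "description"):
                if field in classes:
                    self._field, self._field_tag = field, tag

    def handle_data(self, data):
        # Text is kept only while inside a category or description element.
        if self._field is not None:
            self._current[self._field] += data

    def handle_endtag(self, tag):
        if self._field is not None and tag == self._field_tag:
            self._field = None
        elif tag == "div" and self._current is not None:
            category = self._current["category"].strip()
            description = self._current["description"].strip()
            if category and description:  # skip incomplete listings
                self.records.append((description, category))
            self._current = None


def parse_products(html):
    """Return a list of (description, category) tuples found in `html`."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return parser.records
```

With Beautiful Soup the same extraction would typically be written with CSS selectors; the event-driven parser here trades that convenience for having no dependencies.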

With the gathered data, CNN, RNN and BERT text classification models were constructed and compared with one another. The models were compared using machine learning performance metrics such as precision, recall, loss, accuracy, classification time and confusion matrices.
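To illustrate how several of these metrics relate, the following pure-Python sketch builds a confusion matrix (rows are true categories, columns are predicted ones) and derives accuracy and per-class precision and recall from it. The function names and the category labels in the usage note are made up for illustration; loss and classification time depend on the trained model and are not covered here.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count predictions: matrix[i][j] = samples of labels[i] predicted as labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_scores(y_true, y_pred, labels):
    """Return {label: (precision, recall)} computed from the confusion matrix."""
    matrix = confusion_matrix(y_true, y_pred, labels)
    scores = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum: TP + FP
        actual = sum(matrix[i])                    # row sum: TP + FN
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        scores[label] = (precision, recall)
    return scores


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted category equals the true one."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)
```

For example, with labels `["Games", "Office", "Photo"]`, true categories `["Games", "Games", "Office", "Photo", "Photo", "Office"]` and predictions `["Games", "Office", "Office", "Photo", "Games", "Office"]`, the confusion matrix is `[[1, 1, 0], [0, 2, 0], [1, 0, 1]]`, accuracy is 4/6, and "Office" has precision 2/3 and recall 1.0. The matrix shows which categories are confused with each other, which a single accuracy figure hides.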

The findings showed that CNN is the optimal model for text classification on the gathered dataset. The BERT model showed promising results; however, because the model is very large, overfitting could be a problem. Performance could be improved further by fine-tuning the parameters and increasing the dataset size. Other software classification methods, such as image recognition of the program's user interface, could also be applied to increase classification accuracy.
Collections
  • Theses
Theses and publications of Universities of Applied Sciences
Contact | Information on usage rights | Privacy notice | Accessibility statement