Reproducible Machine Learning Models and Experiments : A platform for hosting and managing machine learning projects
Wallin, Niklas (2023)
All rights reserved. This publication is copyrighted. You may download, display, and print it for your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2023121136096
Abstract
This thesis concerns finding suitable open-source tools and a platform for managing and hosting machine learning projects. Using version control tools, the platform should track the versions of the code, datasets, and hyperparameters used throughout a project's life cycle. It should also track the machine learning experiments run while developing a model. Recording what was used to create each model makes it possible to reproduce earlier experiments and machine learning models.
The thesis also covers building a pipeline that runs the experiments and creates machine learning models. It includes a demo task in which the pipeline either fetches data from the web or uses the data already on the PC to create and validate a machine learning model. Finally, the resulting model is deployed to a website, where it can be tested.
Selected models with good performance metrics can be added to a model registry. The registry makes it easier to keep track of the state of the selected models, e.g., which models should be tested further and which are ready to be deployed to production.