Production Database Preprocessing: Transforming messy data into actionable insights
Starck, Alex (2023)
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2023052714975
Abstract
The project aims to demonstrate the importance of data preprocessing in developing an accurate predictive data model. The project utilizes a private dataset and focuses on cleaning, transforming, and preparing the data for use in a predictive model.
The project applies several data preprocessing techniques, including handling missing values and duplicates and scaling the data. It then uses linear regression, a popular machine learning algorithm, to build a simple predictive model and evaluate its performance.
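The preprocessing steps named above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the dataset is private, so the column names and values are invented, and median imputation and standardization are assumed choices for "handling missing values" and "scaling".

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data standing in for the private dataset.
raw = pd.DataFrame({
    "units": [10.0, 12.0, np.nan, 12.0, 15.0],
    "hours": [1.0, 2.0, 3.0, 2.0, 4.0],
})

def preprocess(df):
    """Drop duplicate rows, fill gaps with the column median, then standardize."""
    out = df.drop_duplicates().copy()
    # Assumed imputation strategy: replace missing values with the column median.
    out = out.fillna(out.median(numeric_only=True))
    # Assumed scaling strategy: zero mean and unit variance per column.
    scaled = StandardScaler().fit_transform(out)
    return pd.DataFrame(scaled, columns=out.columns, index=out.index)

clean = preprocess(raw)
```

Removing duplicates before imputing keeps repeated rows from skewing the median used to fill the gaps.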
The method used in this project is a quantitative research approach that draws upon deductive reasoning. The study relies on a systematic and logical approach to draw conclusions from the collected quantitative data.
The linear regression model, implemented with scikit-learn, did not produce satisfactory results: the small size of the dataset limited the accuracy of its predictions. Although the model provided some insights, the results were not accurate enough to support meaningful conclusions, and rounding the predictions did not make them reliable.
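As a rough illustration of this kind of evaluation, the sketch below fits scikit-learn's LinearRegression on a small synthetic dataset and scores it on a held-out split. The data, the sample size of 20, the noise level, and the choice of metrics are all assumptions made for the example; none of them come from the project.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a small dataset: 20 noisy samples of a linear relation.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(20, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 4.0, size=20)

# Hold out a quarter of the samples (5 rows) for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

mae = mean_absolute_error(y_test, pred)
r2 = r2_score(y_test, pred)

# Rounding moves each prediction by at most 0.5, so the mean absolute
# error can change by at most 0.5 as well.
rounded_mae = mean_absolute_error(y_test, np.round(pred))
```

With so few test rows, the scores vary considerably from one split to the next, and rounding cannot repair errors that are larger than the rounding step itself.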
