Data Survey for Elonen Oy Bakery: applying clustering to sales data analysis
Hakala, Taina (2025)
All rights reserved. This publication is copyrighted. You may download, display, and print it for your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2025052315446
Abstract
This work arose from Elonen Oy Bakery's desire to make better use of its collected sales and other data. The task was to investigate the available data and to discuss the most promising methods with industry experts. The goal was to find a perspective that could be developed further and to create a machine learning model that benefits the business. Initially, various ideas were developed and presented to the company's representatives. Ultimately, the main topic became the application of clustering to the analysis of sales data.
The primary goal was to cluster customers in the product group space and products in the area space; other configurations were also investigated. K-Means and agglomerative clustering were examined thoroughly, including their differences and their application to Elonen's data. Skewness in the distributions was managed, and PCA and UMAP were used for dimensionality reduction.
The result was a consistent clustering outcome for each dataset created from the sales data. While some data points may shift between clusters in different runs, the overall characterization of the clusters remains stable. Additionally, clusters were examined in relation to the natural categories of the data points. To improve usability, interactive scatter plots were created, making it easy to explore the data points and their relationships.
It was found that multiple clustering configurations can be created from the same initial data, each describing the data from a different perspective and providing insights into various matters of interest. In conclusion, clustering is particularly useful when someone unfamiliar with the industry wants to quickly find connections and patterns in the data that would be slow and haphazard to find using traditional data analysis methods.
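The approach summarized above, reducing skewness and then clustering customers in the product group space, could be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the sales figures, the three product groups, the `log1p` transform as the skewness treatment, and the deterministic farthest-first initialization are all hypothetical choices made to keep the example small and dependency-free. In practice a library implementation of K-Means, for example the one in scikit-learn, would more likely be used.

```python
import math


def log_scale(rows):
    """Apply log1p to every value to reduce right skew in sales totals."""
    return [[math.log1p(v) for v in row] for row in rows]


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def farthest_first(points, k):
    """Deterministic initialization (an assumption of this sketch):
    start from the first point, then repeatedly add the point that is
    farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def k_means(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update
    until the labels stop changing. Returns (labels, centroids)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = mean_point(members)
    return labels, centroids


# Hypothetical data: each row is a customer, each column is that customer's
# sales in one product group (bread, pastries, cakes). Values are invented.
sales = [
    [1200.0, 30.0, 5.0],
    [950.0, 45.0, 8.0],
    [1500.0, 20.0, 10.0],
    [40.0, 800.0, 600.0],
    [25.0, 950.0, 700.0],
    [60.0, 700.0, 900.0],
]

labels, centroids = k_means(log_scale(sales), k=2)
print(labels)
```

With randomized initialization, which library implementations typically use, individual points near a cluster boundary can change cluster between runs, which matches the abstract's remark that the overall characterization stays stable even when some data points shift.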
Collections
Similar items
Showing items with similar titles, authors, or subject keywords.
Data Strategy Handbook as Guide Towards Data-Driven Organization
Piippola, Timo-Joel (2024) The need for an organizational data culture is evident in the digital era. More organizations are making data-driven decisions, viewing data as a crucial business asset. This thesis aimed to help a case company enhance its ...
Big datan käyttö liiketoiminnan ennustamiseen: tieliikenneonnettomuudet Suomessa [Using big data for business forecasting: road traffic accidents in Finland]
Alto, Olga (2019) The purpose of this thesis is to determine what information can be predicted from large volumes of data. The material consists of open sources on traffic accidents in Finland from 2015 to 2017. The work predicts ...
Recognizing the value of data in business operations : Data analytics for business operation
Duma, Don (2022) The aim of this study was to demonstrate the hidden value of data that can be extracted with few commercial and open-source software tools. Any given business can collect, organize, and extract data for analysis that can ...