Using machine learning to predict purchase potential from customer data
Merikanto, Kari (2022)
All rights reserved. This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-202205067540
Abstract
This thesis was commissioned by an accounting firm that sells consultancy services to its customers. The company wanted to understand which customer attributes affect purchase decisions, how machine learning could help in sales situations where these consultation service products are offered, and whether machine learning is a good tool for this kind of use case.
The study covered two suitable consultation service products, minutes and finance interpretation, and a separate machine learning model was created for each product using LightGBM (Light Gradient Boosting Machine). Historical customer data was collected from a one-year period representing the time frame in which the products should be offered and are needed. After the models were created, SHAP (SHapley Additive exPlanations) values were used to explain the effects of customer attributes on purchases. Model predictions and SHAP values were later also used in a proof-of-concept web application, which has a web page for testing predictions and SHAP values with recent customer data.
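The idea behind SHAP values can be illustrated with a minimal sketch that computes exact Shapley values for a toy scoring function. Everything in it is an assumption made for illustration: the feature names (has_board, is_limited_company, employees_over_10), the weights, and the all-zero baseline are hypothetical and do not come from the thesis, and the toy function stands in for a trained LightGBM model. Libraries such as SHAP use faster, tree-specific algorithms rather than enumerating orderings as done here.

```python
from itertools import permutations


def toy_model(features):
    """Hypothetical purchase score; NOT the model built in the thesis."""
    score = 0.1
    score += 0.3 * features["has_board"]
    score += 0.2 * features["is_limited_company"]
    # An interaction term, so that attribution is not trivially additive.
    score += 0.25 * features["has_board"] * features["is_limited_company"]
    score += 0.05 * features["employees_over_10"]
    return score


def shapley_values(model, instance, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over every order in which features can be switched from their
    baseline value to the instance's value."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = instance[name]
            value = model(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


# Hypothetical customer and reference point (both invented for this sketch).
customer = {"has_board": 1, "is_limited_company": 1, "employees_over_10": 0}
baseline = {"has_board": 0, "is_limited_company": 0, "employees_over_10": 0}

contributions = shapley_values(toy_model, customer, baseline)
for name, value in contributions.items():
    print(f"{name}: {value:+.3f}")
```

The contributions add up to the difference between the customer's score and the baseline score, which is the property that lets per-attribute effects be read as a breakdown of a single prediction.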
The dataset for the minutes product model was well balanced between cases in which a purchase was and was not made. The model reached almost 80% accuracy in predicting whether a customer purchases the product, but it was only appropriate when predicting positive purchase decisions. SHAP values gave some indication of which customer attributes could affect purchase decisions. The minutes product model was also used in the web application and tested in recent sales situations. This test led to the conclusion that the minutes product model was appropriate for predicting negative outcomes (the customer will not purchase the product) but insufficient for predicting positive outcomes (the customer will purchase the product), which would be especially beneficial in a sales situation.
The finance interpretation product dataset was considerably more imbalanced than the minutes product dataset. The model's accuracy was high (around 90%), but this was a result of the imbalance in the dataset. The finance interpretation product model was very poor at predicting positive outcomes, and it was therefore used only to obtain global insights into the reasons for purchases using SHAP values.
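Why high accuracy can coexist with poor prediction of positive outcomes can be shown with a small sketch. The numbers are invented for illustration: the abstract does not report the class ratio, so a 10% purchase rate is assumed here only because it makes the arithmetic match an accuracy of about 90%. The "model" is a trivial baseline that always predicts no purchase, not the LightGBM model from the thesis.

```python
def evaluate(y_true, y_pred):
    """Accuracy plus precision and recall for the positive (purchase) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when a class is never predicted or absent.
        "precision_pos": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall_pos": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Assumed, invented data: 100 customers, of whom 10 purchased.
y_true = [1] * 10 + [0] * 90

# Baseline that never predicts a purchase.
always_no = [0] * len(y_true)

report = evaluate(y_true, always_no)
print(report)
```

Under these assumed numbers the baseline is right for every non-purchasing customer and wrong for every purchasing one, so its accuracy is high while its recall for purchases is zero. This is why accuracy alone says little about a model's usefulness for finding likely buyers in an imbalanced dataset.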