Comparative Study of Machine Learning Algorithms for Heart Disease Prediction
Acharya, Abhisek (2017)
Acharya, Abhisek
Metropolia Ammattikorkeakoulu
2017
All rights reserved
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-201704104507
Abstract
As technology and hardware capabilities advance, machine learning is advancing too, and its use is growing in every field, from stock analysis to medical image processing. Heart disease prediction is one field where machine learning can be applied. This study therefore investigates different machine learning algorithms and compares their results using several performance metrics, including accuracy, precision, recall and F1-score. The data set used for this study is the “Heart Disease Data Set” from the UCI Machine Learning Repository.
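The performance metrics named above can be illustrated with a minimal sketch. It assumes binary labels with 1 as the positive ("Unhealthy") class; the function name `binary_metrics` and the label lists are invented for illustration and are not results from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1-score for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # Count the confusion-matrix cells.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    # Guard each ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels, for illustration only (not data from the study).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Precision and recall are reported alongside accuracy because, in a screening setting, a missed unhealthy case (false negative) and a false alarm (false positive) carry different costs that accuracy alone does not separate.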
This study was carried out as a quantitative case study. Several previous studies on this data set were reviewed and analysed in depth to understand the subject better. Statistics and numerical results are used throughout so that the findings are quantifiable and correlations between data attributes can be identified.
The main objective of this study was to compare algorithms for classifying heart disease, based on different performance metrics. The data set contains 13 independent variables (features) and 1 dependent variable to be predicted. In the original data set the predicted variable takes values from 0 to 4, ranging from a healthy heart at 0 to a severely unhealthy heart at 4. For this study, the class labels 0 to 4 were converted to 0 and 1, so the predicted class is either 0 (“Healthy”) or 1 (“Unhealthy”). Techniques such as feature selection, grid search and probability calibration were used to obtain optimal results.
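The label conversion can be sketched as follows. The abstract does not state the exact mapping, so this assumes the common convention that 0 stays 0 and every severity from 1 to 4 becomes 1; the function name `binarise_label` and the sample values are hypothetical.

```python
def binarise_label(severity):
    """Map the original 0-4 severity label to 0 (Healthy) or 1 (Unhealthy).

    Assumption: any non-zero severity counts as Unhealthy.
    """
    if severity not in (0, 1, 2, 3, 4):
        raise ValueError(f"unexpected severity label: {severity!r}")
    return 0 if severity == 0 else 1


# Made-up severity values, for illustration only.
original_labels = [0, 2, 1, 0, 4, 3, 0]
binary_labels = [binarise_label(s) for s in original_labels]
print(binary_labels)  # [0, 1, 1, 0, 1, 1, 0]
```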
The algorithms used in this study are k-Nearest Neighbour, Support Vector Machine, Naïve Bayes, AdaBoost, Random Forest and Artificial Neural Network. It can be concluded that Artificial Neural Network and Support Vector Machine are the best algorithms for this data set and possibly for other heart disease data sets. Before the conclusions of this study can be applied clinically, they need to be elaborated further with the help of experts in both the heart disease and machine learning domains.
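To make the idea of grid search concrete for one of the listed algorithms, here is a minimal sketch of tuning k for a hand-written k-Nearest Neighbour classifier with leave-one-out validation. It is not the implementation used in the study: the helper names, the candidate values of k and the two-dimensional toy points are all assumptions chosen for illustration, and the toy points are not drawn from the UCI data set.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Predict the majority label among the k nearest training points."""
    distances = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, k):
    """Leave-one-out accuracy: hold out each point once and predict it."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)


def grid_search_k(X, y, candidates):
    """Score every candidate k and return the best one with all scores."""
    scores = {k: loo_accuracy(X, y, k) for k in candidates}
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Toy data: two well-separated clusters (invented, not patient data).
X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [1.1, 1.3],
     [5.0, 5.0], [5.2, 4.8], [4.9, 5.1], [5.1, 5.3]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

best_k, scores = grid_search_k(X, y, candidates=[1, 3, 5, 7])
print(best_k, scores)
```

With only three same-class neighbours left after holding a point out, k = 7 lets the opposite cluster outvote them, which shows why the hyperparameter has to be validated rather than fixed in advance.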