Predicting Voting Affiliation Using Machine Learning Algorithms

Wakjira, Ashenafi

Metropolia Ammattikorkeakoulu

2014

The permanent address of this publication is
https://urn.fi/URN:NBN:fi:amk-201405259856

Abstract

Human beings are capable of reading, understanding and drawing conclusions in many areas. With complex, multidimensional, noisy data, however, decision making becomes difficult, time-consuming, error-prone and sometimes impossible. For such data, machine learning algorithms are of great importance in making decisions.

The main goal of the project was to classify the party affiliation of U.S. congressmen as Democrat or Republican, based on the Congressional Voting Records dataset available on the UCI website. The dataset covers 435 members of the U.S. House of Representatives and is based on 16 key votes. The details of the votes are simplified as questionnaire answers, represented in the dataset as “yes”, “no” and “unknown” (not answered). Each row represents one congressman; the first column is the class label (Democrat/Republican) and the remaining columns hold the voting data (“yes”, “no” or “unknown”).
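
The row layout described above can be sketched as a small parser. This is only an illustration under stated assumptions: the comma-separated format, the symbols `y`, `n` and `?`, the numeric codes in `VOTE_CODES` and the helper name `parse_votes` are all hypothetical choices, and the two sample rows are made up rather than taken from the dataset.

```python
import csv
import io

# Two made-up rows in the assumed layout: class label first, then 16 votes.
# They are illustrative only and are not copied from the real dataset.
SAMPLE = (
    "republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y\n"
    "democrat,?,y,y,?,y,y,n,n,n,n,y,n,y,y,n,n\n"
)

# Assumed coding of the three nominal answers: yes, no, unknown.
VOTE_CODES = {"y": 1, "n": -1, "?": 0}


def parse_votes(text):
    """Split each row into a class label and a list of 16 coded votes."""
    labels, votes = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        labels.append(row[0])
        votes.append([VOTE_CODES[v] for v in row[1:]])
    return labels, votes


labels, votes = parse_votes(SAMPLE)
```

Keeping “unknown” as its own code, rather than discarding those rows, is one possible design choice; the abstract does not say how unanswered votes were handled.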

The purpose of the project was to predict the class of future data inputs based on the given dataset. Machine learning algorithms were trained on the dataset so that new observations can be predicted from the knowledge learned from the training data.

The whole dataset first had to be converted into nominal data for analysis. The data was then pre-processed so that each column had a mean of zero and a standard deviation of one. After that, dimensionality was reduced by removing some less informative features, which can be done with feature selection and feature extraction algorithms. The data was then used to train machine learning algorithms. Several kinds of algorithms were available depending on the data, and the one that gave the lowest classification error was selected.

Machine learning algorithms are used in many application areas, mainly the analysis of large databases and domains in which humans may not be able to establish hypotheses well. In this project, party affiliation was classified for a 16-dimensional dataset of 435 observations using the KNN and Naïve Bayes algorithms. The results were evaluated on the testing set, and an accuracy of 95% was achieved.
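
As a rough illustration of the workflow in this abstract, the following sketch standardizes each column, classifies with a plain k-nearest-neighbours vote and measures accuracy on a held-out set. It is not the thesis implementation: the data is synthetic (generated by the hypothetical `make_synthetic` helper), the 70/30 split and `k=5` are assumptions, and feature selection and Naïve Bayes are left out to keep the example short. The accuracy it reports describes only the synthetic data and says nothing about the 95 % result above.

```python
import math
import random
from collections import Counter


def standardize(train, test):
    """Scale columns to zero mean and unit std, using training statistics only."""
    n_cols = len(train[0])
    means = [sum(r[j] for r in train) / len(train) for j in range(n_cols)]
    stds = []
    for j in range(n_cols):
        var = sum((r[j] - means[j]) ** 2 for r in train) / len(train)
        stds.append(math.sqrt(var) or 1.0)  # guard against constant columns

    def scale(rows):
        return [[(r[j] - means[j]) / stds[j] for j in range(n_cols)] for r in rows]

    return scale(train), scale(test)


def knn_predict(train_x, train_y, x, k=5):
    """Return the majority label among the k training points closest to x."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def make_synthetic(n, rng):
    """Hypothetical generator: 16 yes/no votes whose odds depend on the class."""
    rows, labels = [], []
    for _ in range(n):
        label = rng.choice(["democrat", "republican"])
        p_yes = 0.8 if label == "democrat" else 0.2  # assumed, not from the thesis
        rows.append([1 if rng.random() < p_yes else -1 for _ in range(16)])
        labels.append(label)
    return rows, labels


rng = random.Random(0)
X, y = make_synthetic(435, rng)

split = int(0.7 * len(X))  # assumed 70/30 train/test split
train_x, test_x = standardize(X[:split], X[split:])
train_y, test_y = y[:split], y[split:]

predictions = [knn_predict(train_x, train_y, x) for x in test_x]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

Computing the means and standard deviations from the training rows alone keeps information about the test set out of the pre-processing step, which is one common way to make the held-out estimate trustworthy.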
