Applying Statistical Methods to Predicting Real Estate Value
Nguyen, Son (2019)
All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
This thesis was conducted as a data mining project inspired by recent advances in statistics and computer science and by their applications in business. The main objective was to predict future house prices using a publicly available data set of observed real estate values in Sindian District, Taiwan. First, the data was examined by computing basic descriptive statistics and plotting graphs to visualize the distributions of the variables and the relationships between them. Subsequently, the data was divided into a training set and a test set, and linear regression and random forest models were built and tested. These models use statistical methods to identify patterns and relationships between the predictors and the response in the training data, which are then used to predict future values of the response from the predictors. The linear model was selected by the best subset method, while the random forest models were compared by test mean squared error (MSE) and the one with the lowest test MSE was chosen. The results showed that the random forest models had significantly lower MSE and were therefore more suitable for prediction.
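The linear regression part of this workflow (train/test split, best subset selection, test MSE) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the thesis: the data is synthetic rather than the Sindian District data set, subsets are ranked by BIC on the training set (the abstract does not state which criterion was used), and all function names are hypothetical. The random forest step is omitted; such a model would be scored on the same test set using the same `mse` function.

```python
import itertools
import math
import random

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def predict(beta, X):
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]

def mse(y, y_hat):
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

def select(X, cols):
    """Keep only the predictor columns listed in cols."""
    return [[r[c] for c in cols] for r in X]

# Synthetic data (assumption): the response depends on x0 and x1 only; x2 is noise.
random.seed(0)
n = 200
X = [[random.random() for _ in range(3)] for _ in range(n)]
y = [40 - 5 * r[0] + 3 * r[1] + random.gauss(0, 0.5) for r in X]

# Random split into a training set (150 rows) and a test set (50 rows)
idx = list(range(n))
random.shuffle(idx)
train, test = idx[:150], idx[150:]
X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
X_te, y_te = [X[i] for i in test], [y[i] for i in test]

def bic(cols):
    """BIC of the OLS fit on the training set using predictors cols."""
    beta = fit_ols(select(X_tr, cols), y_tr)
    m = len(y_tr)
    rss = mse(y_tr, predict(beta, select(X_tr, cols))) * m
    return m * math.log(rss / m) + (len(cols) + 1) * math.log(m)

# Best subset selection: fit every non-empty subset of predictors, keep the lowest BIC
subsets = [c for k in range(1, 4) for c in itertools.combinations(range(3), k)]
best = min(subsets, key=bic)

# Evaluate the selected model on the held-out test set
beta = fit_ols(select(X_tr, best), y_tr)
test_mse = mse(y_te, predict(beta, select(X_te, best)))
baseline_mse = mse(y_te, [sum(y_tr) / len(y_tr)] * len(y_te))
print("selected predictors:", best)
print("test MSE:", round(test_mse, 3), "baseline MSE:", round(baseline_mse, 3))
```

Competing models are compared on the test set rather than the training set because training error keeps falling as predictors or trees are added, whereas test MSE estimates how well a model predicts unseen observations.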