Predicting traffic incidents using open data sources
Hakkari, Onni (2019)
Hakkari, Onni
Jyväskylän ammattikorkeakoulu
2019
All rights reserved
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-201902182434
Abstract
The purpose of the research was to examine whether openly available data sources contain features that affect the occurrence and severity of traffic incidents.
The implementation comprised data engineering, feature engineering and the building of predictive models. The research used time series data, and combining it became a major part of the work, because new software had to be developed that could combine many time series datasets with different frequencies. The software had to be implemented in a memory-efficient manner, since the datasets were too large to be held in memory all at the same time.
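The software developed in the thesis is not reproduced here. As a rough illustration of the general idea only, the sketch below shows one possible way to combine two time series of different frequencies without loading either into memory: a streaming "as-of" join over two sorted iterators. The function name `asof_join`, the row layout and the sample data are all hypothetical and are not taken from the thesis.

```python
from datetime import datetime
from typing import Iterable, Iterator, Optional, Tuple

Row = Tuple[datetime, float]  # (timestamp, value); hypothetical row layout


def asof_join(
    base: Iterable[Row], other: Iterable[Row]
) -> Iterator[Tuple[datetime, float, Optional[float]]]:
    """For each base row, attach the latest `other` value at or before its timestamp.

    Both inputs must be sorted by timestamp. Only one pending row of `other`
    is buffered at a time, so neither series has to fit in memory.
    """
    other_iter = iter(other)
    pending = next(other_iter, None)  # next unread row of `other`
    latest: Optional[float] = None    # most recent `other` value seen so far
    for ts, value in base:
        # Consume `other` rows until the next one lies after the base timestamp.
        while pending is not None and pending[0] <= ts:
            latest = pending[1]
            pending = next(other_iter, None)
        yield ts, value, latest


# Made-up sample data: hourly temperatures and 15-minute traffic counts.
def hourly_weather() -> Iterator[Row]:
    yield datetime(2018, 1, 1, 0), -5.0
    yield datetime(2018, 1, 1, 1), -6.0


def quarter_hourly_traffic() -> Iterator[Row]:
    for minute in range(0, 90, 15):
        yield datetime(2018, 1, 1, minute // 60, minute % 60), float(100 + minute)


combined = list(asof_join(quarter_hourly_traffic(), hourly_weather()))
```

Each 15-minute traffic row is paired with the most recent hourly weather reading, or with `None` if no reading exists yet. In practice the generators could wrap file readers, so that the rows are streamed from disk rather than built in memory.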
As a result, no descriptive feature was found that affects the number or the severity of traffic incidents. However, one interesting observation was made: while the number of traffic incidents remains approximately the same regardless of the season, incidents in summer tend to be more severe. One hypothesis is that this is due to motorcycles, since traffic incidents involving motorcycles could be expected to be more severe than those involving only four-wheel vehicles.
Even though the data was not descriptive enough to proceed to building polished predictive neural network models, a range of data analysis methods was applied to explore how descriptive the data is. The work is nonetheless useful, since the data-combining software that was developed can be open-sourced in the future, which could benefit other data scientists who need to combine complex time series data.