Big Data Processing Cluster for Time Series Modelling
Verbovskiy, Andrey (2023)
All rights reserved. This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2023060421119
Abstract
The goal of this project was to build a Big Data processing cluster consisting of three main components: Apache Spark for data processing, Apache Cassandra as a distributed database, and Apache Kafka as a distributed event streaming platform.
The main research question was how to build and maintain a Big Data processing cluster with open-source technologies on virtual machines. Moreover, the thesis studies and discusses the core technologies and their architecture.
The end product of this study was a cluster of three virtual machines, with an instance of each of the three technologies installed on every machine. The system is stable and has a functional web interface. In conclusion, companies such as Nokia can use open-source technologies for big data processing instead of purchasing expensive subscriptions to similar technologies provided by other companies.
