Big Data Processing Cluster for Time Series Modelling

Verbovskiy, Andrey (2023)

Open file

Verbovskiy_Andrey.pdf (2.588 MB)

The permanent address of this publication is
https://urn.fi/URN:NBN:fi:amk-2023060421119

Abstract

The goal of this project was to build a big data processing cluster consisting of three main components: Apache Spark for data processing, Apache Cassandra as the distributed database, and Apache Kafka as the distributed event streaming platform.

The main research question was how to build and maintain a big data processing cluster with open-source technologies on virtual machines. The thesis also examines the core technologies and their architecture.

The end product of this study was a cluster of three virtual machines, each running an instance of all three technologies. The system is stable and has a functional web interface. In conclusion, companies such as Nokia can use open-source technologies for big data processing instead of purchasing expensive subscriptions to similar technologies from other vendors.

Collections

Theses