
Exploring Streaming Real-time pipeline with Cryptocurrency Price and News tracking system

Tran, Thuc (2024)

 
Open file
Tran_Thuc.pdf (6.78 MB)


All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2024112429701
Abstract
This thesis project applied modern data engineering practices to develop a data platform. The implemented system was an end-to-end data pipeline capable of ingesting both real-time and batch data and of processing and transforming the raw data into insightful information through a dashboard showing live cryptocurrency price trends and daily news updates.

Instead of following the traditional thesis report outline, this project followed the Zipper model, in which the work was divided into separate topics. Each topic represented a major component of the project, built on top of the previous ones, and was researched, implemented and analysed independently. First, the theoretical framework for the key concepts of streaming data was established, and a local streaming data processing layer was implemented using Apache Kafka. Second, once the theory on cloud computing had been covered, the system was migrated to the cloud to leverage the serverless and managed capabilities of AWS services, including Amazon MSK and EC2.
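The abstract does not include implementation code, but a minimal sketch of the local streaming layer described above could look like the following Python producer. It assumes a Kafka broker at localhost:9092, a topic named crypto-prices, and the kafka-python client; the topic name, message schema and price value are illustrative placeholders, not details taken from the thesis.

```python
# Illustrative Kafka producer for crypto price ticks (not from the thesis).
# Assumes a local broker at localhost:9092 and a topic named "crypto-prices".
import json
from datetime import datetime, timezone

from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)


def publish_tick(symbol: str, price: float) -> None:
    """Publish one price tick as a JSON message keyed by symbol."""
    message = {
        "symbol": symbol,
        "price": price,
        "ts": datetime.now(timezone.utc).isoformat(),
    }
    producer.send("crypto-prices", key=symbol.encode("utf-8"), value=message)


if __name__ == "__main__":
    # In the real pipeline the price would come from an exchange or market
    # data API; a static value stands in for the feed here.
    publish_tick("BTC-USD", 97000.0)
    producer.flush()
```

After the migration to Amazon MSK, essentially the same producer code would apply with the MSK bootstrap brokers and the cluster's TLS or IAM authentication settings, which is presumably what allowed the local design to be carried over to the cloud topic.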

Next, the thesis researched the core concepts and design of a batch ingestion and processing pipeline. As a result, a batch ingestion component was added to the system using AWS EventBridge, Lambda and S3. The raw data from both the streaming and batch ingestion paths were stored in Snowflake, a cloud-based data warehousing solution, using Snowpipe Streaming and Snowpipe respectively. Lastly, after the theoretical aspects had been discussed, the open-source Python library Streamlit was chosen as the visualization component. This component transformed and displayed the data as insightful information such as real-time price charts, trading volumes, news updates, and basic portfolio tracking.
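As a rough illustration of the batch path, a scheduled EventBridge rule could invoke a Lambda handler like the sketch below, which writes a day's news items to S3 as newline-delimited JSON; a Snowpipe definition on the Snowflake side would then auto-ingest each new object. The bucket name, key layout and the fetch_daily_news stand-in are assumptions made for this example, not details from the thesis.

```python
# Illustrative AWS Lambda handler for the daily batch ingestion step
# (not from the thesis). Triggered on a schedule by an EventBridge rule;
# writes the day's news items to S3 so that Snowpipe can load them into Snowflake.
import json
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")
BUCKET = "crypto-news-raw"   # placeholder bucket name
PREFIX = "news/daily"        # placeholder key prefix


def fetch_daily_news() -> list:
    """Stand-in for a call to a news API; returns a list of article records."""
    return [
        {
            "title": "Example headline",
            "source": "example.com",
            "published_at": datetime.now(timezone.utc).isoformat(),
        }
    ]


def lambda_handler(event, context):
    articles = fetch_daily_news()
    run_date = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    key = f"{PREFIX}/{run_date}.json"

    # Newline-delimited JSON keeps each article as its own row for Snowflake loads.
    body = "\n".join(json.dumps(article) for article in articles)
    s3.put_object(Bucket=BUCKET, Key=key, Body=body.encode("utf-8"))

    return {"statusCode": 200, "key": key, "articles": len(articles)}
```

On the Snowflake side the abstract only names Snowpipe and Snowpipe Streaming; the corresponding stage, pipe and table definitions are not given, so they are left out of this sketch.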

The implementation outcomes were analysed after each of the topics mentioned above. This approach ensured early detection of potential risks and optimization opportunities. For instance, the analysis of the Topic 1 implementation revealed the need for cloud migration, which became the core content of Topic 2. In addition, an overall analysis of the whole system's performance led to optimization approaches that reduced the cost of running the system on both AWS and Snowflake.

The project showcases the practical implementation of modern data engineering concepts in the context of financial data processing. It encountered several challenges, including data latency, resource utilization, and balancing operational performance against cost. The resulting deliverables provide a foundation for further development in market analysis and research, machine learning, and investment decision support.