
Exploring Streaming Real-time pipeline with Cryptocurrency Price and News tracking system

Tran, Thuc (2024)

dc.contributor.author: Tran, Thuc
dc.date.accessioned: 2024-11-24T19:44:23Z
dc.date.available: 2024-11-24T19:44:23Z
dc.date.issued: 2024
dc.identifier.uri: http://www.theseus.fi/handle/10024/870198
dc.description.abstract: This thesis project applied modern data engineering practices to develop a modern data platform. The implemented system was an end-to-end data pipeline capable of ingesting both real-time and batch data, then processing and transforming the raw data into insightful information presented through a dashboard showing live crypto price trends and daily news updates. Instead of following the traditional thesis report outline, this project followed the Zipper model, in which the work was divided into topics. Each topic represented a major component of the project, built on the previous ones, and was researched, implemented and analysed separately. First, the theoretical framework for the key concepts of streaming data was established, and a local streaming data processing layer was implemented using Apache Kafka. Second, once the theory of cloud computing was established, the system was migrated to the cloud to leverage serverless benefits by using AWS services including Amazon MSK and EC2. Next, the thesis researched the core concepts and design of a batch ingestion and processing pipeline. As a result, a significant batch ingestion component was added to the system using AWS EventBridge, Lambda and S3. The raw data from streaming and batch ingestion were stored in a cloud-based data warehousing solution, Snowflake, using Snowpipe Streaming and Snowpipe. Lastly, after the theoretical aspects were discussed, an open-source Python library called Streamlit was chosen as the visualization component. This component transformed the data and displayed it as insightful information such as real-time price charts, trading volumes, news updates, and basic portfolio tracking capabilities. The implementation outcomes were analysed after each of the topics mentioned above. This approach ensured early detection of potential risks and optimization opportunities. For instance, the analysis of the Topic 1 implementation revealed the need for cloud migration, which was the core content of Topic 2. In addition, an overall analysis of the whole system's performance resulted in optimization approaches to reduce the cost of running the system on both AWS and Snowflake. The project showcases the practical implementation of modern data engineering concepts in the context of financial data processing. The project encountered several challenges, including data latency, resource utilization, and balancing operational performance against cost. The resulting deliverables form a foundation for further development in market analysis and research, machine learning and investment decision support.
dc.language.iso: eng
dc.rights: fi=All rights reserved. This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.|sv=All rights reserved. This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.|en=All rights reserved. This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.|
dc.title: Exploring Streaming Real-time pipeline with Cryptocurrency Price and News tracking system
dc.type.ontasot: fi=AMK-opinnäytetyö|sv=YH-examensarbete|en=Bachelor's thesis|
dc.identifier.urn: URN:NBN:fi:amk-2024112429701
dc.subject.degreeprogram: fi=Tietojenkäsittely|sv=Informationsbehandling|en=Business Information Technology|
dc.subject.yso: cloud services
dc.subject.yso: data processing
dc.subject.yso: data systems
dc.subject.yso: streaming
dc.subject.yso: information management
dc.subject.yso: warehousing
dc.subject.yso: distributed systems
dc.subject.yso: snowflakes
dc.subject.yso: information technology
dc.subject.yso: real-time
dc.subject.discipline: Degree Programme in Business Information Technology
annif.suggestions.links (language: en): http://www.yso.fi/onto/yso/p24167|http://www.yso.fi/onto/yso/p2407|http://www.yso.fi/onto/yso/p3927|http://www.yso.fi/onto/yso/p25409|http://www.yso.fi/onto/yso/p5521|http://www.yso.fi/onto/yso/p6576|http://www.yso.fi/onto/yso/p21082|http://www.yso.fi/onto/yso/p25560|http://www.yso.fi/onto/yso/p5462|http://www.yso.fi/onto/yso/p25256

