Comparative Analysis of Big Data Processing in AWS and GCP Cloud Environments
Noman, Muhammad (2024)
All rights reserved. This publication is copyrighted. You may download, display, and print it for your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2024061223045
Abstract
The purpose of this thesis is to compare Amazon Web Services (AWS) and Google Cloud Platform (GCP) to identify the better option for processing large amounts of structured data stored in CSV files. The aim is to help organizations make an informed decision between the two platforms.
The study involved running data processing jobs on AWS Elastic MapReduce (EMR) and GCP Dataproc using PySpark to measure performance in terms of speed, throughput, and the number of records processed.
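The three metrics named above (elapsed time, records processed, and throughput) can be illustrated with a minimal sketch. It is not the thesis's code: the study ran PySpark jobs on EMR and Dataproc clusters, whereas this stand-in uses only Python's standard `csv` module on a small in-memory file. The names `JobMetrics`, `run_and_measure`, `process_row`, and `clock` are hypothetical and introduced only for illustration.

```python
import csv
import io
import time
from dataclasses import dataclass


@dataclass
class JobMetrics:
    records: int            # number of data rows processed
    elapsed_seconds: float  # wall-clock duration of the job
    throughput: float       # records per second


def run_and_measure(csv_file, process_row, clock=time.perf_counter):
    """Process every data row of a CSV file and report basic job metrics.

    `clock` is injectable so the measurement can be tested deterministically.
    """
    reader = csv.DictReader(csv_file)
    start = clock()
    records = 0
    for row in reader:
        process_row(row)
        records += 1
    elapsed = clock() - start
    # Guard against a zero-length interval on very small inputs.
    throughput = records / elapsed if elapsed > 0 else float("inf")
    return JobMetrics(records, elapsed, throughput)


if __name__ == "__main__":
    # Tiny in-memory CSV standing in for a large structured dataset.
    sample = io.StringIO("id,amount\n1,10.5\n2,20.0\n3,7.25\n")
    totals = []
    metrics = run_and_measure(sample, lambda row: totals.append(float(row["amount"])))
    print(f"records={metrics.records} "
          f"elapsed={metrics.elapsed_seconds:.6f}s "
          f"throughput={metrics.throughput:.0f} records/s")
```

Running the same measurement wrapper around an identical job on each platform is what makes the speed and throughput figures comparable; on a real cluster, the timed section would wrap the distributed job rather than a local loop.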
The findings showed that AWS EMR was faster than GCP Dataproc. However, both cost and performance depend on how the computational resources are configured: more computing power shortens processing time but raises cost. GCP Dataproc, in turn, was found to be more cost-efficient and easier to use. This makes GCP Dataproc a good choice for organizations that prioritize budget and simplicity over performance.
The choice between AWS and GCP for big data processing depends on an organization’s specific needs and priorities. AWS EMR is recommended for scenarios where speed and performance are crucial, while GCP Dataproc is better for those looking for a cost-effective and easy-to-use solution. Both platforms offer scalability, enabling organizations to adjust their computational resources based on their workload demands.