Lakehouse architecture in public cloud
Bulut, Emilia (2024)
All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2024060722039
Abstract
This thesis explores the implementation and evaluation of data engineering services in Microsoft Azure, focusing on the innovative lakehouse architecture. Lakehouse architecture combines the strengths of data warehouses and data lakes, providing a unified platform for structured and unstructured data.
The primary purpose of this thesis was to evaluate various Azure services, such as Storage Accounts, Key Vault, Entra ID, and Azure Databricks. In addition, this thesis aimed to demonstrate their roles in building a robust lakehouse solution. The practical implementation used a dataset of unemployment benefits provided by Kela, the Social Insurance Institution of Finland, to showcase the data transformation and analytics capabilities of the lakehouse architecture.
The results indicated significant improvements in data processing efficiency and analytical capabilities, confirming the potential of the lakehouse architecture for complex data environments. Recommendations for future development include further optimisation of the system and exploration of additional Azure services to enhance the lakehouse’s capabilities.