Designing and Implementing a Data Architecture for a Medium-Sized Business
Liljedahl, William (2025)
All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2025060621096
Abstract
This thesis explores the design and implementation of a data architecture for a medium-sized business. The work was conducted for Beamex Oy Ab, with the objective of building a data platform based on established architectural best practices.
The theoretical foundation examines key components of data architecture, including data sources, ingestion, storage, transformation, modeling and consumption. It reviews common architectural approaches, such as data lakes, data warehouses and lakehouses, and reflects on trade-offs between performance, flexibility and governance. It also introduces concepts in data management and governance, as well as organizational models, such as domain-oriented data ownership and self-service analytics.
The implementation centers on the Microsoft Fabric platform, using a layered lakehouse approach based on the Medallion Architecture and domain-oriented data ownership. Governance is integrated throughout the data lifecycle, aligning theory with real-world constraints. Ingestion and transformation are handled through a combination of data pipelines and PySpark notebooks, with an emphasis on maintainability for a small team and on speed to value.
The result is a practical architecture that balances centralized control with domain ownership. Reflections from the implementation highlight both the opportunities and challenges of applying modern data practices in a medium-sized business environment.