Cost-Effective Query Optimization in Time Series Databases for an Analytics Application
Kaitemo, Anette (2025)
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2025121737458
Abstract
This thesis addresses the need to optimize cost and performance in a cloud-based analytics application called 10Duke Insights. The work studies the structure of the time series database queries within the application. Problems arose when the database's billing model changed: costs escalated because multiple unoptimized queries executed directly against raw event data. The project was guided by research questions concerning the impact of query structure and indexing on cost, the need for replacing raw data with pre-aggregated results, and the technical implementation of these solutions. The project was completed using an iterative methodology within a test environment. The initial queries were analysed in AWS Timestream, and the final architecture was implemented in InfluxDB following an unexpected architectural change. Key methods included SQL-based filtering and the use of Python scripting and InfluxDB triggers to automate the creation of pre-aggregated tables. The results validated this two-phase approach of filtering and aggregation. Data aggregation provided the biggest improvement, reducing the query load significantly. Filtering proved essential for long-term cost control, because it allows the database to read only the necessary data. The conclusion is that shifting from real-time queries against raw data to a combined strategy of scheduled aggregation and filtering is the most robust way to achieve long-term performance and cost-efficiency in modern time series analytics, keeping the application viable against increasing data volumes and architectural changes. The thesis is delimited to the technical factors affecting resource use and excludes a detailed financial analysis or user experience study.
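The combined strategy of scheduled aggregation and filtering described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: it uses Python's built-in sqlite3 module as a stand-in for the time series database, and the table names (raw_events, events_hourly), column names, and sample events are all hypothetical, chosen only to show the idea of rolling raw events up into an hourly table and then querying that table with a time-range filter.

```python
import sqlite3

HOUR = 3600  # bucket size in seconds (assumed granularity, not from the thesis)


def make_demo_connection():
    """Create an in-memory database with a few hypothetical raw events."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_events (ts INTEGER, event_type TEXT)")
    conn.executemany(
        "INSERT INTO raw_events VALUES (?, ?)",
        [(0, "login"), (10, "login"), (3600, "login"),
         (3700, "logout"), (7200, "login")],
    )
    return conn


def build_hourly_rollup(conn):
    """Rebuild the pre-aggregated table; a scheduler would call this periodically."""
    conn.execute("DROP TABLE IF EXISTS events_hourly")
    conn.execute(
        """
        CREATE TABLE events_hourly AS
        SELECT (ts / ?) * ? AS hour_start,   -- integer division floors to the hour
               event_type,
               COUNT(*) AS event_count
        FROM raw_events
        GROUP BY (ts / ?) * ?, event_type
        """,
        (HOUR, HOUR, HOUR, HOUR),
    )


def count_events(conn, event_type, start_ts, end_ts):
    """Answer a dashboard-style question from the rollup, filtered by type and time."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(event_count), 0)
        FROM events_hourly
        WHERE event_type = ?
          AND hour_start >= ?
          AND hour_start < ?
        """,
        (event_type, start_ts, end_ts),
    ).fetchone()
    return row[0]
```

With the sample data, calling build_hourly_rollup on the demo connection and then count_events(conn, "login", 0, 7200) reads two rows of the rollup table instead of three raw events. The saving is trivial here, but it grows with data volume, since the rollup holds at most one row per hour and event type.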
