Impact of quantization techniques on the performance and quality of AI models in AMD-based systems for local deployment of large language models
Muikku, Jere (2024)
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2024122638134
Abstract
This thesis investigates the impact of quantization techniques on AI text generation models, with a focus on optimizing performance and accuracy in locally deployed systems. As quantization becomes increasingly important for improving the efficiency of machine learning models, its role in balancing computational demands with output quality has drawn significant attention. By focusing on k-quant GGUF models, this study addresses the challenges and trade-offs associated with adapting high-performing models to resource-constrained environments.
The research involves selecting and evaluating text generation models compatible with the GGUF format and capable of functioning on Windows systems. It explores how different levels and methods of quantization influence model behaviour, including memory usage, processing efficiency, and response accuracy. These factors are critical for deploying AI models in scenarios where computational resources are limited, such as standard personal computers or other resource-limited systems.
This study systematically analyses quantization techniques, highlighting their implications for both model performance and practical usability. It contributes to a broader understanding of how AI models can be optimized for different deployment contexts, offering insight into how quantization enables the efficient and effective application of machine learning technologies.
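To illustrate the trade-off the abstract describes, the sketch below shows block-wise symmetric 4-bit quantization: the general idea behind GGUF k-quant formats, where each block of weights shares one scale factor, cutting memory roughly eightfold relative to float32 at the cost of rounding error. This is a simplified illustration only; the actual k-quant schemes use more elaborate layouts (nested scales, mixed bit widths), and the function names here are hypothetical.

```python
import numpy as np

def quantize_q4(weights: np.ndarray, block_size: int = 32):
    """Quantize float32 weights to 4-bit integers, one scale per block.

    Simplified stand-in for a k-quant scheme: each block of `block_size`
    weights is mapped to integers in [-8, 7] via a shared scale.
    """
    w = weights.reshape(-1, block_size)
    # Choose the scale so the largest magnitude in each block maps to 7.
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # avoid division by zero in all-zero blocks
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct approximate float32 weights from quantized blocks."""
    return (q.astype(np.float32) * scale).reshape(-1)

rng = np.random.default_rng(0)
weights = rng.standard_normal(1024).astype(np.float32)
q, scale = quantize_q4(weights)
restored = dequantize(q, scale)
max_err = np.abs(weights - restored).max()
```

The per-element error is bounded by half the block scale, which is why smaller blocks (more scales, slightly more storage) give better accuracy; this is exactly the kind of memory-versus-quality balance the thesis evaluates.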