Impact of quantization techniques on the performance and quality of AI models in AMD-based systems for local deployment of large language models
Muikku, Jere (2024)
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2024122638134
Abstract
This thesis investigates the impact of quantization techniques on AI text generation models, with a focus on optimizing performance and accuracy in locally deployed systems. As quantization becomes increasingly important for improving the efficiency of machine learning models, its role in balancing computational demands with output quality has drawn significant attention. By focusing on k-quant GGUF models, this study addresses the challenges and trade-offs associated with adapting high-performing models to resource-constrained environments.
The research involves selecting and evaluating text generation models compatible with the GGUF format and capable of functioning on Windows systems. It explores how different levels and methods of quantization influence model behaviour, including memory usage, processing efficiency, and response accuracy. These factors are critical for deploying AI models in scenarios where computational resources are limited, such as standard personal computers or other resource-limited systems.
This study systematically analyses quantization techniques, highlighting their implications for both model performance and practical usability. It contributes to a broader understanding of how AI models can be optimized for different deployment contexts, offering insight into how quantization enables the efficient and effective application of machine learning technologies.
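To illustrate the trade-off the abstract describes, the sketch below shows block-wise symmetric 4-bit quantization: the general idea behind GGUF k-quant formats, where each block of weights shares one scale factor, cutting memory roughly eightfold relative to float32 at the cost of rounding error. This is a simplified illustration only; the actual k-quant schemes use more elaborate layouts (nested scales, mixed bit widths), and the function names here are hypothetical.

```python
import numpy as np

def quantize_q4(weights: np.ndarray, block_size: int = 32):
    """Quantize float32 weights to 4-bit integers, one scale per block.

    Simplified stand-in for a k-quant scheme: each block of `block_size`
    weights is mapped to integers in [-8, 7] via a shared scale.
    """
    w = weights.reshape(-1, block_size)
    # Choose the scale so the largest magnitude in each block maps to 7.
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # avoid division by zero in all-zero blocks
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct approximate float32 weights from quantized blocks."""
    return (q.astype(np.float32) * scale).reshape(-1)

rng = np.random.default_rng(0)
weights = rng.standard_normal(1024).astype(np.float32)
q, scale = quantize_q4(weights)
restored = dequantize(q, scale)
max_err = np.abs(weights - restored).max()
```

The per-element error is bounded by half the block scale, which is why smaller blocks (more scales, slightly more storage) give better accuracy; this is exactly the kind of memory-versus-quality balance the thesis evaluates.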