Enhancing Multi-Dimensional Music Generation by an LLM-based Data Augmentation Technique
Kharlashkin, Lev (2024)
All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:amk-2024112630260
Abstract
This thesis investigates the advancement of AI-driven music generation through the fine-tuning of the MusicGen model, which combines textual descriptions with musical elements like mood, genre and instrumentation to generate stylistically accurate compositions. The study aims to enhance the model's ability to capture multi-dimensional musical attributes, contributing to the field of computational creativity.
Techniques such as Low-Rank Adaptation (LoRA), mixed-precision training and custom tokenization were employed to optimise computational efficiency and incorporate musical tags into the model’s vocabulary. Fine-tuning was conducted on a dataset of approximately 12,760 tracks, with prompts and tags used to guide the model in generating compositions closely aligned with textual inputs.
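The core idea behind Low-Rank Adaptation is that the frozen pretrained weight matrix is left untouched while a small trainable low-rank update is added alongside it. The NumPy sketch below illustrates only that idea; it is not the thesis's actual training code, which fine-tunes MusicGen with a LoRA library, and the dimensions and scaling are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch of the low-rank update behind LoRA
# (hypothetical dimensions; not the thesis's MusicGen code).
rng = np.random.default_rng(0)

d, k, r = 64, 64, 4                  # layer dims and LoRA rank, r << d, k
W = rng.normal(size=(d, k))          # frozen pretrained weight
A = rng.normal(size=(r, k)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                 # B starts at zero, so the update is a no-op

def lora_forward(x, scale=1.0):
    # y = x W^T + scale * x (B A)^T : only A and B are trained,
    # so trainable parameters drop from d*k to r*(d + k).
    return x @ W.T + scale * (x @ (B @ A).T)

x = rng.normal(size=(2, k))
# With B = 0 the adapted layer reproduces the frozen layer exactly,
# which is why LoRA fine-tuning starts from the pretrained behaviour.
assert np.allclose(lora_forward(x), x @ W.T)
```

The parameter saving is the reason LoRA suits consumer-scale fine-tuning: here the update trains 4 × (64 + 64) = 512 values instead of the full 64 × 64 = 4096.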
Evaluation included tag-based analysis and human assessments using a simplified response structure of "Yes," "No," and "I don't know." Results showed notable success in genres like electronic and ambient music but identified challenges with complex multi-tag prompts, such as those involving experimental or chillout genres. Human evaluations also highlighted unintended audio artefacts, such as white noise, present across all tracks, along with issues of data imbalance.
These findings contribute to the evolving discourse on AI's potential in creative fields, supporting its role as a tool to aid music composition, with promising applications in music education, production and content creation.
