Enhancing Multi-Dimensional Music Generation by an LLM-based Data Augmentation Technique
Kharlashkin, Lev (2024)
All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:amk-2024112630260
Abstract
This thesis investigates the advancement of AI-driven music generation through the fine-tuning of the MusicGen model, which combines textual descriptions with musical elements like mood, genre and instrumentation to generate stylistically accurate compositions. The study aims to enhance the model's ability to capture multi-dimensional musical attributes, contributing to the field of computational creativity.
Techniques such as Low-Rank Adaptation (LoRA), mixed-precision training and custom tokenization were employed to optimise computational efficiency and incorporate musical tags into the model’s vocabulary. Fine-tuning was conducted on a dataset of approximately 12,760 tracks, with prompts and tags used to guide the model in generating compositions closely aligned with textual inputs.
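The core idea behind Low-Rank Adaptation is that the frozen pretrained weight matrix is left untouched while a small trainable low-rank update is added alongside it. The NumPy sketch below illustrates only that idea; it is not the thesis's actual training code, which fine-tunes MusicGen with a LoRA library, and the dimensions and scaling are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch of the low-rank update behind LoRA
# (hypothetical dimensions; not the thesis's MusicGen code).
rng = np.random.default_rng(0)

d, k, r = 64, 64, 4                  # layer dims and LoRA rank, r << d, k
W = rng.normal(size=(d, k))          # frozen pretrained weight
A = rng.normal(size=(r, k)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                 # B starts at zero, so the update is a no-op

def lora_forward(x, scale=1.0):
    # y = x W^T + scale * x (B A)^T : only A and B are trained,
    # so trainable parameters drop from d*k to r*(d + k).
    return x @ W.T + scale * (x @ (B @ A).T)

x = rng.normal(size=(2, k))
# With B = 0 the adapted layer reproduces the frozen layer exactly,
# which is why LoRA fine-tuning starts from the pretrained behaviour.
assert np.allclose(lora_forward(x), x @ W.T)
```

The parameter saving is the reason LoRA suits consumer-scale fine-tuning: here the update trains 4 × (64 + 64) = 512 values instead of the full 64 × 64 = 4096.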
Evaluation included tag-based analysis and human assessments using a simplified response structure of "Yes," "No," and "I don't know." Results showed notable success in genres like electronic and ambient music but identified challenges with complex multi-tag prompts, such as those involving experimental or chillout genres. Human evaluations also highlighted unintended audio artefacts, such as white noise, present across all tracks, along with issues of data imbalance.
These findings contribute to the evolving discourse on AI's potential in creative fields, supporting its role as a tool to aid music composition, with promising applications in music education, production and content creation.
