Neural Synthesis with RAVE: A Practical Guide on How to Integrate Neural Synthesis into Your Music Production
Castellanos Sosa, Vicente Allan (2024)
All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:amk-2024110527357
Abstract
This thesis investigates the implementation and application of the Realtime Audio Variational Autoencoder (RAVE) model within music production and sound design. It focuses on RAVE's potential for high-quality, real-time audio synthesis and manipulation, addressing computational efficiency and audio fidelity challenges.
Using a mixed-methods approach, the research combines technical implementation of the RAVE model with qualitative analysis of the generated audio. The model was trained on a dataset of Arabic speech and environmental sounds, using a custom hardware setup with an NVIDIA RTX 4070 Ti GPU. The training process covered data preparation, model configuration, and performance monitoring; a sketch of this workflow follows.
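As a point of reference, the training workflow summarized above corresponds to the command-line interface of the official acids-ircam/RAVE implementation (installable as the acids-rave Python package). The lines below are a minimal sketch of that pipeline, not the exact commands used in the thesis; the paths, channel count, and run name are placeholders:

    rave preprocess --input_path ./raw_audio --output_path ./dataset --channels 1
    rave train --config v2 --db_path ./dataset --out_path ./runs --name speech_model
    rave export --run ./runs/speech_model --streaming

Here the --streaming flag requests an export with cached convolutions, the variant intended for the buffer-by-buffer, low-latency use discussed below.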
The findings highlight RAVE's ability to reconstruct and generate high-quality audio across various musical styles, excelling at preserving timbral accuracy and handling complex textures such as vocals. The model achieved low-latency performance, demonstrating its suitability for real-time applications and opening possibilities for live performance and interactive installations.
This research contributes to AI-driven audio synthesis by offering a detailed guide to training RAVE models for music production. It also explores potential applications in sound design, real-time audio effects processing, and multi-modal integration with visual or textual data. Additionally, the study addresses RAVE's limitations and challenges, particularly in intuitive latent space manipulation, and identifies areas for future research to further enhance human creativity in the audio arts.
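To make the latent-space manipulation mentioned above concrete, the Python sketch below loads an exported RAVE model as a TorchScript module, offsets one latent dimension, and decodes the result. The file name, sampling rate, and choice of dimension are illustrative assumptions; what each latent dimension controls depends on the trained model:

    import torch

    # Load an exported RAVE model (a TorchScript file); the path is a placeholder.
    model = torch.jit.load("runs/speech_model.ts").eval()

    # One second of mono audio shaped (batch, channels, samples); 44.1 kHz assumed.
    x = torch.randn(1, 1, 44100)

    with torch.no_grad():
        z = model.encode(x)    # audio -> latent trajectory
        z[:, 0, :] += 0.5      # offset the first latent dimension (illustrative)
        y = model.decode(z)    # latent trajectory -> resynthesized audio

Because the exported model is a self-contained TorchScript artifact, the same file can be loaded from Python as above or from real-time hosts such as the nn~ external for Max/MSP and Pure Data.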