Stability AI, the startup behind the AI-powered art generator Stable Diffusion, has released an open AI model for generating sounds and music that it claims was trained exclusively on royalty-free recordings.
Dubbed Stable Audio Open, this generative model converts text descriptions (such as “Rock beat played in a treated studio, session drumming on an acoustic kit”) into recordings of up to 47 seconds. The model was trained on approximately 486,000 samples sourced from free music libraries such as Freesound and the Free Music Archive.
Stability AI asserts that the model is versatile, capable of creating drum beats, instrumental riffs, ambient sounds, and “production elements” for various media including videos, films, and television shows. Additionally, it can be used to modify existing songs or transfer the style of one piece (e.g., smooth jazz) to another.
“One significant advantage of this open-source release is the ability for users to fine-tune the model using their own custom audio data,” Stability AI elaborated on its corporate blog. “For instance, a drummer could refine the model with their own drum recordings to generate new and unique beats.”
However, Stable Audio Open has some limitations. It is not well suited to producing full songs, melodies, or vocals. For those capabilities, Stability AI recommends its premium Stable Audio service.
Moreover, Stable Audio Open's terms of service prohibit commercial use. The model also performs unevenly across musical styles and cultures and struggles with descriptions in languages other than English, limitations that Stability AI attributes to the training data set.
“The training data may lack diversity, resulting in a model that doesn’t equally represent all cultures,” Stability AI acknowledges in the model’s description. “Consequently, the generated samples reflect these inherent biases.”
Stability AI, which has struggled to revive its faltering business, recently made headlines when its VP of generative audio, Ed Newton-Rex, resigned over a dispute regarding the company’s stance on using copyrighted works to train generative AI models under the “fair use” doctrine. The release of Stable Audio Open appears to be an attempt to counter that negative narrative while also promoting Stability AI’s premium offerings.