Stability AI and Arm Collaborate to Release Stable Audio Open Small, Enabling Real-World Deployment for On-Device Audio Generation

Key Takeaways:

  • We’re open-sourcing Stable Audio Open Small, a 341-million-parameter text-to-audio model optimized to run entirely on Arm CPUs. Designed for quickly generating short audio samples, it can produce up to 11 seconds of audio on a smartphone in less than 8 seconds.

  • This release builds on our collaboration with Arm to bring generative audio creation to smartphones, following our recent announcement at Mobile World Congress.

  • Developers can explore the new Arm Learning Path, which offers hands-on guidance on running Stable Audio Open Small on Arm CPUs.

  • Stable Audio Open Small is now free for commercial and non-commercial use under the permissive Stability AI Community License. You can read the paper on arXiv, download the model weights on Hugging Face, and access the code on GitHub.


Bringing generative audio creation to mobile phones

We’re open-sourcing Stable Audio Open Small in partnership with Arm, whose technology powers 99% of smartphones globally. Building on Stable Audio Open, our industry-leading text-to-audio model, the new compact variant is smaller and faster while preserving output quality and prompt adherence.

This release follows our previously announced breakthrough in optimizing Stable Audio Open to run on Arm CPUs, using Arm KleidiAI to enable AI-generated audio on a mobile phone. After demonstrating the technology at Mobile World Congress, Stability AI and Arm are now releasing the model weights so that anyone can access and deploy the model.

Technical advancements

To our knowledge, Stable Audio Open Small is the fastest stereo text-to-audio model on the market. You can read more about the technical advancements of the model in the research paper. Here are a few highlights:

Lightweight: Stable Audio Open Small has 341M parameters, compared to Stable Audio Open’s 1.1B parameters.

Fast: Stable Audio Open Small is optimized to generate up to 11 seconds of audio on a mobile phone in less than 8 seconds. It is also faster than Stable Audio Open both to run and to fine-tune.

Efficient: Leveraging Arm’s KleidiAI libraries, we designed this new model to run even more efficiently at the edge, so users get results faster while spending less on compute time. Because it runs entirely on Arm CPUs, Stable Audio Open Small is also accessible without heavy hardware requirements.
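To put the Lightweight and Fast figures above in concrete terms, here is a back-of-the-envelope sketch in Python. It assumes that weight storage is simply parameter count times bytes per parameter, and it ignores activations and runtime buffers. The fp32, fp16 and int8 precisions are illustrative assumptions, not the formats in which the model is actually shipped or run, and the helper names `weight_memory_mb` and `real_time_factor` are hypothetical.

```python
# Back-of-the-envelope arithmetic for the "Lightweight" and "Fast" figures.
# Assumption: weight storage = parameter count x bytes per parameter.
# The precisions below are illustrative only, not the release's actual formats.

SMALL_PARAMS = 341_000_000   # Stable Audio Open Small (341M parameters)
BASE_PARAMS = 1_100_000_000  # Stable Audio Open (1.1B parameters)

# Hypothetical storage precisions for illustration.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_mb(num_params: int, precision: str) -> float:
    """Approximate storage for the weights alone, in megabytes (10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e6


def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """Generation time divided by audio length; below 1.0 is faster than real time."""
    return generation_seconds / audio_seconds


if __name__ == "__main__":
    print(f"Stable Audio Open / Small parameter ratio: {BASE_PARAMS / SMALL_PARAMS:.1f}x")
    for precision in BYTES_PER_PARAM:
        small = weight_memory_mb(SMALL_PARAMS, precision)
        base = weight_memory_mb(BASE_PARAMS, precision)
        print(f"{precision}: {small:,.0f} MB vs {base:,.0f} MB")
    # Quoted figure: up to 11 s of audio in under 8 s, so this is an upper bound.
    print(f"Real-time factor: < {real_time_factor(8, 11):.2f}")
```

Under these assumptions, the script reports roughly 3.2x fewer parameters, about 1,364 MB versus 4,400 MB of weights at fp32 (682 MB versus 2,200 MB at fp16), and a real-time factor below 0.73. In other words, the model produces 11 seconds of audio in less time than it takes to play it back.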

When to use the model

Like Stable Audio Open, Stable Audio Open Small is optimized for generating short audio samples, sound effects, and production elements from text prompts. It is well suited for creating drum loops, foley, instrument riffs, and ambient textures.

Its compact size and fast inference make it a perfect fit for on-device deployment on Arm-powered smartphones and edge devices, where real-time generation and responsiveness matter.

As AI-driven creative media workloads move to the edge, smaller models help align compute resources with task complexity. By matching model size to the task, for example a small model for short sound effects and a larger one for full-length songs, organizations can assign each workload to the processor best suited to it.

Getting started

Stable Audio Open Small is now free for commercial and non-commercial use under the permissive Stability AI Community License. You can read the paper on arXiv, download the model weights on Hugging Face, and access the code on GitHub.

Visit the Arm Learning Path for a walkthrough of deploying Stable Audio Open Small on Arm hardware, and read the Arm Community Blog for a deep technical dive into how Stable Audio Open Small was optimized for on-device performance.

To stay updated on our progress, follow us on X, LinkedIn, Instagram, and join our Discord Community.
