Stability AI and Arm Bring On-Device Generative Audio to Smartphones
Key Takeaways
We’ve partnered with Arm to bring generative audio to mobile devices, enabling high-quality sound effects and audio sample generation directly on-device with no internet connection required.
Leveraging Arm KleidiAI libraries, Stability AI’s cutting-edge text-to-audio model, Stable Audio Open, can now run 30x faster on smartphone Arm CPUs, reducing generation time from minutes to seconds.
This breakthrough will be showcased at MWC Barcelona on Monday, March 3rd, 2025, demonstrating unprecedented AI-powered content creation at the edge. You can learn about the partnership on the Built on Arm page here.
Today, we are making our cutting-edge generative AI models more accessible through our partnership with Arm, whose technology powers 99% of smartphones globally. Together, we have achieved what was once thought impossible: for the first time, Stable Audio Open, our industry-leading text-to-audio model, runs entirely on Arm CPUs, with no internet connection required.
As generative AI becomes increasingly integral to enterprises and professional creators alike, it's crucial that our models and workflows are easily accessible everywhere builders build and creators create, integrating seamlessly into their media production pipelines.
With this rising demand, ensuring our models run efficiently at the edge is crucial. This collaboration enables generation of sound effects, audio samples, and production elements in seconds, all on-device and offline.
At MWC Barcelona, we’ll showcase real-world applications of generative media at the edge, demonstrating how our on-device text-to-audio model enables rapid, high-quality audio generation.
Technical Advancements
Optimizing Stable Audio Open for mobile devices was a significant challenge: initial audio generation on an Arm CPU took 240 seconds. By distilling the model and using Arm’s software stack, including the int8 matmul kernels from KleidiAI integrated into ExecuTorch via XNNPACK, Stability AI and Arm reduced the generation time for an 11-second clip to under 8 seconds on Armv9 CPUs, a roughly 30x speedup.
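For readers curious about the ExecuTorch path mentioned above, the sketch below shows, in broad strokes, how a PyTorch module can be lowered to the XNNPACK backend and serialized for on-device execution; on Arm CPUs, XNNPACK's int8 matmul paths can dispatch to KleidiAI micro-kernels. This is a minimal illustration under stated assumptions, not Stability AI's actual pipeline: `TinyDiffusionBlock` is a hypothetical stand-in for the distilled audio model, module paths may differ between ExecuTorch releases, and the production flow would also apply int8 quantization (e.g., PyTorch's PT2E flow with the XNNPACK quantizer) before lowering.

```python
# Minimal, illustrative sketch: lowering a PyTorch module to ExecuTorch's
# XNNPACK backend. NOT Stability AI's actual pipeline; "TinyDiffusionBlock"
# is a hypothetical stand-in for the distilled Stable Audio Open model, and
# ExecuTorch module paths may differ between releases.
import torch
from torch.export import export
from executorch.exir import to_edge
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner


class TinyDiffusionBlock(torch.nn.Module):
    """Placeholder network standing in for a distilled text-to-audio model."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, dim),
            torch.nn.SiLU(),
            torch.nn.Linear(dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


model = TinyDiffusionBlock().eval()
example_inputs = (torch.randn(1, 256),)

# 1. Capture the model as an exported graph.
exported_program = export(model, example_inputs)

# 2. Convert to the Edge dialect and delegate supported operators (matmuls,
#    activations, ...) to the XNNPACK backend; on Armv9 CPUs, XNNPACK's int8
#    matmul kernels can route to Arm KleidiAI micro-kernels.
edge_program = to_edge(exported_program).to_backend(XnnpackPartitioner())

# 3. Serialize a .pte file that the ExecuTorch runtime loads on the phone.
executorch_program = edge_program.to_executorch()
with open("tiny_block_xnnpack.pte", "wb") as f:
    f.write(executorch_program.buffer)
```

The resulting .pte file would then be loaded by the ExecuTorch runtime linked into a mobile app, so inference runs entirely on the device's CPU.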
By running entirely on Arm CPUs, Stable Audio Open is now accessible without heavy hardware requirements, making it available to anyone with a compatible mobile device.
What’s Next
Audio is just the beginning. We aim to bring all of our cutting-edge models across image, video, and 3D to the edge. This partnership with Arm is a key step toward enabling high-quality generation directly on mobile devices across these modalities, transforming how visual media is created.
You can learn more about the partnership and view a demo on the Built on Arm webpage here, and visit the Stability AI page here in the Arm partner catalog.
To stay updated on our progress, follow us on X, LinkedIn, and Instagram, and join our Discord Community.