Introducing Stable Video 4D, Our Latest AI Model for Dynamic Multi-Angle Video Generation

Key Takeaways: 

  • Stable Video 4D transforms a single object video into novel-view videos from eight different angles/views. 

  • In a single inference pass, Stable Video 4D generates 5 frames across 8 views in about 40 seconds. 

  • Users can specify camera angles, tailoring the output to meet specific creative needs.

The model, currently in its research phase, has future applications in game development, video editing, and virtual reality, with ongoing improvements expected. It is currently available on Hugging Face. 

We are pleased to announce the availability of Stable Video 4D, an innovative model that allows users to upload a single video and receive dynamic novel-view videos from eight different angles/views, delivering a new level of versatility and creativity. 

Building on the robust foundation of our Stable Video Diffusion model, which converts images into videos, the Stable Video 4D model takes a video as input and generates multiple novel-view videos from different perspectives. This advancement represents a leap in our capabilities, moving from image-based video generation to full 3D dynamic video synthesis.

How It Works

Users start by uploading a single video and specifying their desired 3D camera poses. Stable Video 4D then generates eight novel-view videos following the specified camera views, providing a comprehensive, multi-angle perspective of the subject. The generated videos can then be used to efficiently optimize a dynamic 3D representation of the subject in the video.

Currently, Stable Video 4D can generate 5-frame videos across the 8 views in about 40 seconds, with the entire 4D optimization taking approximately 20 to 25 minutes. Our team envisions future applications in game development, video editing, and virtual reality. Professionals in these fields can significantly benefit from the ability to visualize objects from multiple perspectives, enhancing the realism and immersion of their products.
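As an illustration of the workflow above (this is not the actual SV4D API; the pose parameters and frame resolution here are assumptions for the sake of the example), a single inference pass can be thought of as producing a grid of frames indexed by view and time:

```python
import numpy as np

# Illustrative sketch only -- not the SV4D interface. Shapes and pose
# fields are hypothetical; only the 5-frame x 8-view grid comes from
# the announcement above.
NUM_VIEWS, NUM_FRAMES = 8, 5        # one inference: 5 frames across 8 views
H, W, C = 64, 64, 3                 # hypothetical (small) frame resolution

# Hypothetical camera poses: eight azimuths evenly spaced around the subject.
poses = [{"azimuth": v * 360.0 / NUM_VIEWS, "elevation": 0.0}
         for v in range(NUM_VIEWS)]

# Stand-in for the model's output: one RGB frame per (view, time) pair.
frames = np.zeros((NUM_VIEWS, NUM_FRAMES, H, W, C), dtype=np.uint8)

# Reassemble per-view clips: each row of the grid is one novel-view video,
# which can then feed the downstream 4D optimization.
videos = [frames[v] for v in range(NUM_VIEWS)]
```

The point of the grid layout is that consistency can be enforced along both axes at once: across a row (time, within one view) and down a column (views, at one timestamp).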

State-of-the-art Performance

Unlike previous approaches that often require sampling from a combination of an image diffusion model, a video diffusion model, and a multi-view diffusion model, Stable Video 4D (SV4D) generates multiple novel-view videos at the same time, which greatly improves consistency along both the spatial and temporal axes. This capability not only ensures consistent object appearance across multiple views and timestamps, but also enables a more lightweight 4D optimization framework that avoids cumbersome score distillation sampling (SDS) with multiple diffusion models.

Stable Video 4D generates novel-view videos that are more detailed, more faithful to the input video, and more consistent across frames and views than existing works. 

Research and Development

Stable Video 4D is available on Hugging Face and is our first video-to-video generation model, marking an exciting milestone for Stability AI. We are actively working on refining the model, optimizing it to handle a wider range of real-world videos beyond the current synthetic datasets it has been trained on.

The Stability AI team is dedicated to continuous innovation and exploration of real-world use cases for this and other technologies. We anticipate that companies will adopt our model, fine-tuning it further to suit their unique requirements. The potential for this technology in creating realistic, multi-angle videos is vast, and we are excited to see how it will evolve with ongoing research and development.

Technical Report

In conjunction with this announcement, we are releasing a comprehensive technical report detailing the methodologies, challenges, and breakthroughs achieved during the development of this model.

Stable Video 4D represents state-of-the-art, open-source novel-view video generation technology. By transforming single video inputs into dynamic, multi-angle 3D outputs, we are opening new avenues for creativity and innovation in various industries. Stay tuned for further updates as we continue to enhance and expand the capabilities of this exciting technology.

By continually collaborating with researchers, experts, and our community, we expect to innovate further with integrity as we continue to improve the model. To stay updated on our progress, follow us on Twitter, Instagram, LinkedIn, and join our Discord Community.
