Exploring the Latest Advancements in AI Research
Our community of open source research hubs has over 200,000 members building the future of AI. We are working globally with our partners, industry leaders, and experts to develop cutting-edge open AI models for Image, Language, Audio, Video, 3D, Biology and more.
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers
Explore the latest research in image generation with the Hourglass Diffusion Transformer (HDiT). This paper presents a new approach in high-resolution image synthesis, setting itself apart by handling large-scale images more efficiently than traditional methods. It's an insightful read for those interested in the technical advancements of image generation, offering a deep dive into the complexities and innovations in this field.
Adversarial Diffusion Distillation
We introduce Adversarial Diffusion Distillation (ADD), a novel training approach that efficiently samples large-scale foundational image diffusion models in just 1–4 steps while maintaining high image quality.
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
We present Stable Video Diffusion — a latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video generation.
Stable Audio: Fast Timing-Conditioned Latent Audio Diffusion
Stable Audio represents the cutting-edge audio generation research by Stability AI’s generative audio research lab, Harmonai.
Humans in 4D: Reconstructing and Tracking Humans with Transformers
Stability AI is proud to support research teams across the globe by providing compute power.
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone.
Objaverse-XL: A Universe of 10M+ 3D objects
Natural language processing and 2D vision models have attained remarkable proficiency on many tasks primarily by escalating the scale of training data.
Reconstructing the Mind’s Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors
A cutting-edge method for reconstructing and retrieving images from fMRI brain activity.
OpenFlamingo v2: New Models and Enhanced Training Setup
We are excited to release five trained OpenFlamingo models across the 3B, 4B, and 9B scales.
Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation
Discover the groundbreaking project led by top researchers from Tel Aviv University and the Technion Institute of Technology.