SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

Compared to previous versions of Stable Diffusion, SDXL uses a UNet backbone three times larger. The increase in model parameters comes mainly from more attention blocks and a larger cross-attention context, since SDXL uses a second text encoder. We design multiple novel conditioning schemes and train SDXL on multiple aspect ratios.
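Training on multiple aspect ratios is commonly done by grouping images into resolution "buckets" of roughly constant pixel count and assigning each image to the bucket whose aspect ratio is closest to its own. The sketch below illustrates that idea under stated assumptions: the bucket-generation rule (sides that are multiples of 64, pixel count at most 1024 x 1024) and the helper names `make_buckets` and `assign_bucket` are hypothetical and not taken from the SDXL code.

```python
import math

def make_buckets(target_pixels=1024 * 1024, step=64, min_side=512, max_side=2048):
    """Hypothetical rule: enumerate (height, width) pairs whose sides are
    multiples of `step` and whose area is at most `target_pixels`."""
    buckets = []
    for h in range(min_side, max_side + 1, step):
        # Largest width that is a multiple of `step` with h * w <= target_pixels.
        w = (target_pixels // h) // step * step
        if min_side <= w <= max_side:
            buckets.append((h, w))
    return buckets

def assign_bucket(height, width, buckets):
    """Return the bucket whose aspect ratio is closest, in log space,
    to the image's height/width ratio."""
    target = math.log(height / width)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - target))

buckets = make_buckets()
print(assign_bucket(768, 1024, buckets))  # (896, 1152)
```

Comparing ratios in log space treats tall and wide images symmetrically. In a multi-aspect setup the chosen (height, width) could also be passed to the model as an extra conditioning signal, but how SDXL encodes it is not shown here.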

