SDXL AWS Documentation

Summary

This documentation helps AWS users get SDXL up and running and operate it effectively.

For developers, SDXL is easy to deploy and build around. This documentation will help developers incorporate SDXL into an application by setting up an API. Developers who want to set up SDXL for use by creators can follow this documentation to deploy it on AWS (SageMaker or Bedrock).

For creators, SDXL is a powerful tool for generating and editing images. This documentation will help designers create a high-quality example image and refine it. Creative users who aim to create and edit images with SDXL should familiarize themselves with text prompting and with adjusting parameters to balance speed and quality in image generation and editing.

This documentation is under construction. Future documentation will include guidance for developers on fine-tuning SDXL with custom data, as well as guidance for creators on using enhancements at the point of inference (e.g., fine-tuned checkpoints or extractions, controls, etc.).

How to access SDXL on AWS

SDXL was developed on AWS and runs most effectively on the optimized AWS-managed services available through Bedrock and SageMaker.

Bedrock

Users of SDXL via Bedrock can access all of the core SDXL capabilities for generating high-quality images.

About Amazon Bedrock

Amazon Bedrock is a fully managed service that makes foundation models (FMs) from leading AI startups and Amazon available via an API, so you can choose from various FMs to find the best model for your use case. With the Bedrock serverless experience, you can get started quickly, privately customize FMs with your own data, and easily integrate and deploy them into your applications using the AWS tools and capabilities you are familiar with (including integrations with Amazon SageMaker ML features, such as Experiments to test different models and Pipelines to manage your FMs at scale), without having to manage any infrastructure.

Deploying from Bedrock

Once Bedrock is set up, you can use SDXL immediately using GenerationRequest. For example, a simple inference call looks like this:

from stability_sdk.api import GenerationRequest, TextPrompt

output = aws_bedrock_model.predict(GenerationRequest(
    text_prompts=[TextPrompt(text="Sri Lanka tea plantation.")]
))

For more information on the different parameters that can be used, see the "User guidelines for the Text to Image API" section below.
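As a sketch of how the request fields fit together, the helper below assembles the same request as a raw JSON body. The `build_sdxl_body` helper is illustrative, and the Bedrock model id shown in the comment is an assumption (check the Bedrock console for the exact id); only the payload construction runs here.

```python
import json

# Illustrative helper: assemble the JSON body for an SDXL generation request.
def build_sdxl_body(prompt, cfg_scale=7, steps=50, seed=0):
    return json.dumps({
        "text_prompts": [{"text": prompt, "weight": 1}],
        "cfg_scale": cfg_scale,
        "steps": steps,
        "seed": seed,
    })

body = build_sdxl_body("Sri Lanka tea plantation.")

# In an application, the body could be sent through the Bedrock runtime,
# e.g. (model id is an assumption -- verify it in the Bedrock console):
# import boto3
# bedrock = boto3.client("bedrock-runtime")
# response = bedrock.invoke_model(
#     modelId="stability.stable-diffusion-xl-v1", body=body)
```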

SageMaker

For developers who want to control the deployment of SDXL and integrate it into pipelines and applications, SageMaker JumpStart is the best way to get an app available to users quickly. Users of SDXL via SageMaker JumpStart can access all of the core SDXL capabilities for generating high-quality images.

SDXL is available in SageMaker Studio via two JumpStart options:

  • The SDXL 1.0 JumpStart provides SDXL optimized for speed and quality, making it the best way to get started if your focus is on inferencing. An instance can be deployed for inferencing, allowing API use for text-to-image and image-to-image (including masked inpainting).

  • The SDXL 1.0 Open JumpStart is the open SDXL model, ready to be used with custom inferencing code, fine-tuned with custom data, and implemented in any use case. This version does not contain any optimization and may require an instance with more GPU compute.

    • Note that the fine-tuning version may not currently be available to deploy.

About Amazon SageMaker JumpStart

SageMaker JumpStart provides pre-trained, open models for a wide range of problem types to help AWS customers get started with machine learning. Customers can incrementally train and tune these models before deployment. JumpStart also provides solution templates that set up infrastructure for common use cases and executable example notebooks for machine learning with SageMaker. You can access the pre-trained models, solution templates, and examples through the JumpStart landing page in Amazon SageMaker Studio.

Deploying from the SageMaker JumpStart page

The best way to start with SageMaker is by following the AWS SageMaker documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-foundation-models-use.html

These step-by-step instructions guide the user through the SageMaker Studio UI.

  1. Launch Amazon SageMaker Studio.

  2. Navigate to the "Models, notebooks, solutions" option within the SageMaker JumpStart section found in the navigation pane.

  3. Scroll down until you locate the "Foundation Models: Image Generation" section.

  4. In this section, you'll find two versions of SDXL 1.0 in the carousel:

    • SDXL 1.0: This is the official container from Stability, optimized for inference.

    • SDXL 1.0 Open: This is the pure open source model.

  5. Select the model version that suits your needs.

  6. Follow the instructions provided by the SageMaker team to deploy your chosen model: https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-deploy.html

You can also run SDXL in a notebook. Choose Open Notebook in the Run in Notebook section to run an example notebook like the one below for the foundation model directly in Studio.
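For teams that prefer code over the Studio UI, JumpStart models can also be deployed with the SageMaker Python SDK. The sketch below is an outline under stated assumptions: the model id is a guess and must be taken from the JumpStart model card, and the deploy call requires AWS credentials and incurs cost, so it is wrapped in a function rather than executed.

```python
# Programmatic alternative to the Studio walkthrough above, using the
# SageMaker Python SDK's JumpStart support. The model id below is an
# assumption -- look up the exact id on the JumpStart model card.
SDXL_JUMPSTART_ID = "model-txt2img-stabilityai-stable-diffusion-xl-base-1-0"

def deploy_sdxl(instance_type="ml.g5.2xlarge"):
    # Imported lazily so the sketch can be read without the sagemaker SDK.
    from sagemaker.jumpstart.model import JumpStartModel

    model = JumpStartModel(model_id=SDXL_JUMPSTART_ID)
    # deploy() provisions the instance and returns a Predictor; it needs
    # AWS credentials, so it is defined here but not called.
    return model.deploy(initial_instance_count=1, instance_type=instance_type)
```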

Developer Quickstart: Example Notebook

This example notebook can help the developer user start with a SageMaker JumpStart deployment of SDXL. For help getting started with Bedrock, please refer to the Bedrock documentation.

https://github.com/Stability-AI/aws-jumpstart-examples/blob/main/sdxl-v1-0/sdxl-1-0-jumpstart.ipynb

Creator Quickstart: Using the API for Text to Image

Once the API is available, it will accept input in the form below.

{
  "cfg_scale": 7,
  "height": 1024,
  "width": 1024,
  "steps": 50,
  "seed": 42,
  "sampler": "K_DPMPP_2M",
  "text_prompts": [
    {
      "text": "A photograph of a cute puppy",
      "weight": 1
    }
  ]
} 

The above input will generate a photorealistic image of a cute puppy.
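As a sketch of how an application might build and submit this payload, the helper below assembles the request shown above. The `make_request` helper and the endpoint name are hypothetical; only the payload construction runs here, and the call to a deployed endpoint is shown in comments.

```python
import json

# Hypothetical helper that assembles the text-to-image request shown above.
def make_request(prompt, width=1024, height=1024, steps=50, seed=42,
                 cfg_scale=7, sampler="K_DPMPP_2M"):
    return {
        "cfg_scale": cfg_scale,
        "height": height,
        "width": width,
        "steps": steps,
        "seed": seed,
        "sampler": sampler,
        "text_prompts": [{"text": prompt, "weight": 1}],
    }

body = json.dumps(make_request("A photograph of a cute puppy"))

# Sending the body to a deployed JumpStart endpoint (endpoint name assumed):
# import boto3
# smr = boto3.client("sagemaker-runtime")
# resp = smr.invoke_endpoint(EndpointName="your-sdxl-endpoint",
#                            ContentType="application/json", Body=body)
```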

User guidelines for the Text to Image API

Below is a simple explanation of the different input parameters of the API.

Developers should be mindful of the needs of the end-user creatives using the application built on the API. To provide end-user creatives with the optimal experience, it is recommended that some of these settings be hidden behind an “Advanced” option.

cfg_scale: The guidance scale controls how closely the generated image follows the prompt. Values that are too high create a ‘fried’, over-processed effect in the image, while values that are too low cause the image to lose coherence. A value between 5 and 15 is recommended; the default of 7 is usually effective for most uses.

Height and Width: These parameters set the resolution of the image. SDXL 1.0 natively generates images best at 1024 x 1024, but different aspect ratios can be used effectively.

The below settings for width and height are optimal for use on SDXL 1.0. Resolutions different from these may cause unintended cropping.

  - width: 1024
    height: 1024
  - width: 1152
    height: 896
  - width: 896
    height: 1152
  - width: 1216
    height: 832
  - width: 832
    height: 1216
  - width: 1344
    height: 768
  - width: 768
    height: 1344
  - width: 1536
    height: 640
  - width: 640
    height: 1536

It is recommended that developers pre-select the ideal resolutions for users based on the intended use case. The below lookup can be used for guidance:

Fullscreen: 4:3 - 1152x896
Widescreen: 16:9 - 1344x768
Ultrawide: 21:9 - 1536x640
Mobile landscape: 3:2 - 1216x832
Square: 1:1 - 1024x1024
Mobile Portrait: 2:3 - 832x1216
Tall: 9:16 - 768x1344
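The lookup above can be sketched as a small helper an application might use to pre-select resolutions for users. The use-case key names are illustrative; the width/height pairs come from the supported list above.

```python
# Supported SDXL 1.0 resolutions keyed by use case, from the lookup above.
RESOLUTIONS = {
    "fullscreen": (1152, 896),        # 4:3
    "widescreen": (1344, 768),        # 16:9
    "ultrawide": (1536, 640),         # 21:9
    "mobile_landscape": (1216, 832),  # 3:2
    "square": (1024, 1024),           # 1:1
    "mobile_portrait": (832, 1216),   # 2:3
    "tall": (768, 1344),              # 9:16
}

def resolution_for(use_case):
    """Return (width, height) for a named use case, defaulting to square."""
    return RESOLUTIONS.get(use_case, RESOLUTIONS["square"])
```

Falling back to 1024 x 1024 for unknown use cases keeps every request on a supported resolution and avoids the unintended cropping noted above.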

Steps: This parameter controls the number of steps of generation used to create the image. A higher number of steps generally leads to better quality but uses more resources. 50 is a good default, but high-quality images can be generated with lower numbers.

Seed: This parameter allows a user to generate images deterministically. Using the same seed as a previous generation, without changing the other parameters, will reproduce the same image. It is recommended that the seed be set to “0” to randomize the seed every time. This advanced parameter can help users quickly reproduce and refine desired outputs with pure text-to-image, i.e., without needing to use image-to-image.

Sampler: This parameter allows users to leverage different sampling methods that guide the denoising process in generating an image. As this is an advanced setting, it is recommended that the baseline sampler “K_DPMPP_2M” be used for most cases.

Text prompts: This parameter is the critical parameter used for text-to-image guidance. Unlike other generative image models, SDXL is optimized to generate complex compositions with high aesthetic quality, even with simple prompts. Some developers may want to leverage a prompt injection layer to provide users with curated styles for images.

For example, a user who wants a photograph of a puppy can simply type “puppy” into the text prompts field. Prompt engineering for SDXL is significantly easier than in other image models, so there is no need for excess descriptors such as “high quality.”
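The prompt-injection layer mentioned above can be sketched as a helper that pairs the user's prompt with a curated style prompt at a lower weight. The style text and weight below are illustrative assumptions, not recommended values.

```python
# Illustrative prompt-injection layer: the application pairs the user's
# prompt with a curated style prompt at a lower weight.
def inject_style(user_prompt, style="watercolor illustration", style_weight=0.5):
    return [
        {"text": user_prompt, "weight": 1},
        {"text": style, "weight": style_weight},
    ]
```

The returned list drops into the text_prompts field of the request as-is, so the end user only ever types the subject (e.g., “puppy”) while the application supplies the style.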

Developers are encouraged to understand their end users' expected prompt engineering skills and verbal creativity to provide the best experience.