Stability AI Open-Sources Image Generation Model Stable Diffusion

Stability AI released the pre-trained model weights for Stable Diffusion, a text-to-image AI model, to the general public. Given a text prompt, Stable Diffusion can generate photorealistic 512x512 pixel images depicting the scene described in the prompt.

The public release of the model weights follows the earlier release of code and a limited release of the model weights to the research community. With the latest release, any user can download and run Stable Diffusion on consumer-level hardware. Besides text-to-image generation, the model also supports image-to-image style transfer as well as upscaling. Alongside the weights, Stability AI launched a beta version of DreamStudio, an API and web UI for the model. According to Stability AI:

Stable Diffusion is a text-to-image model that will empower billions of people to create stunning art within seconds. It is a breakthrough in speed and quality meaning that it can run on consumer GPUs...This will allow both researchers and...the public to run this under a range of conditions, democratizing image generation. We look forward to the open ecosystem that will emerge around this and further models to truly explore the boundaries of latent space.

Stable Diffusion is based on an image generation technique called latent diffusion models (LDMs). Unlike other popular image synthesis methods, such as generative adversarial networks (GANs) and the auto-regressive technique used by DALL-E, LDMs generate images by iteratively "de-noising" data in a latent representation space, then decoding the representation into a full image. LDM was developed by the Machine Vision and Learning research group at the Ludwig Maximilian University of Munich and described in a paper presented at the recent IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Earlier this year, InfoQ covered Google's Imagen model, another diffusion-based image generation AI.
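The de-noise-then-decode loop described above can be illustrated with a toy sketch. It is not Stable Diffusion's code: it assumes a deterministic DDIM-style sampler with a linear noise schedule, replaces the trained, prompt-conditioned U-Net with a hypothetical "oracle" noise predictor (`make_oracle`) that already knows the target latent, and replaces the VAE decoder with a hypothetical nearest-neighbour upsampler (`toy_decode`) using a factor of 8, the downsampling factor Stable Diffusion v1 uses to map a 512x512 image to a 64x64 latent.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def make_oracle(clean_latent):
    """Hypothetical stand-in for the trained, prompt-conditioned U-Net:
    returns the exact noise separating x_t from a known clean latent."""
    def predict_noise(x_t, abar_t):
        return [(x - math.sqrt(abar_t) * c) / math.sqrt(1.0 - abar_t)
                for x, c in zip(x_t, clean_latent)]
    return predict_noise


def ddim_sample(size, predict_noise, alpha_bars, num_inference_steps=50, seed=0):
    """Iteratively de-noise a latent vector, starting from Gaussian noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(size)]
    stride = len(alpha_bars) // num_inference_steps
    timesteps = list(range(len(alpha_bars) - 1, -1, -stride))
    for i, t in enumerate(timesteps):
        abar_t = alpha_bars[t]
        # Noise level of the next (less noisy) step; 1.0 means fully clean.
        abar_prev = alpha_bars[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        eps = predict_noise(x, abar_t)
        # Estimate the clean latent, then move it to the next noise level
        # using the predicted noise (deterministic DDIM update).
        x0_hat = [(xi - math.sqrt(1.0 - abar_t) * e) / math.sqrt(abar_t)
                  for xi, e in zip(x, eps)]
        x = [math.sqrt(abar_prev) * x0 + math.sqrt(1.0 - abar_prev) * e
             for x0, e in zip(x0_hat, eps)]
    return x


def toy_decode(latent, side, factor=8):
    """Hypothetical stand-in for the VAE decoder: nearest-neighbour upsampling
    of a side x side latent grid to a (side*factor) x (side*factor) image."""
    grid = [latent[r * side:(r + 1) * side] for r in range(side)]
    return [[grid[r // factor][c // factor] for c in range(side * factor)]
            for r in range(side * factor)]


if __name__ == "__main__":
    side = 4
    target = [i / (side * side) for i in range(side * side)]  # hypothetical target latent
    latent = ddim_sample(side * side, make_oracle(target), make_alpha_bars(1000))
    image = toy_decode(latent, side)
    print(len(image), len(image[0]))  # 32 32
```

With a real model, the predicted noise comes from a network conditioned on the text prompt, so the result is only approximately what the prompt describes; the oracle here just makes the mechanics of the loop easy to check.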

Stable Diffusion supports several operations. Like DALL-E, it can be given a text description of a desired image and generate a high-quality image that matches that description. It can also generate a realistic-looking image from a simple sketch plus a textual description of the desired image. Meta AI recently released a model called Make-A-Scene that has similar image-to-image capabilities.

Many users of Stable Diffusion have publicly posted examples of generated images; Katherine Crowson, lead developer at Stability AI, has shared many images on Twitter. Some commenters are troubled by the impact that AI-based image synthesis will have on artists and the art world. The same week that Stable Diffusion was released, an AI-generated artwork won first prize in an art competition at the Colorado State Fair. Simon Willison, a co-creator of the Django framework, noted:

I've seen an argument that AI art is ineligible for copyright protection since "it must owe its origin to a human being" - if prompt design wasn't already enough to count, [image-to-image] presumably shifts that balance even more.

Stability AI founder Emad Mostaque answered several questions about the model on Twitter. Replying to one user who tried to estimate the compute resources and cost needed to train the model, Mostaque said:

We actually used 256 A100s for this per the model card, 150k hours in total so at market price $600k
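Mostaque's figures can be sanity-checked with simple arithmetic. This sketch assumes that the "150k hours" are A100 GPU-hours and that all 256 GPUs ran concurrently; under those assumptions, the quoted $600k implies about $4 per GPU-hour and roughly 24 days of wall-clock training.

```python
# Figures quoted by Mostaque; the interpretation (GPU-hours, all GPUs running
# in parallel) is an assumption made for this back-of-the-envelope check.
NUM_GPUS = 256
GPU_HOURS = 150_000
COST_USD = 600_000

price_per_gpu_hour = COST_USD / GPU_HOURS  # implied market price per A100-hour
wall_clock_hours = GPU_HOURS / NUM_GPUS    # if all GPUs trained concurrently
wall_clock_days = wall_clock_hours / 24

print(f"${price_per_gpu_hour:.2f} per GPU-hour")  # $4.00 per GPU-hour
print(f"{wall_clock_days:.1f} days of wall-clock training")  # 24.4 days of wall-clock training
```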

Mostaque also linked to a Reddit post giving tips on how best to use the model to generate images.

The code for Stable Diffusion is available on GitHub. The model weights, as well as a Colab notebook and a demo web UI, are available on Hugging Face.

About the Author

Anthony Alford
