
Stable Diffusion in Java (SD4J) Enables Generating Images with Deep Learning

Oracle Open Source has introduced the Stable Diffusion in Java (SD4J) project, a modified port of the Stable Diffusion C# implementation with support for negative text inputs. Stable Diffusion is a deep learning text-to-image model based on diffusion techniques. SD4J can be used to generate images via its GUI or programmatically from Java applications. SD4J runs on top of ONNX Runtime, a cross-platform machine learning accelerator for inference and training, which enables faster inference and reduced model training time.

Git Large File Storage, a Git extension for versioning large files, should be installed first, for example with the following command on Linux:

$ sudo apt-get install git-lfs
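
Git LFS can then be initialized for the current user account, if that hasn't already been done, with the following command:

$ git lfs install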

Afterwards, the SD4J project can be cloned locally with the following command:

$ git clone https://github.com/oracle-samples/sd4j

SD4J requires compatible pre-built ONNX models. The Stable Diffusion v1.5 model from Hugging Face is used for the examples in this news story and can be cloned with the following command:

$ git clone https://huggingface.co/runwayml/stable-diffusion-v1-5 -b onnx

The README contains more information on using other models, such as those not in ONNX format.

ONNXRuntime-Extensions is a library which extends the capabilities of ONNX models and inference with the ONNX Runtime:

$ git clone https://github.com/microsoft/onnxruntime-extensions

After cloning the project, the following command can be executed inside the onnxruntime-extensions directory to compile the ONNXRuntime-Extensions for your platform:

$ ./build_lib.sh --config Release --update --build --parallel

The following error might be displayed if CMake isn't installed:

[ERROR] - Unable to find CMake executable. Please specify its path with --cmake_path.

Install at least version 3.25 of CMake to resolve the error, for example with the following command on Linux:

$ sudo apt-get install cmake

When the build is successful, the resulting library (libortextensions.[dylib,so] or ortextensions.dll) can be found inside the following directory:

build/<OS-name>/Release/lib/

The resulting library should be copied to the root directory of the SD4J project.
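
Assuming, for example, that the sd4j and onnxruntime-extensions repositories were cloned into sibling directories, the Linux library can be copied with a command like the following:

$ cp build/<OS-name>/Release/lib/libortextensions.so ../sd4j/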

After these preparations, the GUI can be started by executing the following Maven command, which contains the model path, inside the sd4j directory:

$ mvn package exec:exec -DmodelPath=../stable-diffusion-v1-5/

The SD4J GUI is shown after the Maven command has executed successfully:

The images in this news story are created with a guidance scale of 10, seed 42, 50 inference steps and the Euler Ancestral image scheduler, unless stated otherwise.

First, the GUI is used to create an image of a sports car on the road, with the following image text:

Professional photograph sports car on the road, high resolution, high quality

This results in a red sports car on a road:

When generating images of sports cars, most of them are red. To create images of sports cars that aren't red, the image negative text may be used to specify what the image shouldn't contain. For example, using the value red as the image negative text generates a white car:

The guidance scale indicates how closely the resulting image should match the text prompt. A higher value keeps the image closer to the prompt, while a lower value may be used if more creativity in the image is desired. For Stable Diffusion, most models use a default guidance scale between 7 and 7.5.

Using guidance scale 10 and the image text Professional photograph of house on a hill, surrounded by trees, while it rains, high resolution, high quality, a clear picture of a house on a hill surrounded by trees is generated:

Using the same image text with guidance scale 1 allows more creativity: the house is now partly hidden between the trees and the hill is less visible:

The seed is a random number used to generate noise. The generated images stay the same when using the same seed, prompt and other parameters.

Stable diffusion starts with an image of random noise. With each inference step, the noise is reduced and steered towards the prompt. A higher number of steps is not always better, as it might introduce unwanted details. The Hugging Face website generally recommends 50 inference steps.

Creating an image of a tree in a park with 10 inference steps results in a relatively noisy tree image:

Increasing the inference steps to 50 results in a clearer image of a tree:

Increasing the inference steps further to 200 results in an image clearly displaying multiple trees and some other elements, some of them red:

The image scheduler takes the model's output and returns a denoised version, while the batch size specifies the number of images to generate.

Working manually via the GUI allows generating images; however, the project also provides the SD4J Java class to access SD4J programmatically.
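
As a minimal sketch, a programmatic invocation could look roughly like the code below. The factory, generation and image class names are assumptions for illustration purposes and may differ from the actual SD4J API; the project's README and source code are the authoritative reference:

import java.util.List;

// Illustrative sketch only: the factory and generation methods shown here are
// assumptions and may not match the actual SD4J API.
public class SD4JExample {
    public static void main(String[] args) {
        // Assumed factory method that builds the pipeline from the ONNX model directory.
        SD4J pipeline = SD4J.factory("../stable-diffusion-v1-5/");

        // Assumed generation method, mirroring the parameters available in the GUI:
        // prompt, negative text, guidance scale, inference steps, batch size and seed.
        List<SD4J.SDImage> images = pipeline.generateImage(
            50,       // inference steps
            "Professional photograph sports car on the road, high resolution, high quality",
            "red",    // negative text
            10.0f,    // guidance scale
            1,        // batch size
            42);      // seed

        // The returned images can then be saved or displayed by the application.
        pipeline.close();
    }
}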

Faster image generation is possible by enabling the CUDA integration for NVIDIA GPUs, which is done by changing the exec-maven-plugin configuration in the pom.xml from <argument>CPU</argument> to <argument>CUDA</argument>.

More information can be found in the SD4J README, while the Hugging Face documentation provides additional information about the various concepts.
