
MediaPipe Now Supports On-Device Text-to-Image Generation on Android

Announced a few months ago, the MediaPipe diffusion plugin is now available as an experimental tool on Android devices. Named Image Generator, the plugin can generate images entirely on-device in approximately 15 seconds on high-end devices, says Google.

The new MediaPipe Image Generator can be used to generate images based on textual prompts using standard diffusion models. The Image Generator API supports any model that conforms to the Stable Diffusion v1.5 architecture.

In addition to using a pre-trained model, you can fine-tune your models and convert them to a supported model format using a conversion script provided by Google. This makes it possible to inject conditioning images into your models to better control the generation process and the final generated image. Additionally, you can use Low-Rank Adaptation (LoRA) weights to create images of specific, pre-defined concepts.

To use a diffusion model directly by feeding a textual prompt into the Image Generator API, first create an options object with the path to your foundation model files on the device, then pass it to the ImageGenerator.createFromOptions factory method. Once you have the ImageGenerator instance, pass it the prompt, the number of iterations, and a seed value to get back the generated image:

val options = ImageGeneratorOptions.builder().setImageGeneratorModelDirectory(MODEL_PATH).build()
imageGenerator = ImageGenerator.createFromOptions(context, options)

val result = imageGenerator.generate(prompt_string, iterations, seed)
val bitmap = BitmapExtractor.extract(result?.generatedImage())

Alternatively, you can use a new plugin system that Google developed to make the process of passing a conditioning image easier.

As Google puts it: "We currently support three different ways that you can provide a foundation for your generations: facial structures, edge detection, and depth awareness. The plugins give you the ability to provide an image, extract specific structures from it, and then create new images using those structures."

Google has provided a number of plugins that should be used in combination with a foundation model, each tailored to a specific end effect. In particular, the Canny Edge plugin uses the edges implied by the condition image and generates a new image based on the text prompt; the Face Landmark plugin provides a detailed face mesh of a single face and generates a new face over the mesh; finally, the Depth plugin uses the condition image to infer the size and depth of the object to generate. Each plugin supports a number of options to customize its behavior.
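
To give a rough idea of what "extracting edges" from a condition image means, here is a minimal sketch in plain Kotlin. It computes a Sobel gradient-magnitude edge map, which is a simpler technique than Canny and is not the plugin's actual implementation. The function name, the grayscale array representation, and the threshold parameter are all assumptions made for illustration.

```kotlin
import kotlin.math.sqrt

// Illustrative only: a Sobel gradient-magnitude edge map, NOT the Canny
// algorithm and NOT MediaPipe code. The image is assumed to be grayscale,
// stored as rows of intensity values.

/** Marks a pixel as an edge when its Sobel gradient magnitude reaches the threshold. */
fun sobelEdges(gray: Array<DoubleArray>, threshold: Double): Array<BooleanArray> {
    val height = gray.size
    val width = if (height == 0) 0 else gray[0].size
    val edges = Array(height) { BooleanArray(width) }
    // Border pixels are left as non-edges because the 3x3 kernel does not fit there.
    for (y in 1 until height - 1) {
        for (x in 1 until width - 1) {
            // Horizontal gradient: right column minus left column.
            val gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]) -
                (gray[y - 1][x - 1] + 2 * gray[y][x - 1] + gray[y + 1][x - 1])
            // Vertical gradient: bottom row minus top row.
            val gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]) -
                (gray[y - 1][x - 1] + 2 * gray[y - 1][x] + gray[y - 1][x + 1])
            edges[y][x] = sqrt(gx * gx + gy * gy) >= threshold
        }
    }
    return edges
}
```

An edge map like this keeps the outline of the subject while discarding color and texture, which is the kind of structural information a conditioning plugin can hand to the diffusion model alongside the text prompt.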

The Image Generator can also be customized using LoRA to extend a foundation model by teaching it a new concept.

With the new LoRA weights, the Image Generator becomes a specialized generator that is able to inject specific concepts into generated images.

For example, you can create LoRA weights using several images of a given subject, then use those weights to generate a new image of the same subject in a different environment.
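
As a rough illustration of the idea behind LoRA, the sketch below shows the underlying arithmetic in plain Kotlin: the foundation weight matrix W stays frozen, and a low-rank product B x A, scaled by alpha / rank, is added on top of it. This is not MediaPipe code; the `Matrix` alias and the `matmul` and `applyLora` functions are hypothetical names chosen for this example, and real LoRA weights are applied to specific layers inside the model rather than to a single matrix.

```kotlin
// Illustrative only: plain-Kotlin arithmetic, NOT MediaPipe code.
// LoRA keeps the base weight matrix W (d x k) frozen and learns two small
// matrices, B (d x r) and A (r x k), so the adapted weight is
// W + (alpha / r) * B * A.

typealias Matrix = Array<DoubleArray>

/** Multiplies an (n x m) matrix by an (m x p) matrix. */
fun matmul(a: Matrix, b: Matrix): Matrix {
    require(a.isNotEmpty() && b.isNotEmpty()) { "matrices must be non-empty" }
    require(a[0].size == b.size) { "inner dimensions must match" }
    val inner = b.size
    val cols = b[0].size
    return Array(a.size) { i ->
        DoubleArray(cols) { j ->
            var sum = 0.0
            for (t in 0 until inner) sum += a[i][t] * b[t][j]
            sum
        }
    }
}

/** Returns W + (alpha / rank) * (B x A) without modifying W. */
fun applyLora(w: Matrix, b: Matrix, a: Matrix, alpha: Double): Matrix {
    val rank = a.size
    require(b.isNotEmpty() && b[0].size == rank) { "B columns must equal A rows (the rank)" }
    val delta = matmul(b, a)
    require(delta.size == w.size && delta[0].size == w[0].size) { "update shape must match W" }
    val scale = alpha / rank
    return Array(w.size) { i ->
        DoubleArray(w[0].size) { j -> w[i][j] + scale * delta[i][j] }
    }
}
```

Because B and A together hold only r * (d + k) values instead of d * k, the learned update is much smaller than the foundation model itself, which is what makes it practical to ship concept-specific weights separately.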

If you are interested in trying the new MediaPipe Image Generator, you can start from the official sample on GitHub, which demonstrates the three ways you can use it.
