The new GenAI APIs recently added to ML Kit enable developers to use Gemini Nano for on-device inference in Android apps, supporting features like summarization, proofreading, rewriting, and image description.
For example, you can summarize documents of up to 3,000 English words, refine messages to be more formal or casual, and generate titles, metadata, or alternative image descriptions.
Running on-device means that input data, inference, and output never leave the device, and inference incurs no cloud costs. According to Google, the GenAI APIs are designed to be easy to integrate and use, offering high-level abstractions similar to other ML Kit APIs.
This means you can expect quality results out of the box without extra effort on prompt engineering or fine-tuning for specific use cases.
This is achieved by building each specialized API as a stack of components, starting with Gemini Nano as a common foundation. On top of Nano sits a small, API-specific LoRA adapter model to improve performance, followed by a layer that defines optimized inference parameters, such as prompt, temperature, top-K, and batch size. Finally, an evaluation pipeline incorporates automated raters, statistical metrics, and human raters to further improve generated responses.
The performance gains achieved through this approach are measured using benchmark scores tailored to each API by considering specific attributes, such as factual consistency in text summarization. These benchmarks demonstrate consistent improvements across all APIs, as illustrated in the image below.

The ML Kit GenAI APIs support both streaming and non-streaming workflows. Streaming is particularly suitable for longer responses, as it allows the output to be displayed incrementally without waiting for the entire response to be generated.
When an app uses the GenAI APIs, ML Kit automatically downloads Gemini Nano and any required API-specific models if they are not already present. Developers can also control this process and trigger the downloads in advance, if preferred. Assuming all the required models have already been downloaded, the following snippet shows a simplified view of the workflow for the Summarization API:
val articleToSummarize = ...

val summarizerOptions = SummarizerOptions.builder(context)
    .setInputType(InputType.ARTICLE)
    .setOutputType(OutputType.ONE_BULLET)
    .setLanguage(Language.ENGLISH)
    .build()
val summarizer = Summarization.getClient(summarizerOptions)

val summarizationRequest = SummarizationRequest.builder(articleToSummarize).build()
summarizer.runInference(summarizationRequest) { newText ->
    // Show each new chunk of text in the UI as it is generated
}
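For shorter outputs, where streaming brings little benefit, a non-streaming call can be used instead. The sketch below assumes that runInference invoked without a streaming callback returns a future holding the complete result, in line with the ML Kit reference; method and property names may differ across SDK versions, and showSummary is a hypothetical UI helper:

// Sketch: non-streaming summarization, assuming runInference() without a
// streaming callback returns the complete result once generation finishes.
val request = SummarizationRequest.builder(articleToSummarize).build()
val result = summarizer.runInference(request).get()  // blocks until done
// result.summary is assumed to hold the full generated summary
showSummary(result.summary)  // hypothetical UI helper

In production code, the blocking get() call would typically be replaced by a coroutine-friendly await() or a listener, so the UI thread is never blocked.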
To ensure all required features are available locally, developers can call summarizer.checkFeatureStatus() before running inference.
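A minimal sketch of that availability check might look as follows; the FeatureStatus constants and the DownloadCallback methods are taken from Google's ML Kit GenAI reference documentation, so verify the exact names and signatures against your SDK version:

// Sketch: check whether the summarization feature is ready on the device
// and trigger a model download if needed.
val featureStatus = summarizer.checkFeatureStatus().get()

when (featureStatus) {
    FeatureStatus.AVAILABLE -> {
        // Models are already on the device; safe to call runInference directly
    }
    FeatureStatus.DOWNLOADABLE, FeatureStatus.DOWNLOADING -> {
        // Ask ML Kit to fetch Gemini Nano and the API-specific models
        summarizer.downloadFeature(object : DownloadCallback {
            override fun onDownloadStarted(bytesToDownload: Long) {}
            override fun onDownloadProgress(totalBytesDownloaded: Long) {}
            override fun onDownloadCompleted() {
                // Models are ready; inference can be run now
            }
            override fun onDownloadFailed(e: GenAiException) {
                // Handle the error, e.g., surface a message to the user
            }
        })
    }
    FeatureStatus.UNAVAILABLE -> {
        // Device or configuration not supported; fall back gracefully
    }
}

Performing this check up front lets an app download models at a convenient time, for instance on Wi-Fi, rather than on the first user request.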
ML Kit GenAI APIs are available on Android devices powered by optimized MediaTek Dimensity, Qualcomm Snapdragon, and Google Tensor platforms via AICore. Supported devices include the Pixel 9 series, Samsung Galaxy S25, Xiaomi 15, Motorola Razr 60 Ultra, and others.
For developers interested in getting started, a great place to begin is the official ML Kit GenAI APIs demo app, which showcases all the new features, along with the official documentation for deeper guidance.