
Google Launches Gemini 1.5 Flash for Lower-Latency and More Efficient AI Serving

Part of the Gemini family of AI models, Gemini 1.5 Flash is a lighter-weight variant designed to be faster and more efficient to use than Gemini Pro, while offering the same "breakthrough" context window of one million tokens.

Gemini 1.5 Flash is optimized for high-volume, high-frequency tasks at scale and is less expensive to serve, Google says, while retaining multimodal reasoning across text, audio, and video inputs.

1.5 Flash excels at summarization, chat applications, image and video captioning, data extraction from long documents and tables, and more.

According to Google, Gemini 1.5 Flash delivers sub-second first-token latency on average for the majority of use cases, meaning users start seeing the model's output less than a second after submitting their query.

Gemini 1.5 Flash has been "distilled" from Gemini 1.5 Pro, meaning it retains the larger model's most essential knowledge and skills in a more compact package. This implies Gemini 1.5 Flash inherits the improvements that went into Gemini 1.5 Pro, including its efficient Mixture-of-Experts (MoE)-based architecture, larger context window, and enhanced performance.

In its announcement, Google highlights that Gemini 1.5 models can support a context window of up to two million tokens, outperforming current competitors, with Gemini 1.5 Flash offering a one-million-token window by default. According to Google, that is enough to process a significant amount of information in one go: up to one hour of video, 11 hours of audio, codebases of over 30,000 lines of code, or over 700,000 words.

Other improvements that went into Gemini 1.5 Pro and thus also benefit Gemini 1.5 Flash are code generation, logical reasoning and planning, multi-turn conversation, and audio and image understanding. Thanks to these advances, Gemini 1.5 Pro outperforms Gemini 1.0 Ultra on most benchmarks, Google says, while Gemini 1.5 Flash outperforms Gemini 1.0 Pro.

On a related note, Google has also updated Gemini Nano, its model for on-device inference. In its latest iteration, Gemini Nano can understand images in addition to text inputs, with plans to further expand to sound and spoken language.

Both Gemini 1.5 Pro and Gemini 1.5 Flash are available in preview and will become generally available in June through Google AI Studio and Vertex AI.
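For developers who want to try the model, a minimal sketch of calling Gemini 1.5 Flash through the public Generative Language REST API (the API surfaced by Google AI Studio) might look like the following. The endpoint and payload shape follow Google's v1beta REST conventions; the `GEMINI_API_KEY` environment variable and the helper function names are assumptions of this sketch, not part of the announcement.

```python
# Sketch: calling Gemini 1.5 Flash via the Generative Language REST API,
# using only the Python standard library. Requires an API key from
# Google AI Studio, supplied here via the GEMINI_API_KEY env var.
import json
import os
import urllib.request

API_URL = (
    "https://generativelanguage.googleapis.com/v1beta/"
    "models/gemini-1.5-flash:generateContent"
)


def build_request(prompt: str) -> dict:
    """Build the generateContent payload for a single text prompt."""
    return {"contents": [{"parts": [{"text": prompt}]}]}


def generate(prompt: str) -> str:
    """Send the prompt and return the first candidate's text."""
    key = os.environ["GEMINI_API_KEY"]  # assumed to be set by the caller
    body = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{API_URL}?key={key}",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["candidates"][0]["content"]["parts"][0]["text"]


if __name__ == "__main__" and "GEMINI_API_KEY" in os.environ:
    print(generate("Summarize Gemini 1.5 Flash in one sentence."))
```

The same model is also reachable through the `google-generativeai` Python SDK and through Vertex AI; the raw REST form is shown here only because it has no dependencies beyond the standard library.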
