Google Announces Multi-Modal Gemini 1.5 with Million Token Context Length

One week after announcing Gemini 1.0 Ultra, Google shared details about its next-generation model, Gemini 1.5. The new iteration expands the context window and adopts a "Mixture of Experts" (MoE) architecture, which promises to make the model both faster and more efficient. It also includes expanded multimodal capabilities.

The ability to process up to 1 million tokens dwarfs the capabilities of its competitors and even its predecessor. Google CEO Sundar Pichai highlighted the transformative potential of this feature, stating, "This allows use cases where you can add a lot of personal context and information at the moment of the query...I view it as one of the bigger breakthroughs we have done."

Gemini 1.5's use of the Mixture of Experts technique is another step toward improving AI efficiency. Rather than running the entire network for every request, an MoE model selectively activates only the parts of the model relevant to the query, which saves both time and compute, an important advance as AI models become increasingly complex and power-hungry. This approach improves the user experience by reducing wait times and aligns with broader efforts to make AI more sustainable.
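
To make the idea of selective activation more concrete, here is a minimal sketch of generic top-k expert routing in plain Python. It is not Gemini 1.5's implementation, whose routing details are not described in this article. The function names (`route_top_k`, `moe_forward`), the toy experts, and the gate weights are all hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def route_top_k(token, gate_weights, k):
    """Score every expert, keep the k best, and normalize their weights."""
    scores = [dot(token, w) for w in gate_weights]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in top])
    return list(zip(top, weights))


def moe_forward(token, gate_weights, experts, k=2):
    """Run only the selected experts and blend their outputs."""
    routing = route_top_k(token, gate_weights, k)
    output = [0.0] * len(token)
    for index, weight in routing:
        expert_out = experts[index](token)  # unselected experts never run
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output, [index for index, _ in routing]


# Hypothetical toy setup: four "experts" that each scale the input differently.
experts = [lambda v, s=s: [s * x for x in v] for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

output, used = moe_forward([0.9, 0.1], gate_weights, experts, k=2)
print(used)  # prints [0, 1]
```

In this toy run only two of the four experts are evaluated for the input, which is where the compute savings in a sparse MoE layer come from. Production systems typically route per token inside each MoE layer and add load-balancing terms during training, none of which is shown here.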

According to Jeff Dean, chief scientist, Google DeepMind and Google Research:

“The multimodal capabilities of the model means you can interact in sophisticated ways with entire books, very long document collections, codebases of hundreds of thousands of lines across hundreds of files, full movies, entire podcast series, and more”

Those who want to see demonstrations of Gemini 1.5 can watch videos of it problem solving across 100,000 lines of code or retrieving information from a 44-minute movie.

With OpenAI recently unveiling memory capabilities for ChatGPT and signalling a push into web search, the race is about more than building the most powerful AI. Google's focus with Gemini 1.5 on both developers and enterprise users, ahead of a broader consumer rollout, underscores the importance of AI as a tool for business innovation and personal productivity.

“What truly matters is how well the model actually uses the context to solve real-world problems, and Gemini-1.5 has surpassed the SOTA with flying colors.” - Jim Fan

Despite the excitement surrounding Gemini 1.5, it's clear that Google is still in the early stages of exploring its full potential. Gemini 1.5 will be available only to business users and developers through Vertex AI and AI Studio. The model's impressive capabilities come with challenges, notably in processing speed for tasks involving its maximum context window. As Oriol Vinyals, VP of research at Google DeepMind, acknowledged, "The latency aspect [is something] we’re … working to optimize — this is still in an experimental stage, in a research stage." Yet the promise of future optimizations and the exploration of even larger context windows suggest that Google is only scratching the surface of what's possible.

Developers interested in learning more about Gemini 1.5 can consult the technical report, which includes the model card, training information, and details about model evaluation.
