Google Launches New Multi-Modal Gemini AI Model

On December 6, Alphabet released the first phase of its next-generation AI model, Gemini. Development of Gemini was overseen and driven by CEO Sundar Pichai and Google DeepMind.

According to Google, Gemini Ultra is the first model to outperform human experts on MMLU (Massive Multitask Language Understanding), one of the most popular benchmarks for testing the performance of language models. Gemini can generate code from different kinds of input, generate combined text and images, and reason visually across languages.

According to Sundar Pichai, CEO of Google, Gemini outperforms OpenAI's ChatGPT. He highlighted Gemini's proficiency on a set of tests measuring AI performance on a variety of tasks involving text and images.

It’s also exciting because Gemini Ultra is state of the art in 30 of the 32 leading benchmarks, and particularly in the multimodal benchmarks. That MMMU benchmark—it shows the progress there. I personally find it exciting that in MMLU [massive multi-task language understanding], which has been one of the leading benchmarks, it crossed the 90% threshold, which is a big milestone. The state of the art two years ago was 30, or 40%. So just think about how much the field is progressing. Approximately 89% is a human expert across these 57 subjects. It’s the first model to cross that threshold. - Sundar Pichai

Beyond its multimodal capabilities, Gemini is designed for efficiency and scalability. Its architecture allows for integration with existing tools and APIs, making it a potential foundation for future AI applications. Although the model is proprietary rather than open source, making it available through APIs lets the wider developer community build on it.

There are three initial versions of Gemini: Ultra, the largest; Pro, of medium size; and Nano, which is significantly smaller and more efficient. Google’s Bard, a chatbot similar to ChatGPT, will be powered by Gemini Pro. Nano will run on Google's Pixel 8 Pro phone.

Reaction on social media has been mixed, with some users reporting impressive results and others noting ongoing hallucinations. Melanie Mitchell, an artificial-intelligence researcher at the Santa Fe Institute in New Mexico, said, “It’s clear that Gemini is a very sophisticated AI system, but it’s not obvious to me that Gemini is actually substantially more capable than GPT-4.”

I'm extremely disappointed with Gemini Pro on Bard. It still gives very, very bad results to questions that shouldn't be hard anymore with RAG. A simple question like this with a simple answer like this, and it still got it WRONG. - Vitor de Lucca

Gemini is a family of multimodal large language models developed by Google DeepMind, serving as the successor to LaMDA and PaLM 2. The model is named in relation to NASA's Project Gemini. The models are decoder-only Transformers, with modifications to allow efficient training and inference on TPUs. Input images may be of different resolutions, while video is input as a sequence of images. Audio is sampled at 16 kHz and then converted into a sequence of tokens by the Universal Speech Model.
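As a rough illustration of the input handling described above, the sketch below treats a video as an ordered sequence of image segments and computes how many raw samples a clip contains at 16 kHz. It is not Google's implementation: the `Segment` type, the function names, and the frame-subsampling `stride` parameter are all hypothetical, and the tokenization step performed by the Universal Speech Model is not modeled.

```python
from dataclasses import dataclass
from typing import List

AUDIO_SAMPLE_RATE_HZ = 16_000  # sampling rate stated in the article


@dataclass
class Segment:
    """One piece of a multimodal input sequence (hypothetical type)."""
    modality: str  # e.g. "image" or "audio"
    payload: object


def audio_sample_count(duration_s: float, rate_hz: int = AUDIO_SAMPLE_RATE_HZ) -> int:
    """Number of raw audio samples in a clip of the given duration."""
    return round(duration_s * rate_hz)


def video_to_segments(frames: List[str], stride: int = 1) -> List[Segment]:
    """Represent a video as an ordered list of image segments.

    `stride` keeps every stride-th frame; this subsampling is an
    assumption made for illustration, not a documented Gemini setting.
    """
    if stride < 1:
        raise ValueError("stride must be at least 1")
    return [Segment("image", frame) for frame in frames[::stride]]


if __name__ == "__main__":
    # Frame identifiers stand in for decoded images.
    segments = video_to_segments(["f0", "f1", "f2", "f3", "f4"], stride=2)
    print([s.payload for s in segments])
    print(audio_sample_count(2.5))
```

Keeping frames in their original order matters here, since the temporal sequence is the only thing that distinguishes a video from an unordered set of images.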

Before releasing Gemini, its team developed model impact assessments to identify, assess, and document the key societal benefits and potential harms associated with the development of the advanced Gemini models. Based on the understanding of known and anticipated effects, a set of "model policies" was developed to guide the development and evaluation of the models. To evaluate the Gemini models against policy areas and other key risk areas identified within the impact assessments, a comprehensive suite of evaluations was run.

Mitigations were implemented at the data layer of the model, and instruction tuning was used to address model safety issues. To reduce hallucinations, the team used attribution, closed-book response generation, and hedging. In accordance with Executive Order 14110, signed by President Joe Biden in October, Google stated that it would share testing results of Gemini Ultra with the federal government of the United States.

Developers wishing to learn more about Gemini may read a technical report made available by Google.
