
Google Opens Gemma 4 Under Apache 2.0 with Multimodal and Agentic Capabilities

Google has recently released Gemma 4, a family of open-weight models spanning efficient E2B and E4B edge variants, a 26B Mixture-of-Experts (MoE) model, and a 31B dense model, all distributed under the Apache 2.0 license. The release introduces native video and image processing across the lineup, audio input on the smaller models, context windows of up to 256K tokens, and benchmark results that place the 31B dense variant in a bracket typically occupied by models three to five times its size.

The agentic framing is backed by concrete capabilities. Google reports that the 31B variant scores 84.3% on GPQA Diamond and 80.0% on LiveCodeBench v6. The GPQA Diamond result nearly doubles the 42.4% achieved by the prior Gemma 3 27B IT model, reflecting substantial gains in science reasoning and code generation. For tool use, the models add native support for function calling, structured JSON output, and system instructions, a combination intended to let developers build autonomous agents that call external tools and APIs and execute multi-step workflows reliably.
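To illustrate why structured JSON output matters for tool use, here is a minimal sketch of a function-calling round trip. The tool schema, function names, and the exact JSON shape the model emits are hypothetical; the real wire format depends on the serving stack (vLLM, Ollama, etc.), not on anything specified in the article.

```python
import json

# Hypothetical tool definition in JSON Schema style; actual schemas vary by runtime.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def get_weather(city: str) -> dict:
    # Stand-in for a real weather API call.
    return {"city": city, "temp_c": 21}

TOOLS = {"get_weather": get_weather}

def dispatch(model_output: str) -> dict:
    """Parse a structured JSON function call emitted by the model and execute it."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# A model constrained to structured JSON output might emit something like:
raw = '{"name": "get_weather", "arguments": {"city": "Berlin"}}'
result = dispatch(raw)
print(result)  # {'city': 'Berlin', 'temp_c': 21}
```

The point of constrained JSON output is that `dispatch` can rely on `json.loads` succeeding, rather than scraping a tool call out of free-form text.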

Architecturally, the lineup spans both dense and sparse designs. The 26B MoE model activates only 3.8 billion parameters during inference to deliver high tokens-per-second throughput, while the 31B dense variant targets workloads where consistent per-token cost matters more than peak parameter count. The edge models, sized for mobile and IoT devices where memory and power budgets are tight, offer a 128K context window; the larger models extend to 256K tokens, large enough to ingest sizable code repositories or long-form documents in a single prompt. All four variants natively process video and images at variable resolutions, and the E2B and E4B edge models add native audio input for speech recognition and understanding. The family is trained on more than 140 languages.
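The dense-versus-MoE tradeoff can be made concrete with back-of-envelope arithmetic. The parameter counts below come from the article; the bytes-per-parameter figures are standard for bf16 and 4-bit quantization, and the results are rough weight-only estimates, not official memory requirements.

```python
GIB = 2**30

def weight_gib(params_billion: float, bytes_per_param: float) -> float:
    """Approximate memory for model weights alone (ignores KV cache and activations)."""
    return params_billion * 1e9 * bytes_per_param / GIB

# Dense 31B model: every parameter is read for every generated token.
dense_bf16 = weight_gib(31, 2)     # bf16 = 2 bytes/param -> ~57.7 GiB
dense_fp4 = weight_gib(31, 0.5)    # 4-bit quantized -> ~14.4 GiB

# 26B MoE: all 26B weights must stay resident in memory, but only the
# 3.8B active parameters participate in each token's forward pass,
# so per-token compute scales with the active count, not the total.
moe_resident_bf16 = weight_gib(26, 2)
moe_active_bf16 = weight_gib(3.8, 2)
```

This is why MoE models trade memory for speed: the 26B MoE needs roughly the resident footprint of a 26B dense model but does per-token work closer to a 3.8B one.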

On benchmarks, Google reports that the 31B dense model achieved an estimated LMArena score of 1452 (text only), a bracket usually reserved for models with three to five times the parameter count.

Source: Google blog

Reactions in the open-model community have focused less on raw scores and more on usability and licensing. Sam Witteveen applauded the Apache 2.0 license:

This is an actual real Apache 2 license, which means for the first time, you can take Google's best open model, modify it, fine tune it, deploy it commercially, do whatever you want with it. No strings attached.

Nathan Lambert argues that Gemma 4’s value lies in its frictionless integration, noting:

Gemma 4’s success is going to be entirely determined by ease of use, to a point where a 5-10% swing on benchmarks wouldn’t matter at all. It’s strong enough, small enough, with the right license, and from the U.S., so many companies are going to slot it in.

Day-zero distribution is notably broad: weights are available on Hugging Face and Kaggle, with reference paths through vLLM, llama.cpp, Ollama, MLX, LM Studio, Unsloth, SGLang, and NVIDIA NIM, plus an NVFP4-quantized 31B checkpoint produced with NVIDIA Model Optimizer. Kaggle is also running the Gemma 4 Good Challenge, inviting developers to build products that create meaningful positive change using the new models.
