Google Cloud Run Now Offers Serverless GPUs for AI and Batch Processing
Google Cloud has launched NVIDIA GPU support for Cloud Run, enhancing its serverless platform with scalable, cost-efficient GPU resources. This upgrade enables rapid AI inference and batch processing, featuring pay-per-second billing and automatic scaling to zero. Developers can attach GPUs to services without managing infrastructure, making advanced AI applications faster and more accessible.
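For illustration, here is a minimal sketch of creating a GPU-backed service with the google-cloud-run Python client; the project, image, "nvidia.com/gpu" limit key, and node_selector accelerator field are assumptions drawn from the GPU preview documentation rather than verified API surface.

```python
# Sketch only: the GPU selection fields (node_selector, accelerator,
# "nvidia.com/gpu") are assumptions; verify against the current run_v2 API.
from google.cloud import run_v2

client = run_v2.ServicesClient()

service = run_v2.Service(
    template=run_v2.RevisionTemplate(
        containers=[
            run_v2.Container(
                image="us-docker.pkg.dev/my-project/demo/inference:latest",
                resources=run_v2.ResourceRequirements(
                    # One NVIDIA GPU plus the CPU/memory it requires.
                    limits={"cpu": "8", "memory": "32Gi", "nvidia.com/gpu": "1"},
                ),
            )
        ],
        node_selector=run_v2.NodeSelector(accelerator="nvidia-l4"),
    ),
)

# Long-running operation; billing is per second while instances are active,
# and the service scales to zero when idle.
operation = client.create_service(
    parent="projects/my-project/locations/us-central1",
    service=service,
    service_id="gpu-inference",
)
print(operation.result().uri)
```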
-
Google Brings Gemini Nano to ML Kit with New On-Device GenAI APIs
The new GenAI APIs recently added to ML Kit enable developers to use Gemini Nano for on-device inference in Android apps, supporting features like summarization, proofreading, rewriting, and image description.
-
Google Releases LMEval, an Open-Source Cross-Provider LLM Evaluation Tool
LMEval aims to help AI researchers and developers compare the performance of different large language models. Designed to be accurate, multimodal, and easy to use, LMEval has already been used to evaluate major models in terms of safety and security.
-
Google Releases MedGemma: Open AI Models for Medical Text and Image Analysis
Google has released MedGemma, a pair of open-source generative AI models designed to support medical text and image understanding in healthcare applications. Based on the Gemma 3 architecture, the models are available in two configurations: MedGemma 4B, a multimodal model capable of processing both images and text, and MedGemma 27B, a larger model focused solely on medical text.
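As a sketch of how the multimodal variant might be called through Hugging Face transformers; the model id google/medgemma-4b-it, gated access, and the chat-style pipeline input are assumptions to verify against the model card:

```python
# Sketch: prompting MedGemma 4B with an image plus a question via the
# transformers image-text-to-text pipeline. Model id and input format are
# assumptions; check the Hugging Face model card.
from transformers import pipeline
from PIL import Image

pipe = pipeline(
    "image-text-to-text",
    model="google/medgemma-4b-it",  # assumed model id; access may be gated
    device_map="auto",
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": Image.open("chest_xray.png")},
        {"type": "text", "text": "Describe any notable findings in this image."},
    ],
}]

result = pipe(text=messages, max_new_tokens=200)
# For chat-style input the pipeline returns the extended conversation;
# the last message holds the model's reply.
print(result[0]["generated_text"][-1]["content"])
```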
-
Gemma 3n Available for On-Device Inference Alongside RAG and Function Calling Libraries
Google has announced that Gemma 3n is now available in preview on the new LiteRT Hugging Face community, alongside many previously released models. Gemma 3n is a multimodal small language model that supports text, image, video, and audio inputs. It also supports fine-tuning, customization through retrieval-augmented generation (RAG), and function calling using new AI Edge SDKs.
-
Google Enhances LiteRT for Faster On-Device Inference
The new release of LiteRT, formerly known as TensorFlow Lite, introduces a new API to simplify on-device ML inference, enhanced GPU acceleration, support for Qualcomm NPU (Neural Processing Unit) accelerators, and advanced inference features.
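A minimal sketch of the Python path, using the ai-edge-litert package's Interpreter (the drop-in replacement for tf.lite.Interpreter); the model file is a placeholder, and the release's new simplified API follows a similar load-and-invoke flow:

```python
import numpy as np
from ai_edge_litert.interpreter import Interpreter

# Load a .tflite flatbuffer (placeholder path) and allocate tensors.
interpreter = Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a dummy input with the shape and dtype the model declares.
x = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], x)
interpreter.invoke()

print(interpreter.get_tensor(output_details[0]["index"]))
```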
-
Gemma 3 Supports Vision-Language Understanding, Long Context Handling, and Improved Multilinguality
Google’s generative artificial intelligence (AI) model Gemma 3 supports vision-language understanding, long context handling, and improved multilinguality. In a recent blog post, the Google DeepMind and AI Studio teams discussed the new features in Gemma 3. The model also features reduced KV-cache memory usage, a new tokenizer, better performance, and higher-resolution vision encoders.
-
Anthropic Introduces Web Search Functionality for Claude Models
Anthropic has announced the addition of web search capabilities to its Claude models, available via the Anthropic API. This update enables Claude to access current information from the web, allowing developers to create applications and AI agents that provide up-to-date insights.
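A sketch of what enabling the tool looks like with the Python SDK; the tool type string web_search_20250305 and the max_uses cap reflect the docs at announcement time and should be confirmed against the current API reference:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-7-sonnet-latest",
    max_tokens=1024,
    tools=[{
        "type": "web_search_20250305",  # assumed tool version string
        "name": "web_search",
        "max_uses": 3,  # cap the number of searches per request
    }],
    messages=[{"role": "user", "content": "What did Google announce for Cloud Run GPUs?"}],
)

# The response interleaves text blocks with search-result blocks; print the text.
for block in response.content:
    if block.type == "text":
        print(block.text)
```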
-
Google Introduces DolphinGemma to Support Dolphin Communication Research
Google has released a new AI model called DolphinGemma, which has been developed to assist researchers in analyzing and interpreting dolphin vocalizations. The project is part of an ongoing collaboration with the Wild Dolphin Project (WDP) and researchers at Georgia Tech, and it focuses on identifying patterns in the natural communication of Atlantic spotted dolphins.
-
Google Unveils Ironwood TPU for AI Inference
Google's Ironwood TPU, its most advanced custom AI accelerator, powers the "age of inference" with unmatched performance and scalability. With up to 9,216 liquid-cooled chips per pod, it outpaces competitors, delivering 42.5 exaflops. Engineered for high-efficiency, low-latency AI tasks, Ironwood redefines potential in AI hardware, leveraging AlphaChip to revolutionize chip design.
-
Google Cloud WAN Aims to Transform Enterprise Networking
Google has launched Cloud WAN, a robust managed WAN solution built on its global network, featuring 202 points of presence (PoPs) and two million miles of fiber. It promises secure, high-performance connectivity at lower costs, addressing the complexities of modern enterprise needs. With faster speeds and significant total cost of ownership (TCO) savings, Cloud WAN integrates seamlessly with existing providers.
-
Google's Gemma 3 QAT Language Models Can Run Locally on Consumer-Grade GPUs
Google released the Gemma 3 QAT family, quantized versions of their open-weight Gemma 3 language models. The models use Quantization-Aware Training (QAT) to maintain high accuracy when the weights are quantized from 16 to 4 bits.
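Since the QAT checkpoints ship in 4-bit GGUF form, one way to try them locally is llama-cpp-python; the repository and file names below are assumptions patterned on Google's QAT releases on Hugging Face:

```python
from llama_cpp import Llama

# Downloads the 4-bit GGUF from Hugging Face; the repo id and filename
# pattern are assumptions; check the actual QAT release pages.
llm = Llama.from_pretrained(
    repo_id="google/gemma-3-12b-it-qat-q4_0-gguf",
    filename="*q4_0.gguf",
    n_gpu_layers=-1,  # offload all layers to a consumer GPU if available
    n_ctx=8192,
)

out = llm.create_chat_completion(
    messages=[{"role": "user",
               "content": "Explain quantization-aware training in two sentences."}],
)
print(out["choices"][0]["message"]["content"])
```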
-
Google DeepMind Shares Approach to AGI Safety and Security
Google DeepMind has released a new paper outlining its approach to safety and security in the development of artificial general intelligence (AGI). AGI refers to AI systems that are as capable as humans at most cognitive tasks.
-
Google Releases Last Android 16 Beta before Official Launch
With the release of the last Android 16 beta, developers should ensure their apps or libraries are free of any compatibility issues. Google warns of changes, including JobScheduler quotas, stronger intent security, and the 16 KB page size, that might affect apps even if they do not specifically target Android 16.
-
Google’s Cybersecurity Model Sec-Gemini Enables SecOps Workflows for Root Cause and Threat Analysis
Google’s new cybersecurity model Sec-Gemini enables SecOps workflows for root cause analysis (RCA), threat analysis, and vulnerability impact understanding.