Large language models Content on InfoQ
-
Claude Sonnet 4.5 Ranked Safest LLM from Open-Source Audit Tool Petri
Claude Sonnet 4.5 has emerged as the best-performing model on ‘risky tasks’, narrowly edging out GPT-5 in early evaluations by Petri, Anthropic’s new open-source AI auditing tool.
-
Agoda Leverages ChatGPT in the CI/CD Process for SQL Stored Procedure Optimization
Agoda has started using ChatGPT to optimize SQL stored procedures (SPs) as part of its CI/CD process. After introducing the automated LLM-assisted step, the company observed shorter stored procedure optimization times, which lightened the load on DB developers. Agoda is working on making ChatGPT more accessible for SP optimization outside of the CI/CD pipeline.
-
Anthropic Reveals Three Infrastructure Bugs behind Claude Performance Issues
Anthropic recently published a postmortem revealing that three distinct infrastructure bugs intermittently degraded the output quality of its Claude models in recent weeks. While the company states it has now resolved those issues and is modifying its internal processes to prevent similar disruptions, the community highlights the challenges of running the service across three hardware platforms.
-
Perplexity Launches Search API to Power Next-Gen AI Applications
Perplexity has introduced the Search API, opening up access to the same infrastructure that underpins its public answer engine. With coverage of hundreds of billions of webpages and infrastructure tuned for AI-heavy workloads, the new API is aimed at developers who want real-time, reliable search results for building their own agents, applications, and retrieval-augmented pipelines.
-
DeepMind Releases Gemini Robotics-ER 1.5 for Embodied Reasoning
Google DeepMind introduced Gemini Robotics-ER 1.5, a new embodied reasoning model for robotic applications. The model is available in preview through Google AI Studio and the Gemini API.
-
Google Stax Aims to Make AI Model Evaluation Accessible for Developers
Google Stax is a framework designed to replace subjective evaluations of AI models with an objective, data-driven, and repeatable process for measuring model output quality. Google says this will allow AI developers to tailor the evaluation process to their specific use cases rather than relying on generic benchmarks.
-
OWASP Flags Tool Misuse as Critical Threat for Agentic AI
Earlier this year, OWASP released guidance for agentic AI security titled Agentic AI - Threats and Mitigations. The document highlights the unique challenges involved in securely deploying this emerging technology and suggests mitigations and architectural patterns for defense.
-
Hugging Face Introduces mmBERT, a Multilingual Encoder for 1,800+ Languages
Hugging Face has released mmBERT, a new multilingual encoder trained on more than 3 trillion tokens across 1,833 languages. The model builds on the ModernBERT architecture and is the first to significantly improve upon XLM-R, a long-time baseline for multilingual understanding tasks.
-
Google's Agent Development Kit for Java Adds Integration with LangChain4j
The latest release of the Agent Development Kit for Java, version 0.2.0, significantly expands its capabilities by integrating with the LangChain4j LLM framework, which opens it up to all the large language models that the framework supports.
-
Report Finds LLMs Not Yet Ready to Replace SREs in Incident Management
A study by ClickHouse found that large language models (LLMs) can't yet replace Site Reliability Engineers (SREs) for tasks such as finding the root causes of incidents. The study tested five leading models against real-world observability data to determine whether AI could autonomously identify production issues.
-
xAI Releases Grok 4 Fast with Lower Cost Reasoning Model
xAI has introduced Grok 4 Fast, a new reasoning model designed for efficiency and lower cost.
-
Google Introduces VaultGemma: An Experimental Differentially Private LLM
VaultGemma is a 1B-parameter Gemma 2-based LLM that Google trained from scratch using differential privacy (DP) with the aim of preventing the model from memorizing and later regurgitating training data. While still a research model, VaultGemma could enable use cases in healthcare, finance, legal, and other regulated sectors.
-
Baidu’s PP-OCRv5 Released on Hugging Face, Outperforming VLMs in OCR Benchmarks
Baidu has released PP-OCRv5 on Hugging Face, a new optical character recognition (OCR) model built to outperform large vision-language models (VLMs) in specialized text recognition tasks. Unlike general-purpose architectures such as Gemini 2.5 Pro, Qwen2.5-VL, or GPT-4o, which handle OCR as part of broader multimodal workflows, PP-OCRv5 is purpose-built for accuracy, efficiency, and speed.
-
Hugging Face Brings Open-Source LLMs to GitHub Copilot Chat in VS Code
Hugging Face has introduced a new integration that allows developers to connect Inference Providers directly with GitHub Copilot Chat in Visual Studio Code. The update means that open-source large language models — including Kimi K2, DeepSeek V3.1, GLM 4.5, and others — can now be accessed and tested from inside the VS Code editor, without the need to switch platforms or juggle multiple tools.
-
Kaggle Introduces Game Arena to Benchmark AI Models in Strategic Games
Kaggle, in collaboration with Google DeepMind, has introduced Kaggle Game Arena, a platform designed to evaluate artificial intelligence models by testing their performance in strategy-based games.