Large language models Content on InfoQ
-
GitHub MCP Registry Offers a Central Hub for Discovering and Deploying MCP Servers
GitHub has recently launched its Model Context Protocol (MCP) Registry, designed to help developers discover and use AI tools directly from within their working environment. The registry currently lists over 40 MCP servers from Microsoft, GitHub, Dynatrace, Terraform, and many others.
-
OpenAI Study Investigates the Causes of LLM Hallucinations and Potential Solutions
In a recent research paper, OpenAI suggested that the tendency of LLMs to hallucinate stems from the way standard training and evaluation methods reward guessing over acknowledging uncertainty. According to the study, this insight could pave the way for new techniques to reduce hallucinations and build more trustworthy AI systems, although not everyone agrees on what hallucinations are in the first place.
-
Claude Sonnet 4.5 Tops SWE-Bench Verified, Extends Coding Focus beyond 30 Hours
Anthropic's Claude Sonnet 4.5, its most advanced coding model, achieves a 98.7% safety score while improving real-world coding capabilities. Enhanced reasoning skills allow it to sustain multi-step tasks, and users have reported notable gains. As a drop-in replacement, it balances capability with security.
-
Google DeepMind Introduces CodeMender, an AI Agent for Automated Code Repair
Google DeepMind has introduced CodeMender, a new AI-driven agent designed to detect, fix, and secure software vulnerabilities automatically. The project builds on recent advances in reasoning models and program analysis, aiming to reduce the time developers spend identifying and patching security issues.
-
GitHub Introduces New Embedding Model to Improve Code Search and Context
GitHub has introduced a new embedding model for Copilot, now integrated into Visual Studio Code. The model is designed to improve how Copilot understands programming context, retrieves relevant code, and suggests completions.
-
The New Data Commons MCP Server Unlocks a Wealth of Public Datasets for AI Developers
Google has recently introduced the Data Commons Model Context Protocol (MCP) Server, a tool that enables AI developers and researchers to easily access the public dataset collection available through Data Commons.
-
IBM Releases Granite-Docling-258M, a Compact Vision-Language Model for Precise Document Conversion
IBM Research has recently introduced Granite-Docling-258M, a new open-source vision-language model (VLM) designed for high-fidelity document-to-text conversion while preserving complex layouts, tables, equations, and lists.
-
Thinking Machines Releases Tinker API for Flexible Model Fine-Tuning
Thinking Machines has released Tinker, an API for fine-tuning open-weight language models. The service is designed to reduce infrastructure overhead for developers, providing managed scheduling, GPU allocation, and checkpoint handling. By abstracting away cluster management, Tinker allows fine-tuning through simple Python calls.
-
Claude Sonnet 4.5 Ranked Safest LLM from Open-Source Audit Tool Petri
Claude Sonnet 4.5 has emerged as the best-performing model in ‘risky tasks’, narrowly edging out GPT-5 in early evaluations by Petri, Anthropic’s new open-source AI auditing tool.
-
Agoda Leverages ChatGPT in the CI/CD Process for SQL Stored Procedure Optimization
Agoda has started using ChatGPT to optimize SQL stored procedures (SPs) as part of its CI/CD process. After introducing the automated LLM-assisted step, the company observed shorter stored procedure optimization times, which lightened the load on DB developers. Agoda is working on making ChatGPT more accessible for SP optimization outside of the CI/CD pipeline.
-
Anthropic Reveals Three Infrastructure Bugs behind Claude Performance Issues
Anthropic recently published a postmortem revealing that three distinct infrastructure bugs intermittently degraded the output quality of its Claude models in recent weeks. While the company states it has now resolved those issues and is modifying its internal processes to prevent similar disruptions, the community highlights the challenges of running the service across three hardware platforms.
-
Perplexity Launches Search API to Power Next-Gen AI Applications
Perplexity has introduced the Search API, opening up access to the same infrastructure that underpins its public answer engine. With coverage of hundreds of billions of webpages and infrastructure tuned for AI-heavy workloads, the new API is aimed at developers who want real-time, reliable search results for building their own agents, applications, and retrieval-augmented pipelines.
-
DeepMind Releases Gemini Robotics-ER 1.5 for Embodied Reasoning
Google DeepMind introduced Gemini Robotics-ER 1.5, a new embodied reasoning model for robotic applications. The model is available in preview through Google AI Studio and the Gemini API.
-
Google Stax Aims to Make AI Model Evaluation Accessible for Developers
Google Stax is a framework designed to replace subjective evaluations of AI models with an objective, data-driven, and repeatable process for measuring model output quality. Google says this will allow AI developers to tailor the evaluation process to their specific use cases rather than relying on generic benchmarks.
-
OWASP Flags Tool Misuse as Critical Threat for Agentic AI
Earlier this year, OWASP released guidance for agentic AI security titled "Agentic AI - Threats and Mitigations". The document highlights the unique challenges involved in securely deploying this emerging technology and suggests mitigations and architectural patterns for defense.