Large Language Models Content on InfoQ
-
Google DeepMind Introduces QuestBench to Evaluate LLMs in Solving Logic and Math Problems
Google DeepMind’s QuestBench benchmark evaluates whether LLMs can pinpoint the single, crucial question needed to solve logic, planning, or math problems. The DeepMind team recently published an article on QuestBench, a set of underspecified reasoning tasks solvable by asking at most one question.
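To make the idea concrete, here is a minimal, hypothetical sketch of a QuestBench-style task: the problem is underspecified, and the model is credited for asking the one question that makes it solvable. The task fields and scoring below are illustrative assumptions, not the benchmark's actual schema.

```python
# Hypothetical illustration of an underspecified task in the QuestBench spirit:
# the problem cannot be solved as stated, and exactly one clarifying question
# unlocks it. Field names and scoring are assumptions for illustration only.

task = {
    "problem": "x + y = 10. What is x?",   # underspecified: y is unknown
    "candidate_questions": [
        "What is the value of y?",          # the single sufficient question
        "Is y positive?",
        "Is x greater than y?",
    ],
    "sufficient_question_index": 0,
}

def score_choice(task: dict, model_choice: int) -> bool:
    """Return True if the model picked the one question that makes the task solvable."""
    return model_choice == task["sufficient_question_index"]

# A model that asks question 0 would be credited for this task.
print(score_choice(task, model_choice=0))  # True
```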
-
Docker Model Runner Aims to Make it Easier to Run LLM Models Locally
Currently in preview with Docker Desktop 4.40 for macOS on Apple Silicon, Docker Model Runner allows developers to run models locally and iterate on application code against them, without disrupting their container-based workflows.
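Because Model Runner serves pulled models behind an OpenAI-compatible endpoint, existing client code can simply be pointed at it. A minimal sketch, assuming a model pulled with `docker model pull` and the preview's default host-side endpoint; the URL, port, and model name are assumptions that may differ in your setup.

```python
# Minimal sketch: talk to a model served locally by Docker Model Runner through
# its OpenAI-compatible API. Endpoint URL, port, and model name are assumptions
# (check `docker model ls` and the Docker Desktop 4.40 preview docs for yours).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:12434/engines/v1",  # assumed host-side endpoint
    api_key="not-needed-locally",                  # the local runner ignores the key
)

response = client.chat.completions.create(
    model="ai/smollm2",  # assumed model, pulled earlier with `docker model pull ai/smollm2`
    messages=[{"role": "user", "content": "Summarize what a container image is."}],
)
print(response.choices[0].message.content)
```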
-
AI Continent: European Commission Outlines Strategy for Scaling AI Development
The European Commission has presented the AI Continent Action Plan, a new strategy designed to strengthen the European Union’s capacity for AI development and deployment. The plan outlines coordinated investment in infrastructure, access to high-quality data, AI adoption in strategic sectors, and support for regulatory implementation.
-
FastAPI-MCP: Simplifying the Integration of FastAPI with AI Agents
A new open-source library, FastAPI-MCP, is making it easier for developers to connect traditional FastAPI applications with modern AI agents through the Model Context Protocol (MCP). Designed for zero-configuration setup, FastAPI-MCP allows developers to automatically expose their API endpoints as MCP-compatible tools.
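Given the library's zero-configuration goal, usage reduces to wrapping an existing app. A minimal sketch, assuming the `FastApiMCP` class and `mount()` method exposed by the `fastapi_mcp` package; check the project README for the current API.

```python
# Minimal sketch: expose an existing FastAPI app's endpoints as MCP tools.
# Class and method names follow the fastapi_mcp README at the time of writing
# and should be treated as assumptions.
from fastapi import FastAPI
from fastapi_mcp import FastApiMCP

app = FastAPI()

@app.get("/items/{item_id}", operation_id="get_item")
def get_item(item_id: int) -> dict:
    """A regular REST endpoint; the operation_id becomes the MCP tool name."""
    return {"item_id": item_id, "name": f"Item {item_id}"}

# Wrap the app and mount the MCP server (served under /mcp by default).
mcp = FastApiMCP(app)
mcp.mount()

# Run with `uvicorn main:app`, then point an MCP client at the /mcp path.
```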
-
Google Releases Open-Source Agent Development Kit for Multi-Agent AI Applications
At Google Cloud Next 2025, Google announced the Agent Development Kit (ADK), an open-source framework aimed at simplifying the development of intelligent, multi-agent applications. The toolkit is designed to support developers across the entire lifecycle of agentic systems — from logic design and orchestration to debugging, evaluation, and deployment.
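ADK's quickstart centers on declaring agents with a model, instructions, and plain Python functions as tools. A minimal sketch, assuming the `google.adk.agents.Agent` class and a Gemini model identifier from the documentation; both should be verified against the current release.

```python
# Minimal sketch of an ADK agent definition. The Agent import path, parameter
# names, and model identifier follow the ADK quickstart and are assumptions
# that may change between releases.
from google.adk.agents import Agent

def get_build_status(pipeline: str) -> dict:
    """Toy tool: report a (hard-coded) build status for a named pipeline."""
    return {"pipeline": pipeline, "status": "passing"}

root_agent = Agent(
    name="ci_assistant",
    model="gemini-2.0-flash",          # assumed model identifier
    description="Answers questions about CI pipelines.",
    instruction="Use the get_build_status tool when asked about a pipeline.",
    tools=[get_build_status],
)

# The ADK CLI (`adk run` / `adk web`) can then load this agent for local testing.
```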
-
Datadog Employs LLMs to Assist with Writing Incident Postmortems
Datadog combined structured metadata from its incident management app with Slack messages to create an LLM-driven functionality assisting engineers in composing incident postmortems. While working on this solution, the company dealt with the challenges of using LLMs outside of interactive dialog systems and ensuring that high-quality content was produced.
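The general pattern is worth illustrating, without claiming this is Datadog's implementation: structured incident fields anchor the prompt, and the Slack timeline supplies the narrative raw material. A hypothetical sketch:

```python
# Hypothetical sketch of the general pattern (not Datadog's code): combine
# structured incident metadata with a Slack timeline into a single drafting
# prompt, constrained to a fixed postmortem structure.
incident = {
    "id": "INC-1234",
    "severity": "SEV-2",
    "started_at": "2025-03-02T14:05Z",
    "resolved_at": "2025-03-02T15:40Z",
    "services": ["checkout-api"],
}

slack_messages = [
    "14:07 alice: seeing 5xx spikes on checkout-api",
    "14:20 bob: rolled back deploy 8841, errors dropping",
    "15:35 alice: error rate back to baseline, resolving",
]

prompt = (
    "Draft an incident postmortem with sections: Summary, Impact, Timeline, "
    "Root Cause, Action Items. Use only the facts provided.\n\n"
    f"Incident metadata: {incident}\n\n"
    "Slack timeline:\n" + "\n".join(slack_messages)
)

# `prompt` would then be sent to an LLM to produce a first draft for engineers to edit.
```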
-
Anthropic's "AI Microscope" Explores the Inner Workings of Large Language Models
Two recent papers from Anthropic attempt to shed light on the processes that take place within a large language model. They explore how to locate interpretable concepts and link them to the computational "circuits" that translate them into language, and how to characterize crucial behaviors of Claude 3.5 Haiku, including hallucinations, planning, and other key traits.
-
Claude for Education: Anthropic’s AI Assistant Goes to University
Anthropic has announced the launch of Claude for Education, a specialized version of its AI assistant, Claude, developed specifically for colleges and universities. The initiative aims to support students, faculty, and administrators with secure and responsible AI integration across academics and campus operations.
-
Microsoft Collaborates with Anthropic to Launch C# SDK for MCP Integration
Microsoft has partnered with Anthropic to develop an official C# SDK for the Model Context Protocol (MCP), an open protocol designed to connect large language models (LLMs) with external tools and data sources. The SDK is open-source and available under the modelcontextprotocol GitHub organization.
-
AMD’s Gaia Framework Brings Local LLM Inference to Consumer Hardware
AMD has released Gaia, an open-source project allowing developers to run large language models (LLMs) locally on Windows machines with AMD hardware acceleration. The framework supports retrieval-augmented generation (RAG) and includes tools for indexing local data sources. Gaia is designed to offer an alternative to LLMs hosted by cloud service providers (CSPs).
-
Meta AI Releases Llama 4: Early Impressions and Community Feedback
Meta has officially released the first models in its new Llama 4 family—Scout and Maverick—marking a step forward in its open-weight large language model ecosystem. Designed with a native multimodal architecture and a mixture-of-experts (MoE) framework, these models aim to support a broader range of applications, from image understanding to long-context reasoning.
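For readers unfamiliar with the mixture-of-experts idea these models build on, here is a toy sketch of per-token expert routing. It is purely conceptual and does not reflect Llama 4's actual expert count, router, or weights.

```python
# Toy illustration of mixture-of-experts routing (conceptual only; not the
# actual Llama 4 architecture, expert count, or router implementation).
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, num_experts, top_k = 8, 4, 1

# Each "expert" is a small feed-forward weight matrix; the router scores experts per token.
experts = [rng.standard_normal((hidden_dim, hidden_dim)) for _ in range(num_experts)]
router = rng.standard_normal((hidden_dim, num_experts))

def moe_layer(token: np.ndarray) -> np.ndarray:
    """Route a token to its top-k experts and combine their outputs by router weight."""
    scores = token @ router                        # one score per expert
    top = np.argsort(scores)[-top_k:]              # indices of the best-scoring expert(s)
    weights = np.exp(scores[top]) / np.exp(scores[top]).sum()
    return sum(w * (token @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(hidden_dim)
print(moe_layer(token).shape)  # (8,): only the top_k experts did any work for this token
```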
-
Announcing QCon AI: Focusing on Practical, Scalable AI Implementation for Engineering Teams
QCon AI focuses on practical, real-world AI for senior developers, architects, and engineering leaders. Join us Dec 16-17, 2025, in NYC to learn how teams are building and scaling AI in production—covering MLOps, system reliability, cost optimization, and more. No hype, just actionable insights from those doing the work.
-
How SREs and GenAI Work Together to Decrease eBay's Downtime: an Architect's Insights at KubeCon EU
During his KubeCon EU keynote, Vijay Samuel, Principal MTS Architect at eBay, shared his team’s experience of enhancing incident response capabilities by incorporating ML and LLM building blocks. They realised that GenAI is not a silver bullet but can help engineers work through complex incident investigations by explaining logs, traces, and dashboards.
-
How Observability Can Improve the UX of LLM Based Systems: Insights of Honeycomb's CEO at KubeCon EU
During her KubeCon Europe keynote, Christine Yen, CEO and co-founder of Honeycomb, shared insights on how observability can help teams cope with the rapid shifts introduced by integrating LLMs into software systems, which has transformed not only how we develop software but also how we release it. She explained how to adapt the development feedback loop based on production observations.
-
OpenAI Introduces New Speech Models for Transcription and Voice Generation
OpenAI has introduced new speech-to-text and text-to-speech models in its API, focusing on improving transcription accuracy and offering more control over AI-generated voices. These updates aim to enhance automated speech applications, making them more adaptable to different environments and use cases.
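A minimal sketch of both directions using the official Python SDK; the model names (`gpt-4o-transcribe`, `gpt-4o-mini-tts`) and the `instructions` parameter are taken from the announcement and should be verified against the current API reference.

```python
# Minimal sketch: speech-to-text and text-to-speech with the OpenAI Python SDK.
# Model names and the `instructions` parameter come from the announcement and
# are assumptions to verify against the current API reference.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Speech-to-text: transcribe a local audio file.
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file,
    )
print(transcript.text)

# Text-to-speech: generate audio with a steerable speaking style.
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="coral",
    input="Your order has shipped and should arrive on Thursday.",
    instructions="Speak in a calm, friendly customer-support tone.",
) as response:
    response.stream_to_file("reply.mp3")
```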