Large language models Content on InfoQ
-
OpenAI Launches Frontier, a Platform to Build, Deploy, and Manage AI Agents Across the Enterprise
OpenAI Frontier is an enterprise platform for building, deploying, and managing AI agents, designed to make them reliable, scalable, and integrated into real company systems and workflows.
-
Hugging Face Introduces Community Evals for Transparent Model Benchmarking
Hugging Face has launched Community Evals, a feature that enables benchmark datasets on the Hub to host their own leaderboards and automatically collect evaluation results from model repositories.
-
GitHub Agentic Workflows Unleash AI-Driven Repository Automation
Recently launched in technical preview, GitHub Agentic Workflows automate complex, repetitive repository tasks using coding agents that, according to GitHub, understand context and intent. This enables workflows such as automatic issue triage and labeling, documentation updates, CI troubleshooting, test improvements, and reporting.
-
How Dropbox Built a Scalable Context Engine for Enterprise Knowledge Search
Dropbox engineers have detailed how the company built the context engine behind Dropbox Dash, revealing a shift toward index-based retrieval, knowledge graph-derived context, and continuous evaluation to support enterprise AI at scale.
-
Google Explores Scaling Principles for Multi-Agent Coordination
Google Research set out to determine how to design agent systems for optimal performance by running a controlled evaluation of 180 agent configurations. From this, the team derived what they call the "first quantitative scaling principles for AI agent systems", showing that multi-agent coordination does not reliably improve results and can even reduce performance.
-
GitHub Copilot SDK Lets Developers Integrate Copilot CLI's Engine into Apps
Now available in technical preview on GitHub, the GitHub Copilot SDK lets developers embed the same engine that powers GitHub Copilot CLI into their own apps, making it easier to build agentic workflows.
-
Windsurf Introduces Arena Mode to Compare AI Models During Development
Windsurf has introduced Arena Mode inside its IDE, allowing developers to compare large language models side by side while working on real coding tasks. The feature is designed to let users evaluate models directly within their existing development context, rather than relying on public benchmarks or external evaluation websites.
-
Next Moca Releases Agent Definition Language as an Open Source Specification
Next Moca has open-sourced Agent Definition Language (ADL), a vendor-neutral specification intended to standardize how AI agents are defined, reviewed, and governed across frameworks and platforms. The project is released under the Apache 2.0 license and is positioned as a missing “definition layer” for AI agents, comparable to the role OpenAPI plays for APIs.
-
Datadog Integrates Google Agent Development Kit into LLM Observability Tools
Datadog recently announced that its LLM Observability platform now provides automatic instrumentation for applications built with Google's Agent Development Kit (ADK), offering deeper visibility into the behavior, performance, cost, and safety of AI-driven agentic systems.
-
Vercel Introduces Skills.sh, an Open Ecosystem for Agent Commands
Vercel has released Skills.sh, an open-source tool designed to provide AI agents with a standardized way to execute reusable actions, or skills, through the command line.
-
Agent Trace: Cursor Proposes an Open Specification for AI Code Attribution
Cursor has published Agent Trace, a draft open specification aimed at standardizing how AI-generated code is attributed in software projects. Released as a Request for Comments (RFC), the proposal defines a vendor-neutral format for recording AI contributions alongside human authorship in version-controlled codebases.
-
MongoDB Introduces Embedding and Reranking API on Atlas
MongoDB has recently announced the public preview of its Embedding and Reranking API on MongoDB Atlas. The new API gives developers direct access to Voyage AI’s search models within the managed cloud database, enabling them to create features such as semantic search and AI-powered assistants within a single integrated environment, with consolidated monitoring and billing.
-
Open Responses Specification Enables Unified Agentic LLM Workflows
OpenAI's Open Responses specification standardizes agentic AI workflows, addressing API fragmentation and making it easier to switch between proprietary and open-source models. Supported by partners such as Hugging Face and Vercel, the specification improves visibility into model reasoning and tool execution.
-
Cloudflare's Matrix Homeserver Demo Sparks Debate over AI-Generated Code Claims
A Cloudflare blog post claiming a "production-grade" Matrix homeserver on Workers didn't survive community scrutiny. Missing federation, incomplete encryption, and TODO comments in authentication logic pointed to unreviewed AI output. Matrix's Matthew Hodgson welcomed the effort but noted the implementation "doesn't yet constitute a functional Matrix server."
-
OpenAI Launches Prism, a Free LaTeX-Native Workspace with Integrated GPT-5.2
OpenAI has released Prism, a free, cloud-based LaTeX workspace designed for academic writing and collaboration, with GPT-5.2 integrated directly into the authoring environment. The platform combines document editing, compilation, citation management, and AI-assisted revision in a single web-based workspace, aimed at researchers producing long-form scientific documents.