Azure API Management Ships Unified Model API and MCP Content Safety at Build 2026

Microsoft announced a major expansion of the AI gateway capabilities in Azure API Management at Build 2026. The headline additions: a Unified Model API that lets clients speak one API format while APIM transforms requests to different backend providers, AI gateway support extended to Anthropic and Google Vertex AI models, and content safety policies that now cover MCP tool calls and Agent-to-Agent (A2A) communication alongside LLM traffic.

The APIM team writes:

Rather than introducing separate governance platforms for agents, Azure API Management enables organizations to extend familiar API governance principles to emerging agent ecosystems.

The Unified Model API, now in public preview, addresses a growing operational pain point: enterprise teams increasingly mix models from OpenAI, Anthropic, Google, and other providers based on performance, cost, latency, or regional requirements, and each provider exposes a different API format. The Unified Model API lets clients standardize on a single format, currently OpenAI Chat Completions, while APIM transparently transforms each request into the backend provider's native format, such as the Anthropic Messages API. Teams can therefore swap backend providers, add new models, or route traffic across providers without changing client code.
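To make the transformation concrete, here is a minimal Python sketch of the kind of field mapping a gateway has to perform between the two formats. It is not APIM's implementation. It handles only plain-text system, user, and assistant messages and ignores tools and multimodal content. The function name, the default max_tokens value, and the temperature clamping are assumptions: Anthropic requires max_tokens and accepts temperatures from 0 to 1, while OpenAI makes the token limit optional and accepts temperatures from 0 to 2.

```python
from typing import Any

# Assumption: default used when the client omits a token limit, because the
# Anthropic Messages API requires max_tokens while Chat Completions does not.
DEFAULT_MAX_TOKENS = 1024


def chat_completions_to_anthropic(body: dict[str, Any]) -> dict[str, Any]:
    """Map a plain-text Chat Completions request body to a Messages body (sketch)."""
    system_parts: list[str] = []
    messages: list[dict[str, Any]] = []
    for msg in body["messages"]:
        if msg["role"] in ("system", "developer"):
            # Anthropic takes instructions as a top-level "system" field,
            # not as a message inside the conversation.
            system_parts.append(msg["content"])
        else:
            # "user" and "assistant" roles carry over unchanged.
            messages.append({"role": msg["role"], "content": msg["content"]})

    out: dict[str, Any] = {
        "model": body["model"],
        "messages": messages,
        "max_tokens": body.get(
            "max_completion_tokens", body.get("max_tokens", DEFAULT_MAX_TOKENS)
        ),
    }
    if system_parts:
        out["system"] = "\n\n".join(system_parts)
    if "temperature" in body:
        # Assumption: clamp OpenAI's 0-2 range into Anthropic's 0-1 range.
        out["temperature"] = min(float(body["temperature"]), 1.0)
    for key in ("top_p", "stream"):
        if key in body:
            out[key] = body[key]
    if "stop" in body:
        # OpenAI accepts a string or a list; Anthropic expects a list.
        stop = body["stop"]
        out["stop_sequences"] = [stop] if isinstance(stop, str) else list(stop)
    return out
```

Keeping the mapping in one pure function means the client-facing contract stays fixed while backend-specific quirks, such as the required token limit, are absorbed at the gateway.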

This is not just a convenience layer. Centralizing model access behind a single API surface means that every governance policy, rate limit, content safety check, and token metric applies consistently, regardless of which provider handles inference. Organizations already using APIM for traditional API governance can extend the same patterns to their AI workloads without introducing a parallel governance stack.

The extension of content safety to MCP and A2A is the most architecturally significant change. The existing llm-content-safety policy, which scans LLM request and response content using Azure AI Content Safety, now also covers MCP tool-call arguments, MCP response text, and A2A agent payloads. The policy provides two distinct safety layers: category-based filtering (Hate, SelfHarm, Sexual, Violence) with configurable severity thresholds from 0 (most restrictive) to 7 (least restrictive), and a separate shield-prompt attribute that checks specifically for adversarial prompt-injection attacks. A typical configuration looks like:

<llm-content-safety backend-id="content-safety-backend" shield-prompt="true" enforce-on-completions="true">
    <categories output-type="EightSeverityLevels">
        <category name="Hate" threshold="4" />
        <category name="Violence" threshold="4" />
    </categories>
</llm-content-safety>
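The threshold semantics are easy to get backwards. Assuming the documented rule that content is blocked when its severity is at or above the configured threshold (which is why 0 is the most restrictive value), the decision for the configuration above can be sketched in Python as follows. The function name and the attack_detected flag are hypothetical, and this is not an APIM API; the analysis input mirrors the categoriesAnalysis array returned by the Azure AI Content Safety text analysis API.

```python
# Mirrors the <category> elements in the configuration above.
THRESHOLDS = {"Hate": 4, "Violence": 4}


def should_block(analysis: dict, thresholds: dict, attack_detected: bool = False) -> bool:
    """Return True if content should be blocked (hypothetical helper, not an APIM API)."""
    # shield-prompt="true": a detected prompt-injection attack blocks regardless of severity.
    if attack_detected:
        return True
    for item in analysis.get("categoriesAnalysis", []):
        limit = thresholds.get(item["category"])
        # Categories without a configured <category> element are ignored in this sketch.
        if limit is not None and item["severity"] >= limit:
            return True
    return False
```

With these thresholds, a Hate severity of 4 is blocked while a severity of 3 passes, and a Sexual score is not evaluated at all because the configuration omits that category.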

Teams should be aware that the policy behaves differently for streaming responses. In non-streaming mode, a violation returns a clean 403 response. In streaming mode, the policy buffers events in a sliding window and, on a violation, simply stops forwarding events to the client without returning an error. Agents consuming streaming completions therefore need to handle an abrupt end of stream gracefully rather than expecting an explicit error code. Two new attributes, window-size and window-overlap-size, let teams tune how content exceeding the Azure AI Content Safety limit of 10,000 characters is split for evaluation.
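The article does not spell out how APIM splits oversized content, so the following is only a sketch of the general sliding-window technique that window-size and window-overlap-size suggest: fixed-size windows in which each window starts a configurable number of characters before the previous one ends, so that text near a boundary is also evaluated together with the text that precedes it. The function name and defaults are assumptions; the 10,000 default simply reuses the limit quoted above.

```python
def split_windows(text: str, window_size: int = 10_000, overlap: int = 0) -> list[str]:
    """Split text into windows of at most window_size characters.

    Consecutive windows share `overlap` characters (hypothetical helper).
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must be in [0, window_size)")
    if not text:
        return []
    step = window_size - overlap  # how far each new window advances
    windows = []
    start = 0
    while True:
        windows.append(text[start:start + window_size])
        if start + window_size >= len(text):
            break  # this window reached the end of the text
        start += step
    return windows
```

For example, split_windows("abcdefghij", 4, 1) returns ["abcd", "defg", "ghij"]. A larger overlap gives more context per boundary at the cost of more evaluation calls.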

Token metrics have been expanded to match the multi-provider reality. APIM now logs reasoning tokens, cached tokens, and audio tokens to Application Insights for the OpenAI Chat Completions, OpenAI Responses, and Anthropic Messages API formats. Providers tracked include Microsoft Foundry, OpenAI, Amazon Bedrock, Google Vertex AI, and others. For FinOps teams building cost dashboards and budget alerts, the expanded metrics reflect how current models actually behave: reasoning can consume a significant share of the token budget, and cached tokens are typically billed differently from fresh input, neither of which earlier metrics captured.
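A cost pipeline that combines these metrics across formats needs a common shape. The Python sketch below normalizes the usage objects returned by the three API formats into one record. The source field names follow the public OpenAI and Anthropic API responses, but the TokenUsage schema and the format labels are assumptions and are not the schema APIM writes to Application Insights. Anthropic's input_tokens excludes cache reads and writes, so the sketch adds them back to make input totals comparable, and it leaves reasoning tokens at zero because the Anthropic usage object has no separate reasoning field.

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    input_tokens: int      # all prompt-side tokens, including cached ones
    output_tokens: int     # all completion-side tokens, including reasoning
    cached_tokens: int = 0
    reasoning_tokens: int = 0
    audio_tokens: int = 0


def _n(d: dict, key: str) -> int:
    """Read a token count that may be missing or null."""
    return d.get(key) or 0


def normalize_usage(api_format: str, usage: dict) -> TokenUsage:
    """Map a provider usage object to the hypothetical TokenUsage record."""
    if api_format == "chat_completions":
        p = usage.get("prompt_tokens_details") or {}
        c = usage.get("completion_tokens_details") or {}
        return TokenUsage(
            input_tokens=_n(usage, "prompt_tokens"),
            output_tokens=_n(usage, "completion_tokens"),
            cached_tokens=_n(p, "cached_tokens"),
            reasoning_tokens=_n(c, "reasoning_tokens"),
            audio_tokens=_n(p, "audio_tokens") + _n(c, "audio_tokens"),
        )
    if api_format == "responses":
        i = usage.get("input_tokens_details") or {}
        o = usage.get("output_tokens_details") or {}
        return TokenUsage(
            input_tokens=_n(usage, "input_tokens"),
            output_tokens=_n(usage, "output_tokens"),
            cached_tokens=_n(i, "cached_tokens"),
            reasoning_tokens=_n(o, "reasoning_tokens"),
        )
    if api_format == "anthropic_messages":
        cache_read = _n(usage, "cache_read_input_tokens")
        cache_write = _n(usage, "cache_creation_input_tokens")
        return TokenUsage(
            # Anthropic's input_tokens excludes cache reads and writes; add them back.
            input_tokens=_n(usage, "input_tokens") + cache_read + cache_write,
            output_tokens=_n(usage, "output_tokens"),
            cached_tokens=cache_read,
        )
    raise ValueError(f"unknown API format: {api_format}")
```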

On the discovery side, the Azure API Center data plane MCP server reached general availability. It acts as a unified enterprise discovery endpoint: agents and developer tools can access registered MCP servers, tools, APIs, agents, and AI assets through a single MCP connection. When a team registers a new MCP server in API Center, it automatically becomes discoverable to all connected agents, with no per-client reconfiguration.
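Discovery through an MCP server relies on the protocol's standard listing methods. As a sketch, assuming a session that has already completed the MCP initialize handshake, a client could enumerate every available tool with the paginated tools/list JSON-RPC method, following nextCursor until it is absent. The send callable is a hypothetical stand-in for the transport; the article does not give the API Center endpoint URL or its authentication details.

```python
import itertools
from typing import Callable, Optional

# Hypothetical transport: sends one JSON-RPC request and returns the parsed response.
Send = Callable[[dict], dict]


def list_all_tools(send: Send) -> list[dict]:
    """Collect every tool via MCP tools/list, following nextCursor pagination."""
    ids = itertools.count(1)
    tools: list[dict] = []
    cursor: Optional[str] = None
    while True:
        request = {"jsonrpc": "2.0", "id": next(ids), "method": "tools/list"}
        if cursor is not None:
            request["params"] = {"cursor": cursor}
        response = send(request)
        if "error" in response:
            raise RuntimeError(response["error"].get("message", "tools/list failed"))
        result = response["result"]
        tools.extend(result.get("tools", []))
        cursor = result.get("nextCursor")
        if not cursor:  # no further pages
            return tools
```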

APIM can also now expose existing REST APIs as MCP servers, meaning enterprise APIs that predate the agent era become agent-callable without rebuilding them. Combined with the Logic Apps MCP Server that reached GA at the same Build, Microsoft is building two parallel paths for making enterprise capabilities available to agents: one through the API gateway layer (APIM) and one through the integration platform layer (Logic Apps).
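The article does not describe how APIM maps REST operations to MCP tools. As an illustration of the general idea only, the sketch below turns a simplified OpenAPI operation into an MCP tool descriptor (name, description, and an inputSchema expressed as JSON Schema), treating path, query, and header parameters as tool arguments and ignoring request bodies. The function name and the input shape are assumptions.

```python
def operation_to_mcp_tool(op: dict) -> dict:
    """Build an MCP tool descriptor from a simplified OpenAPI operation (sketch)."""
    properties: dict = {}
    required: list = []
    for param in op.get("parameters", []):
        # Path, query, and header parameters all become tool arguments.
        prop = dict(param.get("schema", {"type": "string"}))
        if "description" in param:
            prop["description"] = param["description"]
        properties[param["name"]] = prop
        # OpenAPI path parameters are always required.
        if param.get("required") or param.get("in") == "path":
            required.append(param["name"])
    input_schema: dict = {"type": "object", "properties": properties}
    if required:
        input_schema["required"] = required
    return {
        "name": op["operationId"],
        "description": op.get("summary") or op.get("description", ""),
        "inputSchema": input_schema,
    }
```

Because the operation's existing contract becomes the tool schema, the backend API itself does not change; only its description is republished in a form agents can call.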

The competitive context matters for teams evaluating AI gateway options. AWS offers Bedrock Guardrails for content filtering and model access controls, but has no equivalent to APIM's multi-provider Unified Model API or its MCP/A2A content safety coverage. Google's Apigee has added some AI gateway features, but not at the protocol breadth APIM now covers. Cloudflare's AI Gateway focuses on spend limits and caching rather than multi-protocol governance. APIM's bet is that the API gateway, not a new product category, is the natural control plane for AI workloads.

The AI gateway capabilities are available across APIM tiers. The Unified Model API is in public preview, while content safety for MCP and A2A, the extended token metrics, and the API Center MCP server are generally available. The AI Gateway labs provide more than 30 hands-on Jupyter notebooks with step-by-step instructions and deployable Bicep templates.

About the Author

Steef-Jan Wiggers
