AI Architecture Content on InfoQ
-
Six Sessions at QCon AI Boston 2026 That Take Productionizing AI Seriously
QCon AI Boston 2026 is close to selling out. In these six sessions, speakers engage directly with the gap between AI working in a demo and AI working in production.
-
Cloudflare and Stripe Let AI Agents Create Accounts, Buy Domains, and Deploy to Production
Cloudflare and Stripe launched a protocol that lets AI agents autonomously create cloud accounts, register domains, start subscriptions, and deploy to production. Stripe handles identity and payment with a $100/month default cap. No other major cloud provider offers comparable agent-driven account provisioning.
-
Cloudflare Introduces Workflows V2 with Deterministic Execution and 50K Concurrent Workflows
Cloudflare introduces Workflows V2, a redesigned distributed workflow orchestration system with deterministic replayable execution, improved observability, and major scaling upgrades, including 50,000 concurrent instances and 2M queued workflows. It supports AI agents, data pipelines, and background processing with improved reliability across distributed systems.
-
Anthropic Traces Six Weeks of Claude Code Quality Complaints to Three Overlapping Product Changes
Anthropic published a postmortem tracing six weeks of Claude Code quality complaints to three overlapping product-layer changes: a reasoning effort downgrade, a caching bug that progressively erased the model's own thinking, and a system prompt verbosity limit that caused a 3% quality drop. The API and model weights were unaffected. All issues were resolved April 20.
-
AWS WorkSpaces Now Lets AI Agents Operate Legacy Desktop Applications without APIs
AWS announced that Amazon WorkSpaces can now serve as managed virtual desktops for AI agents in public preview. Agents authenticate through IAM and operate legacy applications via computer vision and input simulation without APIs. Reflex benchmarks show vision agents consume 45x more tokens than API agents.
-
Netflix Introduces ‘Model Lifecycle Graph’ to Scale Enterprise Machine Learning
Netflix has developed a graph-based architecture for managing machine learning systems, called the Model Lifecycle Graph. This system maps interconnections between datasets, models, features, and workflows, addressing challenges in scaling ML operations. It enhances discoverability, governance, and component reuse while supporting a self-service approach for engineers and data scientists.
-
How GitHub Is Securing Agentic Workflows in Modern CI/CD Systems
GitHub detailed a defense-in-depth security architecture for agentic workflows in CI/CD pipelines, focusing on isolation, constrained execution, and auditability. The design aims to safely integrate autonomous AI agents while mitigating risks like prompt injection, privilege escalation, and unintended actions, using sandboxed environments, restricted permissions, and full execution traceability.
-
OpenAI Introduces WebSocket-Based Execution Mode to Reduce Latency in Agentic Workflows
OpenAI introduces a WebSocket-based execution mode for its Responses API to improve agentic workflow performance in coding agents and real-time AI systems. The update reduces latency by up to 40 percent by replacing HTTP request-response cycles with persistent connections, improving streaming, tool execution, and multi-step orchestration in production-scale AI systems.
-
Google Announces GKE Agent Sandbox and Hypercluster at Next '26
Google announced GKE Agent Sandbox and Hypercluster at Cloud Next '26. Agent Sandbox uses gVisor kernel isolation for secure agent code execution at 300 sandboxes per second, built as an open-source Kubernetes SIG Apps subproject. It is currently the only native agent sandbox among the three major hyperscalers. Hypercluster manages a million chips from a single control plane.
-
Inside Claude Code Auto Mode: Anthropic’s Autonomous Coding System with Human Approval Gates
Anthropic has introduced auto mode in Claude Code, enabling multi-step software development workflows with reduced manual intervention. The feature combines automated execution with layered safety mechanisms, including input filtering, action evaluation, and two-stage classification, while maintaining human approval checkpoints for sensitive operations.
-
Cloudflare Builds High-Performance Infrastructure for Running LLMs
Cloudflare has announced new infrastructure designed to run large AI language models across its global network. Because these models rely on costly hardware and must handle large volumes of incoming and outgoing text, Cloudflare runs the model's input processing and output generation on separate systems, each optimized for its task.
-
Anthropic Introduces Managed Agents to Simplify AI Agent Deployment
Anthropic introduces Managed Agents on Claude, a managed execution layer for agent-based workflows. It separates agent logic from runtime concerns like orchestration, sandboxing, state management, and credentials. The system supports long-running multi-step workflows with external tools, error recovery, and session continuity via a meta-harness architecture.
-
GitHub Acknowledges Recent Outages, Cites Scaling Challenges and Architectural Weaknesses
GitHub has publicly addressed a series of recent availability and performance issues that disrupted services across its platform, attributing the incidents to rapid growth, architectural coupling, and limitations in handling system load.
-
Designing Memory for AI Agents: Inside LinkedIn’s Cognitive Memory Agent
LinkedIn introduces Cognitive Memory Agent (CMA), a generative AI infrastructure layer that enables stateful, context-aware systems. It provides persistent memory across episodic, semantic, and procedural layers, supporting multi-agent coordination, retrieval, and lifecycle management. CMA addresses LLM statelessness and enables production-grade personalization and long-term context in AI applications.
-
Cloudflare Launches Code Mode MCP Server to Optimize Token Usage for AI Agents
Cloudflare has launched a new Model Context Protocol (MCP) server powered by Code Mode, enabling AI agents to interact with large APIs with minimal token usage. The server reduces context footprint across 2,500+ endpoints, improves multi-API orchestration, and provides a secure, code-centric execution environment for LLM agents.