Architecture Content on InfoQ
-
Netflix Scales "Human Infrastructure" to Manage Global Live Operations
Netflix has introduced a "human infrastructure" layer to manage live broadcasts at scale. Using a low-latency "telemetry hot path" and a Live Operations Centre, the company now balances automated scaling with human oversight. This shift, which mirrors strategies at AWS and Disney+, focuses on maintaining reliability through expert intervention during high-concurrency global events.
-
QCon San Francisco 2026: 12 Tracks Announced
The 12 tracks for QCon San Francisco 2026 (November 16-20) are now live. Four tracks cover AI in production. The other eight cover the rest of what senior engineering still demands: distributed systems, architecture teardowns, resilience, platform internals, API design, and Staff+ leadership. Early bird pricing runs until May 12th.
-
Cloudflare Sandboxes Reach General Availability, Giving AI Agents Persistent Isolated Environments
Cloudflare has released Sandboxes and Containers into general availability, providing persistent isolated Linux environments for AI agent workloads. New capabilities include secure credential injection via egress proxy, PTY terminal support, persistent code interpreters, filesystem watching, and snapshot-based session recovery. Active CPU pricing charges only for used cycles.
-
Pinterest Reduces Spark OOM Failures by 96% through Auto Memory Retries
Pinterest Engineering cut Apache Spark out-of-memory failures by 96% using improved observability, configuration tuning, and automatic memory retries. Staged rollout, dashboards, and proactive memory adjustments stabilized data pipelines, reduced manual intervention, and lowered operational overhead across tens of thousands of daily jobs.
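The automatic memory retry idea can be sketched as follows. This is a minimal illustration, not Pinterest's implementation: `OutOfMemory`, `run_with_memory_retries`, the 1.5x growth factor, and the memory limits are all hypothetical names and values chosen for the example. A real system would resubmit the Spark job with larger executor or driver memory settings.

```python
class OutOfMemory(Exception):
    """Hypothetical signal that a job attempt ran out of memory."""


def run_with_memory_retries(run_job, initial_gb=4.0, growth=1.5,
                            max_gb=32.0, max_attempts=4):
    """Run `run_job(memory_gb)`, growing the allocation after each OOM.

    Returns (result, tried), where `tried` lists every allocation attempted.
    Re-raises OutOfMemory once the attempt budget or memory cap is exhausted.
    """
    memory_gb = initial_gb
    tried = []
    for attempt in range(max_attempts):
        tried.append(memory_gb)
        try:
            return run_job(memory_gb), tried
        except OutOfMemory:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or memory_gb >= max_gb:
                raise
            # Grow the allocation for the next attempt, capped at max_gb.
            memory_gb = min(memory_gb * growth, max_gb)


def fake_job(memory_gb):
    # Stand-in for a real job: pretend it needs at least 8 GB to finish.
    if memory_gb < 8.0:
        raise OutOfMemory(f"needed more than {memory_gb} GB")
    return "ok"


result, tried = run_with_memory_retries(fake_job)
print(result, tried)  # ok [4.0, 6.0, 9.0]
```

Capping both the attempt count and the maximum allocation keeps a job with a genuine memory leak from retrying indefinitely.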
-
War in Iran Damages Multiple AWS Data Centers, Challenging Multi-AZ Assumptions
Earlier this month, Iranian drone strikes damaged three AWS data centers in the UAE and Bahrain, causing outages and disruptions to multiple services. The events, which affected multiple facilities within the same AWS region, sparked discussion in the community about how geopolitical conflict can directly impact global cloud infrastructure and multi-AZ deployments.
-
DoorDash Builds LLM Conversation Simulator to Test Customer Support Chatbots at Scale
DoorDash engineers built a simulation and evaluation flywheel to test large language model customer support chatbots at scale. The system generates multi-turn synthetic conversations using historical transcripts and backend mocks, evaluates outcomes with an LLM-as-judge framework, and enables rapid iteration on prompts, context, and system design before production deployment.
-
Advance Your Socio-Technical Architecture Skills with InfoQ’s New Online Cohorts
Enhance your architectural leadership with InfoQ’s new online cohorts starting April 15, May 7, and June 10, 2026. Led by Luca Mezzalira, this 5-week program focuses on socio-technical skills like ADRs, platform engineering, and AI trade-offs. Senior practitioners can apply frameworks to live projects, earn ICSAET certification, and contribute to the InfoQ community.
-
Architecting for Global Scale: inside DoorDash’s Unified, Composable Dasher Onboarding Platform
DoorDash has rebuilt its Dasher onboarding into a unified, modular platform to support global expansion. The new architecture uses reusable step modules, a centralized status map, and workflow orchestration to ensure consistent, localized onboarding experiences. This design reduces complexity, supports market-specific variations, and enables faster rollout to new countries.
-
OpenAI Secures AWS Distribution for Frontier Platform in $110B Multi-Cloud Deal
OpenAI's $110B funding includes AWS as the exclusive third-party distributor for the Frontier agent platform, introducing an architectural split: Azure retains stateless API exclusivity, while AWS gains stateful runtime environments via Bedrock. The deal expands the existing $38B AWS agreement by $100B and commits 2GW of Trainium capacity.
-
Decentralizing Architectural Decisions with the Architecture Advice Process
Our system architectures have changed as technology and development practices have evolved, but the way we practice architecture hasn’t kept up. According to Andrew Harmel-Law, architecture needs to be decentralized, similar to how we have decentralized our systems. The alternative to having an architect take and communicate decisions is to “let anyone make the decisions” using the advice process.
-
How Dropbox Built a Scalable Context Engine for Enterprise Knowledge Search
Dropbox engineers have detailed how the company built the context engine behind Dropbox Dash, revealing a shift toward index-based retrieval, knowledge graph-derived context, and continuous evaluation to support enterprise AI at scale.
-
QCon Previews 20th Anniversary Conferences: Production AI, Resilience, and Staff+ Engineering
Celebrating its 20th anniversary, QCon’s 2026 conferences in London and San Francisco will focus on the engineering realities of agentic AI, resilient architectures, and platform ROI. The programs continue the series' two-decade tradition of practitioner-led content, curated by senior engineers from companies like Zoox, UBS, and LinkedIn.
-
Cloudflare Launches Vertical Microfrontend Template for Path-Based Edge Routing
Cloudflare has launched a Worker template for Vertical Microfrontends (VMFE) that lets independent teams own the stack behind specific URL paths, improving CI/CD efficiency. The architecture routes requests at the edge with low latency while preserving a seamless SPA experience and promoting team autonomy. It suits large teams but comes with operational trade-offs.
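Path-based routing of this kind can be illustrated with a longest-prefix match. The sketch below is in Python purely for illustration; Cloudflare's template is a Worker and is not reproduced here. The function `resolve_upstream`, the route table, and the app names are all hypothetical.

```python
def resolve_upstream(path, routes, default):
    """Return the upstream whose path prefix is the longest match for `path`.

    A prefix matches only on whole path segments, so "/dash" does not
    capture "/dashboard".
    """
    best = None
    for prefix, upstream in routes.items():
        base = prefix.rstrip("/")
        if path == prefix or path.startswith(base + "/"):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, upstream)
    return best[1] if best else default


# Hypothetical route table: each team owns one URL path.
ROUTES = {
    "/docs": "docs-app",
    "/docs/api": "api-docs-app",
    "/dash": "dashboard-app",
}

print(resolve_upstream("/docs/api/v1", ROUTES, "marketing-app"))  # api-docs-app
print(resolve_upstream("/docs", ROUTES, "marketing-app"))         # docs-app
print(resolve_upstream("/dashboard", ROUTES, "marketing-app"))    # marketing-app
```

Matching on segment boundaries rather than raw string prefixes avoids one team's path accidentally capturing another team's URLs.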
-
GitHub Reworks Layered Defenses after Legacy Protections Block Legitimate Traffic
GitHub engineers recently traced user reports of unexpected “Too Many Requests” errors to abuse-mitigation rules that had accidentally remained active long after the incidents that prompted them.
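One common guard against mitigation rules outliving their incident is to give each rule an explicit expiry. The sketch below shows that general idea only; it is an assumption for illustration, not a description of what GitHub changed. `MitigationRule`, `split_rules`, and the rule names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class MitigationRule:
    name: str
    created_at: datetime
    ttl: timedelta  # how long the rule may stay active without review

    def is_expired(self, now):
        return now >= self.created_at + self.ttl


def split_rules(rules, now):
    """Partition rules into (active, stale) relative to `now`."""
    active = [r for r in rules if not r.is_expired(now)]
    stale = [r for r in rules if r.is_expired(now)]
    return active, stale


now = datetime(2026, 1, 10, tzinfo=timezone.utc)
rules = [
    MitigationRule("block-scraper-burst", now - timedelta(days=2), timedelta(days=7)),
    MitigationRule("old-incident-throttle", now - timedelta(days=90), timedelta(days=14)),
]
active, stale = split_rules(rules, now)
print([r.name for r in active])  # ['block-scraper-burst']
print([r.name for r in stale])   # ['old-incident-throttle']
```

Stale rules can then be flagged for review or disabled instead of silently continuing to reject traffic.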
-
Airbnb Expands Global Checkout with “Pay as a Local,” Scaling to 220 Markets in 14 Months
Airbnb expands its global checkout with the “Pay as a Local” initiative, supporting over 20 locally preferred payment methods across 220 markets. The company replatformed its payments system with domain-oriented services, reusable flow archetypes, and a centralized configuration, enhancing integration speed, reliability, testing, and observability for diverse payment methods worldwide.