-
AWS Expands Well-Architected Framework with Responsible AI and Updated ML and Generative AI Lenses
At AWS re:Invent 2025, AWS expanded its Well-Architected Framework with a new Responsible AI Lens and updated Machine Learning and Generative AI Lenses. The updates provide guidance on governance, bias mitigation, scalable ML workflows, and trustworthy AI system design across the full AI lifecycle.
-
InfoQ Announces January Online Architect Cohort Focused on Socio-Technical Leadership
InfoQ announces the January 2026 intake for its Certified Architect Program. Facilitated by Luca Mezzalira, this 5-week online cohort focuses on socio-technical leadership, helping senior architects bridge the gap between technical design and organizational influence. Participants engage in weekly applied learning and peer collaboration to earn the ICSAET certification.
-
Lyft Rearchitects ML Platform with Hybrid AWS SageMaker-Kubernetes Approach
Lyft has rearchitected its machine learning platform LyftLearn into a hybrid system, moving offline workloads to AWS SageMaker while retaining Kubernetes for online model serving. Its decision to choose managed services where operational complexity was highest, while maintaining custom infrastructure where control mattered most, offers a pragmatic alternative to unified platform strategies.
-
Breaking Silos: Netflix Introduces Upper Metamodel to Bring Consistency across Content Engineering
Netflix has introduced the Upper metamodel within its Unified Data Architecture (UDA) to standardize domain definitions and generate consistent data container representations. UDA links conceptual models to GraphQL, Avro, SQL, and Java artifacts, supporting projections, mappings, and knowledge graph-based discovery across content, advertising, and operational systems.
-
From On-Demand to Live: Netflix Streaming to 100 Million Devices in under 1 Minute
Netflix’s global live streaming platform powers millions of viewers with cloud-based ingest, custom live origin, Open Connect delivery, and real-time recommendations. This article explores the architecture, low-latency pipelines, adaptive bitrate streaming, and operational monitoring that ensure reliable, scalable, and synchronized live event experiences worldwide.
-
Airbnb Adds Adaptive Traffic Control to Manage Key Value Store Spikes
Airbnb upgraded Mussel, its multi-tenant key-value store, replacing static per-client rate limits with an adaptive, resource-aware traffic control system. The redesign ensures resilience during traffic spikes, protects critical workflows, and maintains fair usage across thousands of tenants while scaling efficiently.
-
Netflix Tackles Data Deletion at Scale with Centralized Platform Architecture
Netflix engineers presented their architecture for a centralized data-deletion platform at QCon San Francisco, addressing a critical yet rarely discussed system design challenge. The platform manages deletion across heterogeneous data stores while balancing durability, availability, and correctness, processing 76.8 billion row deletions across 1,300 datasets with zero data loss incidents.
-
AWS Launches Capabilities by Region Tool
AWS has launched "AWS Capabilities by Region," a tool that gives architects and developers clearer visibility into service availability. Instead of checking manually, users can compare AWS services across regions interactively and plan deployments accordingly. The added transparency and automated capability checks are intended to streamline global projects and minimize delays.
-
Meta Open Sources OpenZL: a Universal Compression Framework for Structured Data
Meta’s OpenZL changes the way data is compressed by maximizing efficiency for structured datasets, outperforming traditional methods like Zstandard. With a universal decompressor and custom compression plans, it simplifies operational deployment while achieving superior compression ratios and speeds, making it an essential tool for modern data infrastructures.
-
Slack Security: inside the New Anomaly Event Response Architecture
Slack has launched Anomaly Event Response (AER), a real-time security system that autonomously detects suspicious activity, terminates risky sessions, and reduces response time from days to minutes. The system’s architecture includes a detection engine, decision framework, and response orchestrator to help organizations prevent breaches efficiently.
-
10 AI-Related Standout Sessions at QCon San Francisco 2025
Join us at QCon San Francisco 2025 (Nov 17–21) for a three-day deep dive into the future of software development, exploring AI’s transformative impact. As a program committee member, I’m excited to showcase tracks that tackle real-world challenges, featuring industry leaders and sessions on AI, LLMs, and engineering mindsets. Don’t miss out!
-
11 Sessions Not to Miss at QCon San Francisco 2025
As QCon San Francisco (Nov 17-21, 2025) approaches, the conference's program committee and track hosts are sharing their top picks from this year's lineup. Their selections span a wide range of topics, from AI-accelerated development and platform engineering to resilience patterns and career growth, all with QCon's signature focus on real-world case studies and lessons learned.
-
Producing a Better Software Architecture with Residuality Theory
Software architecture is tough because it blends coding, math, and business systems. Because of unforeseen surprises, architectures tend to become irrelevant over time, Barry O'Reilly said. He presented residuality theory, which suggests stressing naive architectures to reveal hidden “attractors” in complex business systems, allowing designs to better survive change and uncertainty.
-
How Netflix Powers Audience Insights at Trillion-Row Scale
In a recent blog post, Netflix engineers described how they scaled Muse, the company’s internal application for data-driven creative insights, to handle trillion-row datasets.
-
Datadog Launches Monocle, a Unified Rust-Powered Real-Time Metrics Engine
Datadog has launched Monocle, a new real-time time series storage engine written in Rust. The system unifies the company’s metrics storage infrastructure, delivering higher ingestion throughput and lower query latency while reducing operational complexity. Monocle replaces several generations of storage backends, addressing concurrency challenges and scaling limits that accumulated over time.