Scaling Content on InfoQ

News

Architecture & Design

From Camera to Cloud: Netflix’s Scalable Media Processing Pipeline

Netflix has detailed a cloud-based system for scaling camera file processing across global film and TV workflows. The pipeline handles ingest, validation, metadata extraction, and media transformation at scale using the FilmLight API and distributed compute. It standardizes workflows across editorial, VFX, and color pipelines, improving consistency and reducing manual handling across productions.

Leela Kumili
on Jun 18, 2026
Architecture & Design

Pinterest Uses Content Fingerprints for URL Deduplication across Millions of Domains

Pinterest introduced MIQPS, a URL normalization system that identifies which query parameters affect page identity using rendered content fingerprints. It reduces duplicate processing across millions of domains by replacing rule-based approaches with offline analysis, anomaly detection, and runtime parameter maps, improving ingestion efficiency and scalability in large-scale content pipelines.

Leela Kumili
on Jun 08, 2026
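
The idea of learning which query parameters affect page identity can be illustrated with a minimal Python sketch. It is not Pinterest's MIQPS implementation: the `render` function is a hypothetical stand-in for a page renderer, the `shop.example` URLs are invented, and the rule "a parameter matters if removing it changes the content fingerprint" is a simplifying assumption.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def render(url: str) -> str:
    """Hypothetical renderer: in this toy site only `id` changes the page."""
    query = dict(parse_qsl(urlsplit(url).query))
    return f"product page {query.get('id', 'none')}"


def fingerprint(url: str) -> str:
    """Hash the rendered content so pages can be compared cheaply."""
    return hashlib.sha256(render(url).encode()).hexdigest()


def without_param(url: str, name: str) -> str:
    """Return the URL with every occurrence of one query parameter removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k != name]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def significant_params(sample_urls):
    """Offline step: a parameter is significant if dropping it changes the fingerprint."""
    significant = set()
    for url in sample_urls:
        base = fingerprint(url)
        for name, _ in parse_qsl(urlsplit(url).query):
            if fingerprint(without_param(url, name)) != base:
                significant.add(name)
    return significant


def normalize(url: str, significant) -> str:
    """Runtime step: keep only significant parameters, in sorted order."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k in significant)
    return urlunsplit(parts._replace(query=urlencode(kept)))


samples = [
    "https://shop.example/item?id=42&utm_source=mail&ref=home",
    "https://shop.example/item?id=7&session=abc",
]
param_map = significant_params(samples)
print(normalize("https://shop.example/item?utm_source=x&id=42", param_map))
```

With this toy renderer, URLs that differ only in tracking parameters normalize to the same string, so the page would be processed once.
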
Culture & Methods

Scaling Social Systems in Software Organizations

Fast-scaling teams must rebuild trust and psychological safety as their social systems expand. Intentional, redundant communication across multiple formats can keep everyone aligned. Cross-team rituals, buddy systems, and rotating facilitators can reduce silos by building bridges between teams. Leaders accelerate this by modeling the vulnerability they want to see.

Ben Linders
on May 14, 2026
DevOps

Meta Deploys Unified AI Agents to Automate Performance Optimization at Hyperscale

Meta has unveiled a new AI-driven capacity efficiency platform that uses unified AI agents to automatically detect and resolve performance issues across its global infrastructure, marking a significant step toward self-optimizing systems at hyperscale.

Craig Risi
on May 01, 2026
DevOps

Netflix Scales "Human Infrastructure" to Manage Global Live Operations

Netflix has introduced a "human infrastructure" layer to manage live broadcasts at scale. Using a low-latency "telemetry hot path" and a Live Operations Centre, the company now balances automated scaling with human oversight. This shift, which mirrors strategies at AWS and Disney+, focuses on maintaining reliability through expert intervention during high-concurrency global events.

Mark Silvester
on Apr 30, 2026
Architecture & Design

Cloudflare Outlines MCP Architecture as Enterprises Confront Security and Governance Risks

Cloudflare has outlined a reference architecture for scaling Model Context Protocol (MCP) deployments across the enterprise, positioning centralized governance, remote server infrastructure, and cost controls as key requirements for production-ready agent systems.

Matt Foster
on Apr 22, 2026
DevOps

GitHub Acknowledges Recent Outages, Cites Scaling Challenges and Architectural Weaknesses

GitHub has publicly addressed a series of recent availability and performance issues that disrupted services across its platform, attributing the incidents to rapid growth, architectural coupling, and limitations in handling system load.

Craig Risi
on Apr 21, 2026
Culture & Methods

How to Handle Trust and Psychological Safety When Scaling Organizations

Communication overload, loss of shared context, and trust gaps emerge as organizations scale, according to Charlotte de Jong Schouwenburg. Trust must be built team by team; it can’t be replicated. Trust is interpersonal, while psychological safety exists among people and fuels learning. Leaders must deliberately design structures, rituals, and metrics that reward transparency and cohesion at scale.

Ben Linders
on Apr 02, 2026
DevOps

QCon London 2026: Shielding the Core: Architecting Resilience with Multi-Layer Defenses

Anderson Parra, staff software engineer at SeatGeek, presented “Shielding the Core: Architecting Resilience with Multi-Layer Defenses” at QCon London 2026. Parra discussed strategies for handling significant traffic spikes that can overwhelm even a well-designed infrastructure.

Michael Redlich
on Mar 25, 2026
Architecture & Design

OpenAI Scales Single Primary PostgreSQL Instance to Millions of Queries per Second for ChatGPT

OpenAI described how it scaled PostgreSQL to support ChatGPT and its API platform, handling millions of queries per second for hundreds of millions of users. By running a single-primary PostgreSQL deployment on Azure with nearly 50 read replicas, optimizing query patterns, and offloading write-heavy workloads to sharded systems, OpenAI maintained low-latency reads while managing write pressure.

Leela Kumili
on Feb 12, 2026
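
The single-primary pattern with read replicas can be pictured with a small routing sketch in Python. It is an illustration only, not OpenAI's setup: the `Router` class, the node names, and the rule "statements starting with SELECT go to a replica" are assumptions, and real routing would also need to account for transactions and replication lag.

```python
import itertools


class Router:
    """Send reads to replicas in round-robin order and everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def target_for(self, sql: str):
        # Naive classification (assumption): only statements that start
        # with SELECT are treated as reads.
        first_word = sql.lstrip().split(None, 1)[0].upper()
        if first_word == "SELECT":
            return next(self._replicas)
        return self.primary


router = Router("primary", ["replica-1", "replica-2"])
print(router.target_for("SELECT * FROM chats WHERE id = 1"))
print(router.target_for("UPDATE chats SET title = 'x' WHERE id = 1"))
```

Keeping all writes on one node keeps consistency simple, which is why write-heavy workloads are the part that the summary describes as being moved to sharded systems.
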
DevOps

Enhancing Reliability Using Service-Level Prioritized Load Shedding: Netflix at QCon SF 2025

At QCon San Francisco, Netflix engineers unveiled their advanced Service-Level-Prioritized Load-Shedding strategy, enhancing reliability during traffic spikes. By prioritizing high-value requests and automating management across microservices, they safeguard user experience and system stability. Key insights stress prioritization, automation, and structured load shedding for optimal resilience.

Steef-Jan Wiggers
on Nov 20, 2025
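
Prioritized load shedding can be sketched in a few lines of Python. This is a minimal illustration, not Netflix's implementation: the four priority levels, the 0.7 utilization threshold, and the linear ramp are all assumed values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    name: str
    priority: int  # assumption: 0 is most critical, larger is less important


def admitted_max_priority(utilization: float, levels: int = 4, start: float = 0.7) -> int:
    """Return the least important priority level still admitted.

    Below `start` everything is admitted; at or above full utilization
    only priority 0 gets through; in between the cutoff tightens linearly.
    """
    if utilization <= start:
        return levels - 1
    if utilization >= 1.0:
        return 0
    fraction = (utilization - start) / (1.0 - start)
    return (levels - 1) - int(fraction * (levels - 1))


def admit(request: Request, utilization: float) -> bool:
    """Shed the request if it is less important than the current cutoff."""
    return request.priority <= admitted_max_priority(utilization)


playback = Request("start-playback", priority=0)
prefetch = Request("prefetch-artwork", priority=3)

for load in (0.5, 0.85, 1.2):
    print(load, admit(playback, load), admit(prefetch, load))
```

The design choice illustrated here is that low-value traffic is dropped first as load rises, so critical requests keep succeeding for as long as possible.
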
DevOps

Advanced Autoscaling Helps Companies Reduce AWS Costs by 70%

The next generation of Kubernetes autoscaling techniques and tools is enabling organisations to make substantial cost savings in their cloud infrastructure. Svetlana Burninova recently used Karpenter to build a multi-architecture EKS cluster and managed a 70% reduction in cost whilst also improving performance.

Matt Saunders
on Aug 31, 2025
Cloud

Amazon DocumentDB Serverless: Auto-Scaling Database Solution for Variable Workloads

AWS has launched Amazon DocumentDB Serverless, an auto-scaling database solution compatible with MongoDB, tailored for variable workloads. While marketed as "serverless," it functions more like auto-scaling, charging from $30/month. Ideal for enterprises and SaaS vendors, it adeptly handles spikes in demand, particularly for AI-driven applications.

Steef-Jan Wiggers
on Aug 07, 2025
Culture & Methods

Inflection Points in Engineering Productivity for Improving Productivity and Operational Excellence

As companies grow, investing in custom developer tools may become necessary. Initially, standard tools suffice, but as companies scale in engineers, maturity, and complexity, industry tools may no longer meet needs. Inflection points, such as a crisis, hyper-growth, or reaching a new market, often trigger investments, providing opportunities for improving productivity and operational excellence.

Ben Linders
on Apr 24, 2025
Culture & Methods

Lessons Learned from Growing an Engineering Organization

As their organization grew, Thiago Ghisi's work as director of engineering shifted from being hands-on in emergencies to designing frameworks and delegating decisions. He suggested treating changes as experiments, documenting reorganizations, and using a wave-based communication approach to gather feedback, ensuring people feel heard and invested.

Ben Linders
on Apr 09, 2025
