Infrastructure Content on InfoQ

News

Cloud

Cycle Introduces EU Control Plane as Sovereignty Debate Continues

Cycle recently introduced a separate EU-based control plane, allowing European customers to keep platform management data and telemetry within Europe. The new offering is designed to improve compliance, operational isolation, and responsiveness for European organizations.

Renato Losio
on Jul 04, 2026
Culture & Methods

Shifting Platform Development from Projects to Products

A company shifted from project thinking to product thinking after its platform outgrew single-team use. The limitations it encountered were one-off deliveries, a lack of product vision, and weak feedback loops. It has since moved toward a self-service, API-driven, multi-tenant infrastructure with clearer ownership and better abstractions.

Ben Linders
on Jul 02, 2026
Cloud

AWS Replaces Fat-Tree Data Center Networks with Random Graph Theory, Cutting Routers by 69%

AWS disclosed that Resilient Network Graphs, a flat network architecture based on quasi-random graph theory, is now the default for most new data center builds. The design replaces fat-tree hierarchies with direct ToR-to-ToR mesh connections using passive optical ShuffleBoxes, cutting routers by 69%, boosting throughput by 33%, and reducing network power consumption by 40%.

Steef-Jan Wiggers
on Jun 04, 2026
Architecture & Design

Inside Google’s System for Coordinated A/B Testing across its Global Service Fleet

Google has shared details of its fleet-wide, large-scale A/B experimentation system, designed to standardize experiment assignment, exposure logging, and configuration propagation across distributed services. The approach enables consistent measurement across products, reduces experiment conflicts, and improves the reliability of data-driven decision making at scale.

Leela Kumili
on Jun 03, 2026
DevOps

AI-Assisted Migration Tool Helps Teams Move from ingress-nginx to Higress in Minutes

The Cloud Native Computing Foundation has highlighted a new AI-assisted migration approach that enabled engineers to migrate 60 ingress-nginx resources to Higress in roughly 30 minutes, demonstrating how artificial intelligence is increasingly being applied to modernize Kubernetes networking and gateway infrastructure.

Craig Risi
on May 29, 2026
Architecture & Design

Swiggy Improves Search Autocomplete Using Real Time Machine Learning Ranking

Swiggy detailed a real-time machine-learning ranking system for autocomplete built on OpenSearch. The architecture separates candidate generation from ranking, uses feature stores for real-time signals, and applies learning-to-rank models for improved relevance. It replaces heuristic ranking while maintaining strict latency constraints and enabling continuous model updates from user behavior signals.

Leela Kumili
on May 18, 2026
Cloud

Cloudflare Optimizes Edge Stack for High-Core CPUs instead of Large Cache

Cloudflare recently introduced its Gen 13 servers, marking a shift in how its network handles traffic. Instead of relying on large CPU caches for speed, the company redesigned its software to leverage many more processor cores working in parallel in its latest AMD-based servers.

Renato Losio
on Apr 25, 2026
Architecture & Design

Dropbox Collaborates with GitHub to Reduce Monorepo Size from 87GB to 20GB

Dropbox reduced its backend monorepo from 87GB to 20GB by optimizing Git delta compression in collaboration with GitHub. The changes improved clone times, CI performance, and developer velocity, highlighting how repository storage inefficiencies can impact large-scale engineering workflows.

Leela Kumili
on Apr 22, 2026
Architecture & Design

From Minutes to Seconds: Uber Boosts MySQL Cluster Uptime with Consensus Architecture

Uber redesigned its MySQL fleet using a consensus-driven architecture based on MySQL Group Replication, reducing cluster failover time from minutes to seconds. By moving leader election and failure detection into the database layer, Uber improved availability, simplified external orchestration, and strengthened consistency across thousands of production clusters.

Leela Kumili
on Mar 11, 2026
Development

How CNAME Ordering in RFC Specs Caused Cloudflare 1.1.1.1 Outage

In a recent article titled "What came first: the CNAME or the A record?", Cloudflare explains how an unclear RFC specification caused Cloudflare’s popular 1.1.1.1 service to break. After identifying the breakage and the ambiguity in older DNS standards regarding record order, Cloudflare proposes a clarified specification.

Renato Losio
on Feb 07, 2026
Architecture & Design

GitHub Reworks Layered Defenses after Legacy Protections Block Legitimate Traffic

GitHub engineers recently traced user reports of unexpected “Too Many Requests” errors to abuse-mitigation rules that had accidentally remained active long after the incidents that prompted them.

Matt Foster
on Feb 04, 2026
DevOps

Cloudflare Scales Infrastructure as Code with Shift-Left Security Practices

Cloudflare has eliminated manual configuration errors across hundreds of production accounts by implementing Infrastructure as Code with automated policy enforcement, processing approximately 30 merge requests daily while catching security violations before deployment rather than after incidents occur.

Claudio Masolo
on Jan 12, 2026
Architecture & Design

Benchmarking beyond the Application Layer: How Uber Evaluates Infrastructure Changes and Cloud SKUs

Uber’s Ceilometer framework automates infrastructure performance benchmarking beyond applications. It standardizes testing across servers, workloads, and cloud SKUs, helping teams validate changes, identify regressions, and optimize resources. Future plans include AI integration, anomaly detection, and continuous validation.

Leela Kumili
on Dec 26, 2025
DevOps

NVIDIA Dynamo Addresses Multi-Node LLM Inference Challenges

Serving Large Language Models (LLMs) at scale is complex. Modern LLMs now exceed the memory and compute capacity of a single GPU or even a single multi-GPU node. As a result, inference workloads for models with 70B+ or 120B+ parameters, or for pipelines with large context windows, require multi-node, distributed GPU deployments.

Claudio Masolo
on Dec 04, 2025
Cloud

Azure API Management Premium v2 GA: Simplified Private Networking and VNet Injection

Microsoft has launched API Management Premium v2, redefining security and ease-of-use in cloud API gateways. This new architecture enhances private networking by eliminating management traffic from customer VNets. With features like Inbound Private Link, availability zone support, and custom CA certificates, users gain unmatched networking flexibility, resilience, and significant cost savings.

Steef-Jan Wiggers
on Dec 03, 2025
