DevOps Content on InfoQ
-
Benchmarking beyond the Application Layer: How Uber Evaluates Infrastructure Changes and Cloud SKUs
Uber’s Ceilometer framework automates infrastructure performance benchmarking beyond the application layer. It standardizes testing across servers, workloads, and cloud SKUs, helping teams validate changes, identify regressions, and optimize resources. Future plans include AI integration, anomaly detection, and continuous validation.
-
AWS and Google Cloud Preview Secure Multicloud Networking
In a surprising move, AWS and Google Cloud have recently partnered to simplify multicloud networking, introducing a common standard and leveraging "AWS Interconnect - Multicloud" and "Google Cloud's Cross-Cloud Interconnect". The new option makes it easier for organizations to manage and secure workloads across both clouds, with Azure expected to join in 2026.
-
Python Workers Redux: Wasm Snapshots and Native uv Tooling
Cloudflare's latest updates to Python Workers bring near-instant cold starts, broader package compatibility, and streamlined workflows via the uv package manager. By using memory snapshots and WebAssembly, Cloudflare drastically reduces startup times, making Python a stronger choice for AI and data science applications on its serverless platform.
-
Pinecone Introduces Dedicated Read Nodes in Public Preview for Predictable Vector Workloads
Pinecone recently announced the public preview of Dedicated Read Nodes (DRN), a new capacity mode for its vector database designed to deliver predictable performance and cost at scale for high-throughput applications such as billion-vector semantic search, recommendation systems, and mission-critical AI services.
-
Neptune Combines AI‑Assisted Infrastructure as Code and Cloud Deployments
Now available in beta, Neptune is a conversational AI agent designed to act like an AI platform engineer, handling the provisioning, wiring, and configuration of the cloud services needed to run a containerized app. Neptune is both language- and cloud-agnostic, with support for AWS, GCP, and Azure.
-
AWS Launches ECS Express Mode to Simplify Containerised Application Deployment
AWS has released Amazon ECS Express Mode, which simplifies the deployment of containerised web applications and APIs. Express Mode lets users deploy production-ready services in a single step, without manually configuring supporting resources such as IAM roles, load balancers and scaling.
-
AWS Introduces Regional Availability for NAT Gateway
AWS has recently introduced regional availability for the managed NAT Gateway service. The new capability allows developers to create a single NAT Gateway that automatically spans multiple availability zones (AZs) in a VPC. This provides high availability while eliminating the need to define separate gateways and public subnets in each zone.
-
Decathlon Switches to Polars to Optimize Data Pipelines and Infrastructure Costs
Decathlon, one of the world's leading sports retailers, recently shared why it adopted the open source library Polars to optimize its data pipelines. The Decathlon Digital team found that migrating from Apache Spark to Polars for small input datasets delivers significant speed improvements and cost savings.
-
Pinterest Engineering Reduces Android CI Build Times by 36% with Runtime-Aware Sharding
Pinterest published a technical case study detailing how its engineering team cut Android end-to-end (E2E) continuous integration (CI) build times by more than 36 percent by adopting a runtime-aware test-sharding strategy and building an internal testing platform.
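A runtime-aware sharding strategy balances shards by historical test duration rather than by test count, so no single shard becomes the long pole of a CI run. The sketch below illustrates the idea with the classic greedy longest-processing-time-first heuristic. It is an assumption for illustration only, not Pinterest's actual implementation; the function name `shard_by_runtime` and the sample test runtimes are hypothetical.

```python
import heapq
from typing import Dict, List


def shard_by_runtime(test_runtimes: Dict[str, float], num_shards: int) -> List[List[str]]:
    """Hypothetical helper: assign tests to shards by historical runtime.

    Uses greedy longest-processing-time-first: tests are considered from
    longest to shortest, and each one goes to the shard whose accumulated
    runtime is currently the smallest.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")

    # Min-heap of (accumulated_runtime, shard_index); least-loaded shard on top.
    heap = [(0.0, i) for i in range(num_shards)]
    heapq.heapify(heap)
    shards: List[List[str]] = [[] for _ in range(num_shards)]

    # Longest tests first; ties broken by name so the result is deterministic.
    for name, runtime in sorted(test_runtimes.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)
        shards[idx].append(name)
        heapq.heappush(heap, (total + runtime, idx))

    return shards


if __name__ == "__main__":
    # Hypothetical per-test runtimes in seconds, e.g. from previous CI runs.
    runtimes = {"login": 120, "feed": 90, "search": 60, "profile": 60, "settings": 30}
    for i, shard in enumerate(shard_by_runtime(runtimes, num_shards=2)):
        print(i, shard, sum(runtimes[t] for t in shard))
```

With these sample numbers, both shards end up with 180 seconds of work, whereas splitting the five tests by count alone could leave one shard far slower than the other.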
-
Google Cloud Launches Managed MCP Support
Google Cloud has introduced fully managed Model Context Protocol (MCP) servers, streamlining developer access to its API infrastructure. The enterprise-ready offering extends AI integration to services such as Google Maps and BigQuery and aims to encourage wide-scale adoption. It includes new governance and security tooling and is currently in public preview.
-
AWS Debuts “DevOps Agent” to Automate Incident Response and Improve System Reliability
AWS recently announced the public preview of AWS DevOps Agent, a new "frontier agent" that aims to help organizations react more quickly to production incidents, identify root causes, and proactively strengthen system reliability.
-
Netflix Migrates to Amazon Aurora: 75% Performance Boost and 28% Cost Reduction
Netflix consolidated its relational databases onto Amazon Aurora, cutting costs by 28% and boosting performance by up to 75%. The move from self-managed PostgreSQL reduced operational toil and improved latency for critical applications. This mirrors migrations by Samsung and Panasonic, though benchmarks suggest alternatives like Timescale may suit specific workloads better.
-
AWS Transform Custom Tackles Technical Debt
AWS Transform Custom uses AI to automate code modernization, offering out-of-the-box transformations for Java, Node.js, and Python. The enterprise-focused tool can accelerate application upgrades by up to 5x, and it learns organization-specific conventions to deliver high-quality, repeatable transformations.
-
Yelp Publishes Blueprint for Managing S3 Server-Access Logs at Massive Scale
In a detailed engineering post, Yelp shared how it built a scalable and cost-efficient pipeline for processing Amazon S3 server-access logs (SAL) across its infrastructure, overcoming traditional limitations of raw log storage and querying at high volume.
-
Google Cloud Demonstrates Massive Kubernetes Scale with 130,000-Node GKE Cluster
The team behind Google Kubernetes Engine (GKE) revealed that they successfully built and operated a Kubernetes cluster with 130,000 nodes, making it the largest publicly disclosed Kubernetes cluster to date.