Kubernetes Content on InfoQ
-
Pinterest's Moka: How Kubernetes Is Rewriting the Rules of Big Data Processing
Digital pinboard provider Pinterest has published an article explaining its blueprint for the future of large-scale data processing with its new platform Moka. The company is moving core workloads from ageing Hadoop infrastructure to a Kubernetes-based system on Amazon EKS, with Apache Spark as the main engine and support for other frameworks on the way.
-
Docker Kanvas Challenges Helm and Kustomize for Kubernetes Dominance
Docker has launched Kanvas, a new platform designed to bridge the gap between local development and cloud production. By automating the conversion of Docker Compose files into Kubernetes artefacts, the tool challenges established solutions like Helm and Kustomize. Developed with Layer5, it marks a shift toward Infrastructure as Code, offering visualisations to simplify cloud-native deployments.
-
Kubernetes 1.35 Released with In-Place Pod Resize and AI-Optimized Scheduling
The Cloud Native Computing Foundation (CNCF) announced the release of Kubernetes 1.35, named "Timbernetes", emphasizing its focus on mutability and the optimization of high-performance AI/ML workloads.
-
AWS Announces New Amazon EKS Capabilities to Simplify Workload Orchestration
Amazon Web Services has launched Amazon EKS Capabilities, a set of fully managed, Kubernetes-native features designed to streamline workload orchestration, AWS cloud resource management, and Kubernetes resource composition and automation.
-
Open-Source Agent Sandbox Enables Secure Deployment of AI Agents on Kubernetes
The Agent Sandbox is an open-source Kubernetes controller that provides a declarative API for managing a single, stateful pod with stable identity and persistent storage. It is particularly well suited for creating isolated environments to execute untrusted, LLM-generated code, as well as for running other stateful workloads.
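A minimal sketch of what driving such a declarative API could look like from the official Kubernetes Python client is shown below; the group, version, kind, and spec fields are placeholders rather than the project's published schema:

```python
# Minimal sketch: creating a hypothetical Sandbox custom resource with the
# official Kubernetes Python client. The group/version/kind and spec fields
# below are illustrative assumptions, not the project's actual schema.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running in-cluster
api = client.CustomObjectsApi()

sandbox = {
    "apiVersion": "agents.example.io/v1alpha1",  # placeholder group/version
    "kind": "Sandbox",
    "metadata": {"name": "untrusted-code-runner"},
    "spec": {
        # One stateful pod with a stable identity and persistent storage,
        # used here as an isolated environment for LLM-generated code.
        "podTemplate": {
            "spec": {
                "containers": [{"name": "runner", "image": "python:3.12-slim"}]
            }
        },
        "persistentVolumeClaim": {"storage": "1Gi"},
    },
}

api.create_namespaced_custom_object(
    group="agents.example.io",
    version="v1alpha1",
    namespace="default",
    plural="sandboxes",
    body=sandbox,
)
```

The controller, not the caller, is then responsible for reconciling the pod, its identity, and its storage to match the declared spec.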
-
CNCF Launches Certified Kubernetes AI Conformance Program to Standardise Workloads
The CNCF has launched the Certified Kubernetes AI Conformance program to standardise artificial intelligence workloads. By establishing a technical baseline for GPU management, networking, and gang scheduling, the initiative ensures portability across cloud providers. It aims to reduce technical debt and prevent vendor lock-in as enterprises move generative AI models into production.
-
Neptune Combines AI‑Assisted Infrastructure as Code and Cloud Deployments
Now available in beta, Neptune is a conversational AI agent designed to act like an AI platform engineer, handling the provisioning, wiring, and configuration of the cloud services needed to run a containerized app. Neptune is both language- and cloud-agnostic, with support for AWS, GCP, and Azure.
-
Lyft Rearchitects ML Platform with Hybrid AWS SageMaker-Kubernetes Approach
Lyft has rearchitected its machine learning platform LyftLearn into a hybrid system, moving offline workloads to AWS SageMaker while retaining Kubernetes for online model serving. Its decision to adopt managed services where operational complexity was highest, while maintaining custom infrastructure where control mattered most, offers a pragmatic alternative to unified platform strategies.
-
Google Cloud Demonstrates Massive Kubernetes Scale with 130,000-Node GKE Cluster
The team behind Google Kubernetes Engine (GKE) revealed that they successfully built and operated a Kubernetes cluster with 130,000 nodes, making it the largest publicly disclosed Kubernetes cluster to date.
-
NVIDIA Dynamo Addresses Multi-Node LLM Inference Challenges
Serving Large Language Models (LLMs) at scale is complex. Modern LLMs now exceed the memory and compute capacity of a single GPU or even a single multi-GPU node. As a result, inference workloads for 70B+ or 120B+ parameter models, or for pipelines with large context windows, require multi-node, distributed GPU deployments.
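A back-of-the-envelope estimate makes the constraint concrete: weights alone for a 70B-parameter model in 16-bit precision already exceed a single 80 GB accelerator, before accounting for KV cache or activations. A quick illustrative calculation:

```python
# Back-of-the-envelope estimate of LLM weight memory vs. a single 80 GB GPU.
# Figures are illustrative; real deployments also need KV cache, activations,
# and framework overhead, which push requirements higher still.

def weight_memory_gb(num_params_billion: float, bytes_per_param: int = 2) -> float:
    """Memory for model weights in GB, assuming FP16/BF16 (2 bytes per parameter)."""
    return num_params_billion * bytes_per_param

SINGLE_GPU_GB = 80  # e.g. one H100/A100-class device

for size in (70, 120):
    needed = weight_memory_gb(size)
    print(f"{size}B params ~ {needed:.0f} GB of weights "
          f"-> needs >= {needed / SINGLE_GPU_GB:.1f} such GPUs before KV cache")
```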
-
How Discord Scaled its ML Platform from Single-GPU Workflows to a Shared Ray Cluster
Discord has detailed how it rebuilt its machine learning platform after hitting the limits of single-GPU training. The changes enabled daily retrains for large models and contributed to a 200% uplift in a key ads ranking metric.
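While the article covers Discord's specific platform work, the general shift from single-GPU scripts to a shared Ray cluster can be sketched with Ray's core API, where GPU work is expressed as remote tasks scheduled across the cluster (the task below is a stand-in, not Discord's code):

```python
# Minimal sketch of moving single-GPU work onto a shared Ray cluster.
# The training task is a placeholder, not Discord's actual training code.
import ray

ray.init()  # connects to an existing cluster when RAY_ADDRESS is set

@ray.remote(num_gpus=1)
def train_shard(shard_id: int) -> str:
    # Each task is scheduled onto any node in the shared cluster with a free
    # GPU, instead of being tied to a single developer's machine.
    return f"shard {shard_id} trained"

results = ray.get([train_shard.remote(i) for i in range(4)])
print(results)
```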
-
Helm Improves Kubernetes Package Management with Biggest Release in 6 Years
Helm, the Kubernetes application package manager, has officially reached version 4.0.0. Helm 4 is the first major upgrade in six years, and also marks Helm's 10th anniversary under the guidance of the Cloud Native Computing Foundation (CNCF). The update aims to address several challenges around scalability, security, and developer workflow.
-
Kubernetes Community Retires Popular Ingress NGINX Controller
The Kubernetes SIG Network and the Security Response Committee have announced the retirement of Ingress NGINX, one of the most widely deployed ingress controllers in the ecosystem. Best-effort maintenance will continue until March 2026, after which there will be no further releases, bug fixes, or security updates, according to an announcement made at KubeCon NA 2025.
-
KubeCon NA 2025 - Robert Nishihara on Open Source AI Compute with Kubernetes, Ray, PyTorch, and vLLM
AI workloads are growing more complex in terms of compute and data, and technologies like Kubernetes and PyTorch can help build production-ready AI systems to support them. Robert Nishihara from Anyscale recently spoke at KubeCon + CloudNativeCon North America 2025 about how an AI compute stack comprising Kubernetes, PyTorch, vLLM, and Ray can support these new AI workloads.
-
Airbnb Adds Adaptive Traffic Control to Manage Key Value Store Spikes
Airbnb upgraded Mussel, its multi-tenant key-value store, replacing static per-client rate limits with an adaptive, resource-aware traffic control system. The redesign ensures resilience during traffic spikes, protects critical workflows, and maintains fair usage across thousands of tenants while scaling efficiently.
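As a rough illustration of the general idea (not Mussel's actual algorithm or thresholds), an adaptive, resource-aware limit can be sketched as an AIMD-style controller that backs off when backend utilization is high and recovers gradually otherwise:

```python
# Generic sketch of an adaptive, resource-aware admission limit (AIMD-style):
# the per-tenant budget grows while the store is healthy and backs off
# multiplicatively under pressure. Illustrative only; not Mussel's design.

class AdaptiveLimiter:
    def __init__(self, initial_limit: float = 100.0,
                 min_limit: float = 10.0, max_limit: float = 10_000.0):
        self.limit = initial_limit      # allowed requests/sec for a tenant
        self.min_limit = min_limit
        self.max_limit = max_limit

    def update(self, utilization: float) -> float:
        """Adjust the limit from observed backend utilization (0.0-1.0)."""
        if utilization > 0.8:           # under pressure: back off quickly
            self.limit = max(self.min_limit, self.limit * 0.5)
        else:                           # healthy: recover gradually
            self.limit = min(self.max_limit, self.limit + 10.0)
        return self.limit


limiter = AdaptiveLimiter()
for utilization in (0.5, 0.6, 0.9, 0.95, 0.4):
    print(f"util={utilization:.2f} -> limit={limiter.update(utilization):.0f} rps")
```

Keying the limit to observed resource health rather than a fixed per-client quota is what lets such a system absorb spikes while keeping tenants within fair bounds.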