News

DevOps

Kubernetes 1.35 Released with In-Place Pod Resize and AI-Optimized Scheduling

The Cloud Native Computing Foundation (CNCF) announced the release of Kubernetes 1.35, named "Timbernetes", emphasizing its focus on mutability and the optimization of high-performance AI/ML workloads.

Mostafa Radwan
on Dec 31, 2025
DevOps

AWS Announces New Amazon EKS Capabilities to Simplify Workload Orchestration

Amazon Web Services has launched Amazon EKS Capabilities, a set of fully managed, Kubernetes-native features designed to streamline workload orchestration, AWS cloud resource management, and Kubernetes resource composition and automation.

Craig Risi
on Dec 30, 2025
DevOps

Open-Source Agent Sandbox Enables Secure Deployment of AI Agents on Kubernetes

The Agent Sandbox is an open-source Kubernetes controller that provides a declarative API for managing a single, stateful pod with stable identity and persistent storage. It is particularly well suited for creating isolated environments to execute untrusted, LLM-generated code, as well as for running other stateful workloads.

Sergio De Simone
on Dec 30, 2025
DevOps

CNCF Launches Certified Kubernetes AI Conformance Programme to Standardise Workloads

The CNCF has launched the Certified Kubernetes AI Conformance programme to standardise artificial intelligence workloads. By establishing a technical baseline for GPU management, networking, and gang scheduling, the initiative ensures portability across cloud providers. It aims to reduce technical debt and prevent vendor lock-in as enterprises move generative AI models into production.

Mark Silvester
on Dec 30, 2025
AI, ML & Data Engineering

Neptune Combines AI‑Assisted Infrastructure as Code and Cloud Deployments

Now available in beta, Neptune is a conversational AI agent designed to act like an AI platform engineer, handling the provisioning, wiring, and configuration of the cloud services needed to run a containerized app. Neptune is both language and cloud-agnostic, with support for AWS, GCP, and Azure.

Sergio De Simone
on Dec 22, 2025
Architecture & Design

Lyft Rearchitects ML Platform with Hybrid AWS SageMaker-Kubernetes Approach

Lyft has rearchitected its machine learning platform LyftLearn into a hybrid system, moving offline workloads to AWS SageMaker while retaining Kubernetes for online model serving. Its decision to choose managed services where operational complexity was highest, while maintaining custom infrastructure where control mattered most, offers a pragmatic alternative to unified platform strategies.

Eran Stiller
on Dec 16, 2025
DevOps

Google Cloud Demonstrates Massive Kubernetes Scale with 130,000-Node GKE Cluster

The team behind Google Kubernetes Engine (GKE) revealed that they successfully built and operated a Kubernetes cluster with 130,000 nodes, making it the largest publicly disclosed Kubernetes cluster to date.

Craig Risi
on Dec 10, 2025
DevOps

NVIDIA Dynamo Addresses Multi-Node LLM Inference Challenges

Serving Large Language Models (LLMs) at scale is complex. Modern LLMs now exceed the memory and compute capacity of a single GPU or even a single multi-GPU node. As a result, inference workloads for models with 70B+ or 120B+ parameters, or for pipelines with large context windows, require multi-node, distributed GPU deployments.

Claudio Masolo
on Dec 04, 2025
AI, ML & Data Engineering

How Discord Scaled its ML Platform from Single-GPU Workflows to a Shared Ray Cluster

Discord has detailed how it rebuilt its machine learning platform after hitting the limits of single-GPU training. The changes enabled daily retrains for large models and contributed to a 200% uplift in a key ads ranking metric.

Matt Foster
on Dec 03, 2025
DevOps

Helm Improves Kubernetes Package Management with Biggest Release in 6 Years

Helm, the Kubernetes application package manager, has officially reached version 4.0.0. Helm 4 is the first major upgrade in six years, and also marks Helm's 10th anniversary under the guidance of the Cloud Native Computing Foundation (CNCF). The update aims to address several challenges around scalability, security, and developer workflow.

Matt Saunders
on Nov 30, 2025
DevOps

Kubernetes Community Retires Popular Ingress NGINX Controller

Kubernetes SIG Network and the Security Response Committee have announced the retirement of Ingress NGINX, one of the most widely deployed ingress controllers in the ecosystem. Best-effort maintenance will continue until March 2026, after which there will be no further releases, bug fixes, or security updates, according to an announcement made at KubeCon NA 2025.

Matt Saunders
on Nov 29, 2025
AI, ML & Data Engineering

KubeCon NA 2025 - Robert Nishihara on Open Source AI Compute with Kubernetes, Ray, PyTorch, and vLLM

AI workloads are growing more complex in terms of compute and data, and technologies like Kubernetes and PyTorch can help build production-ready AI systems to support them. Robert Nishihara from Anyscale recently spoke at the KubeCon + CloudNativeCon North America 2025 conference about how an AI compute stack comprising Kubernetes, PyTorch, vLLM, and Ray can support these new AI workloads.

Srini Penchikala
on Nov 28, 2025
Architecture & Design

Airbnb Adds Adaptive Traffic Control to Manage Key Value Store Spikes

Airbnb upgraded Mussel, its multi-tenant key-value store, replacing static per-client rate limits with an adaptive, resource-aware traffic control system. The redesign ensures resilience during traffic spikes, protects critical workflows, and maintains fair usage across thousands of tenants while scaling efficiently.

Leela Kumili
on Nov 21, 2025
AI, ML & Data Engineering

KubeCon NA 2025 - Erica Hughberg and Alexa Griffith on Tools for the Age of GenAI

Generative AI brings new workloads, traffic patterns, and infrastructure demands, and calls for a new set of tools. Erica Hughberg from Tetrate and Alexa Griffith from Bloomberg spoke last week at the KubeCon + CloudNativeCon North America 2025 conference about what it takes to build GenAI platforms capable of serving model inference at scale.

Srini Penchikala
on Nov 17, 2025
DevOps

Crossplane Reaches Production Maturity by Graduating CNCF

The Cloud Native Computing Foundation (CNCF) has graduated Crossplane, marking a major milestone for the open-source project that turns Kubernetes into a universal control plane for cloud infrastructure. For practitioners, it signals that Crossplane is no longer an experimental idea but a production-hardened foundation for building internal platforms.

Matt Foster
on Nov 13, 2025
