
Kubernetes v1.36: Security Defaults Tighten as AI Workload Support Matures


The Kubernetes project has released version 1.36, named Haru, marking the first major Kubernetes release of 2026. The release contains 70 enhancements: 18 graduating to Stable, 25 entering Beta, and 25 new Alpha features, with a strong emphasis on security hardening, AI and machine learning workloads, and API scalability for large clusters. The release blog, authored by editors Chad M. Crowell, Kirti Goyal, Sophia Ugochukwu, Swathi Rao, and Utkarsh Umre, describes the release as arriving "as the season turns and the light shifts on the mountain", with contributions from 106 companies and 491 individuals.

The most prominent security graduation in this release is User Namespaces reaching General Availability, a feature that has been in development across multiple release cycles. The feature maps a container's root user to a non-privileged user on the host, so that a process escaping a container does not gain administrative access to the underlying node. Also graduating to GA are Mutating Admission Policies, which allow teams to define mutation logic as a native Kubernetes object using the Common Expression Language (CEL), removing the requirement to maintain a separate webhook server. The release blog notes that this "provides a native, high-performance alternative to traditional webhooks" and reduces "the latency and operational complexity associated with managing custom admission webhooks". A blog post from Kloia documents and illustrates this change.

Kubernetes mutating webhook obsolescence, © Kloia
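A MutatingAdmissionPolicy of the kind described above can be sketched as follows. This is a minimal, illustrative example: the policy name and label are invented, and the apiVersion assumes GA promotion lands the API in admissionregistration.k8s.io/v1 (earlier releases used v1alpha1/v1beta1), so verify against the v1.36 API reference before use.

```yaml
# Adds an "environment" label to every new Pod, expressed in CEL
# rather than in a separate webhook server.
apiVersion: admissionregistration.k8s.io/v1   # assumed GA version
kind: MutatingAdmissionPolicy
metadata:
  name: add-environment-label   # hypothetical policy name
spec:
  matchConstraints:
    resourceRules:
    - apiGroups:   [""]
      apiVersions: ["v1"]
      operations:  ["CREATE"]
      resources:   ["pods"]
  failurePolicy: Fail
  reinvocationPolicy: Never
  mutations:
  - patchType: ApplyConfiguration
    applyConfiguration:
      expression: >
        Object{
          metadata: Object.metadata{
            labels: {"environment": "production"}
          }
        }
```

As with ValidatingAdmissionPolicy, the policy only takes effect once a corresponding MutatingAdmissionPolicyBinding selects the resources it applies to.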

Fine-Grained Kubelet API Authorization also reaches GA in this release. First introduced as an alpha in v1.32, the feature enables more precise, least-privilege access control over the kubelet's HTTPS API, replacing the overly broad nodes/proxy permission that monitoring and observability tooling has traditionally required. SELinux Volume Labeling reaches stable as well, replacing recursive file relabeling with a mount -o context=XYZ option that applies the correct SELinux label to an entire volume at mount time, reducing pod startup delays on SELinux-enforcing systems. Declarative validation using validation-gen and Volume Group Snapshots, which allow crash-consistent snapshots across multiple PersistentVolumeClaims simultaneously, both complete their journey to GA in this release.
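Under the fine-grained model, an observability agent can be granted only the kubelet endpoints it actually scrapes, rather than the blanket nodes/proxy permission. A sketch, assuming the subresource names from the fine-grained kubelet API authorization enhancement carry through to v1.36 unchanged:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubelet-metrics-reader   # hypothetical role name
rules:
- apiGroups: [""]
  # Narrower than the former blanket nodes/proxy grant:
  # read-only access to the kubelet's metrics and stats endpoints only.
  resources: ["nodes/metrics", "nodes/stats"]
  verbs: ["get"]
```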

DRA admin access and prioritized lists for Dynamic Resource Allocation also reach GA, providing a permanent framework for cluster administrators to access and manage hardware resources globally and ensuring that resource selection logic remains consistent across cluster environments.
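With admin access now stable, a cluster administrator can request visibility into devices that remain allocated to workloads. The sketch below assumes the resource.k8s.io/v1 API shape and uses an invented DeviceClass name; admin-access claims are expected to be restricted to namespaces explicitly labelled for administrative use, so check the v1.36 reference for the exact requirements.

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: gpu-diagnostics
  namespace: cluster-admin-tools   # hypothetical namespace labelled for admin access
spec:
  devices:
    requests:
    - name: all-gpus
      exactly:
        deviceClassName: gpu.example.com   # hypothetical DeviceClass
        allocationMode: All
        adminAccess: true   # observe devices without blocking normal allocation
```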

The AI and machine learning story in v1.36 is largely one of defaults catching up to accumulated workload requirements. Writing for ScaleOps, the team describes the release as "less about brand-new mechanics and more about the defaults catching up to two years of accumulated AI workload scar tissue". Multiple DRA enhancements reach Beta and ship enabled by default in this release: DRA Partitionable Devices, DRA Consumable Capacity, and DRA Device Taints and Tolerations all flip on without requiring explicit feature gate configuration. Together these replace the integer-GPU device plugin model, where a single card was allocated wholesale regardless of actual utilisation, with primitives that can express how modern accelerators are partitioned, shared, and recovered when they fail. The VMware Cloud Foundation blog also notes that previously "requesting complex resources often required opaque, vendor-specific blobs that were difficult for the scheduler to optimise", and that the structured approach in v1.36 reduces the complexity of multi-node AI deployments.
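In the structured model, a pod asks for an accelerator through a ResourceClaim rather than an integer count of opaque devices. A minimal sketch, with hypothetical template and image names:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: training-worker
spec:
  resourceClaims:
  - name: gpu
    # Template (not shown) that describes the partition or shared
    # capacity the workload needs, in driver-neutral terms.
    resourceClaimTemplateName: shared-gpu-template   # hypothetical
  containers:
  - name: trainer
    image: example.com/trainer:latest   # hypothetical image
    resources:
      claims:
      - name: gpu   # bind the claim to this container
```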

The headline new alpha feature for AI workloads is Workload-Aware Preemption. Prior to this change, the scheduler would preempt individual pods when making room for higher-priority workloads, which could leave a distributed training job with seven of eight ranks running but unable to make progress. The new behaviour treats a PodGroup as a single preemption unit and only proceeds with eviction after verifying that the high-priority group can actually fit. As the Palark team describe in their release coverage, this addresses a "partial preemption failure mode for distributed training" that has been a persistent pain point for teams running large GPU jobs. The Gang Scheduling API itself, first introduced in alpha in v1.35, moves to Beta in v1.36.
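The alpha API surface is still settling, so the sketch below is purely illustrative: the group, version, kind, and field names are assumptions about how a gang-scheduled group might be declared, not the published schema, which should be checked against the v1.36 reference.

```yaml
apiVersion: scheduling.k8s.io/v1alpha1   # assumed group/version
kind: Workload                           # assumed kind
metadata:
  name: distributed-training
spec:
  podGroups:
  - name: workers
    policy:
      gang:
        # All eight ranks must be placeable before any pod is
        # scheduled; preemption likewise treats them as one unit.
        minCount: 8
```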

Mutable Pod Resources for Suspended Jobs also moves to Beta and is enabled by default. The feature allows a queue controller to suspend a running job, adjust its CPU, memory, GPU, or extended resource requests to match available cluster capacity, and then unsuspend it without destroying and recreating pods. The Kloia team note that this removes the need for custom controllers or killing and restarting jobs entirely, making it practical for workload queue systems to act on real-time cluster conditions.
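The flow the Kloia team describe can be sketched with a standard batch Job: the queue controller creates it suspended, patches the resource requests to fit current capacity, then unsuspends it. The image name here is hypothetical.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: queued-training
spec:
  suspend: true      # controller flips this to false once requests are right-sized
  parallelism: 4
  completions: 4
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: example.com/worker:latest   # hypothetical image
        resources:
          requests:
            cpu: "2"       # mutable while the Job is suspended under this Beta feature
            memory: 4Gi
```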

On API scalability, v1.36 introduces sharded list and watch streams as a new alpha feature. Large clusters with many controllers can encounter watch stream bottlenecks as all watchers receive updates through a single connection per resource type. The sharded approach distributes this load across multiple streams, which the Palark team identify as addressing "a key pain point for very large deployments where watch streams can become bottlenecks".

Memory QoS via cgroup v2 moves to Beta in this release, offering tiered memory protection that better aligns kernel controls with pod requests and limits to reduce contention between workloads sharing a node. In-Place Vertical Scaling for Pod-Level Resources also moves to Beta and is enabled by default, allowing the pod scope CPU and memory envelope to be resized without a container restart. A new ResizeDeferred event type is introduced so that when a resize cannot be applied immediately due to insufficient node capacity, the pod continues running at its existing size while the kubelet retries the resize once capacity becomes available.
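Pod-level resources define one envelope that all containers share; with the Beta in-place resize behaviour, that envelope can be adjusted on a running pod. A minimal sketch:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-level-demo
spec:
  resources:           # pod-scope envelope shared by all containers
    requests:
      cpu: "1"
      memory: 1Gi
    limits:
      cpu: "2"
      memory: 2Gi
  containers:
  - name: app
    image: registry.k8s.io/pause:3.10
```

If a subsequent resize of this envelope cannot fit on the node, the new ResizeDeferred event signals that the pod keeps running at its current size while the kubelet retries.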

Teams planning upgrades should be aware of several removals in this release. The gitRepo volume plugin is permanently removed after being deprecated since v1.11; the plugin allowed attackers to run code as root on a node, and the PerfectScale team advise migrating to init containers or external git-sync tooling before upgrading. IPVS mode in kube-proxy, deprecated in v1.35, is also removed. Additionally, flex-volume support in kubeadm and the Portworx in-tree driver are removed in this release, as the Kloia team note in their upgrade guidance.
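The migration the PerfectScale team recommend replaces the removed gitRepo volume with an init container cloning into a shared emptyDir. A sketch with a hypothetical repository URL:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: repo-consumer
spec:
  initContainers:
  - name: clone
    image: alpine/git          # any image with a git binary works
    args: ["clone", "--depth=1", "https://example.com/org/repo.git", "/repo"]  # hypothetical repo
    volumeMounts:
    - name: source
      mountPath: /repo
  containers:
  - name: app
    image: registry.k8s.io/pause:3.10
    volumeMounts:
    - name: source
      mountPath: /repo
      readOnly: true
  volumes:
  - name: source
    emptyDir: {}    # pod-lifetime scratch volume replacing the gitRepo plugin
```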

A significant operational change that predates this release but is highlighted in the v1.36 release blog is the retirement of Ingress NGINX. Kubernetes SIG Network and the Security Response Committee retired the project on 24th March 2026. Since that date there have been no further releases, bugfixes, or security vulnerability patches. InfoQ covered the evolution of Kubernetes networking in its Kubernetes 1.35 release coverage, which noted that Ingress NGINX would "receive only best-effort maintenance until March 2026".

The VMware Cloud Foundation blog contextualises the release as part of a larger shift: "Kubernetes is moving from a flexible framework toward more opinionated defaults for security and resource standards." The same post observes that "keeping up with Kubernetes is no longer just about upgrading clusters", but involves "managing lifecycle complexity, deciding when to adopt new versions, understanding how changes impact existing workloads, and avoiding disruption as the platform evolves."

"By moving toward a more structured approach, Kubernetes is making it easier for the scheduler to understand the specific requirements of a GPU or AI accelerator, drastically reducing the complexity of multi-node AI deployments."
VMware Cloud Foundation Blog, Kubernetes 1.36: What Actually Changed for Enterprise Platforms

The full release notes for Kubernetes 1.36 are also available.

 
