Orchestration Content on InfoQ
-
KubeCon NA 2025 - Robert Nishihara on Open Source AI Compute with Kubernetes, Ray, PyTorch, and vLLM
AI workloads are growing more complex in terms of compute and data, and technologies like Kubernetes and PyTorch can help build production-ready AI systems to support them. Robert Nishihara from Anyscale recently spoke at KubeCon + CloudNativeCon North America 2025 about how an AI compute stack comprising Kubernetes, PyTorch, vLLM, and Ray can support these new AI workloads.
-
The Architectural Shift: AI Agents Become Execution Engines While Backends Retreat to Governance
A fundamental shift in enterprise software architecture is emerging as AI agents transition from assistive tools to operational execution engines, with traditional application backends retreating to governance and permission management roles. This transformation is accelerating across sectors, with 40% of enterprise applications expected to include autonomous agents by 2026.
-
Slack Security: inside the New Anomaly Event Response Architecture
Slack has launched Anomaly Event Response (AER), a real-time security system that autonomously detects suspicious activity, terminates risky sessions, and reduces response time from days to minutes. The system’s architecture includes a detection engine, decision framework, and response orchestrator to help organizations prevent breaches efficiently.
-
CNCF Incubates OpenYurt for Kubernetes at the Edge
OpenYurt, a project incubated by the Cloud Native Computing Foundation (CNCF), extends Kubernetes to edge computing, enhancing performance in locations like IoT sites and branch offices. With a growing community and support from industry leaders, OpenYurt focuses on efficient cluster management while maintaining Kubernetes compatibility, ensuring lower latency and improved reliability.
-
Local Development with Workflow Studio for Step Functions
AWS has enhanced its Workflow Studio for Step Functions, now integrated into Visual Studio Code via the AWS Toolkit. This allows developers to create and edit state machines locally with intuitive visual tools. Key features include Design and Code modes, localized testing capabilities, and support for ASL definitions, streamlining the development of distributed applications and workflows.
-
Netflix Enhances Metaflow with New Configuration Capabilities
Netflix has introduced a significant enhancement to its Metaflow machine learning infrastructure: a new Config object that brings powerful configuration management to ML workflows. This addition addresses a common challenge faced by Netflix's teams, which manage thousands of unique Metaflow flows across diverse ML and AI use cases.
-
QCon London: Mastering Long-Running Processes in Modern Architectures
At QCon London 2024, Bernd Ruecker recommended implementing long-running tasks asynchronously with a process-orchestration platform. Such a platform provides better service boundaries and efficiencies and reduces accidental system complexity and risk. Operating the platform centrally within an organization makes it easier for applications to adopt orchestration.
-
Netflix Uses Metaflow to Manage Hundreds of AI/ML Applications at Scale
Netflix recently described how its Machine Learning Platform (MLP) team provides an ecosystem around Metaflow, an open-source machine learning infrastructure framework. Thanks to the various integrations the team has built for Metaflow, Netflix already has hundreds of Metaflow projects maintained by multiple engineering teams.
-
Canonical Launches Charmed MLFlow to Simplify Management and Maintenance of ML Workflows
Based on the open-source MLflow platform, Canonical's Charmed MLFlow aims to simplify the task of managing machine learning workflows and artifacts by using an alternative packaging system and orchestration engine.
-
Azure Durable Functions Now Supports Storage Backends Microsoft Netherite and MSSQL
Microsoft recently announced that Azure Durable Functions support for the new storage providers, Netherite and Microsoft SQL Server (MSSQL), is generally available.
-
AWS Introduces Step Functions Distributed Map for Large-Scale Parallel Data Processing
AWS recently announced a distributed map for Step Functions, a solution for large-scale parallel data processing. Optimized for S3, the new feature of the AWS orchestration service targets interactive and highly parallel serverless data processing workflows.
-
Kestra: a Scalable Open-Source Orchestration and Scheduling Platform
Kestra, a new open-source orchestration and scheduling platform, helps developers build, run, schedule, and monitor complex pipelines. The concept of a workflow, called a Flow in Kestra, is at the heart of the platform. A Flow is a list of tasks defined in a declarative language based on YAML.
-
AWS Releases Multi-Cloud Kubernetes Autoscaler Karpenter
AWS recently released Karpenter, its open-source Kubernetes cluster autoscaler. It improves upon the Kubernetes Cluster Autoscaler by providing an easily configurable, fully automated scheduler. Karpenter monitors for unscheduled pods, launches new nodes, and terminates unneeded infrastructure. It is designed to work with any Kubernetes cluster in any environment.
-
Karmada 0.7: Next-Gen Multi-Cloud and Multi-Cluster Kubernetes Orchestration
Karmada (Kubernetes Armada) 0.7, a release of the Kubernetes management system for the hybrid cloud era, became available on July 12, 2021. It brought multi-cluster service discovery, precise cluster status management, replica scheduling based on cluster resources, and more convenient APIs for dividing replicas by weight list.
-
Gremlin Aims to Reduce Kubernetes Noisy Neighbours through Chaos Engineering
Gremlin has released enhancements to its Chaos Engineering platform aimed at DevOps engineers interested in future-proofing Kubernetes clusters by isolating "noisy neighbours". On Kubernetes, the noisy neighbour issue occurs when multiple applications sharing a cluster compete for resources, leading to degraded performance.