Ocado Technology Releases “Kubermesh”, a Prototype Self-Provisioning Mesh Network Kubernetes Cluster

The Ocado Technology team has created Kubermesh, a prototype "bare-metal, self-hosted, self-healing, self-provisioning, partial-mesh network Kubernetes cluster". Kubermesh is essentially the glue that joins several technologies into a container deployment platform: an OSPFv3 partial-mesh network and an iPXE-booted, self-installing Kubernetes deployment that runs on bare-metal servers and has the potential to run on OpenStack-managed virtual machines.

According to the Ocado Technology blog, the seed for this project was first planted when a question was posed while brainstorming ideas around a whiteboard: "What if we could break up our Customer Fulfilment Centre's data centre into smaller nodes?" The Customer Fulfilment Centres (CFC) are highly automated warehouses where Ocado's groceries are stored, picked, and sent off for delivery to Ocado customers. The proprietary Ocado Smart Platform CFC includes thousands of robots roaming on top of a grid, and workers packing customer orders from boxes of stock delivered by the robots in a just-in-time fashion.


Kubermesh deploys an underlay partial-mesh network using OSPFv3 on IPv6 (with link-local address auto-configuration), provided by the Quagga Software Routing Suite. A custom overlay virtual network for Kubernetes is used in combination with CoreOS's flannel for IP allocation, as none of the standard providers currently support an IPv4-over-IPv6 underlay. Nodes are added to the mesh by iPXE-booting over IPv6: CoreOS's matchbox service installs the Container Linux operating system, and further provisioning of the base software install is specified in the bootcfg initialisation file. The Kubernetes incubator project bootkube is then used to launch a self-hosted Kubernetes cluster. When launched, bootkube acts as a temporary Kubernetes control plane (api-server, scheduler, controller-manager) that operates just long enough to bootstrap a replacement self-hosted control plane.
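To make the link-local auto-configuration step more concrete, the following is a minimal sketch of how an IPv6 link-local address can be derived from a network interface's MAC address using the modified EUI-64 scheme. It is not taken from the Kubermesh source code: the function name and the example MAC address are hypothetical, and a given operating system may be configured to generate interface identifiers differently (for example, with privacy-preserving identifiers).

```python
import ipaddress


def link_local_from_mac(mac: str) -> ipaddress.IPv6Address:
    """Derive an fe80::/64 address from a 48-bit MAC (modified EUI-64).

    Hypothetical helper for illustration only; not part of Kubermesh.
    """
    octets = [int(part, 16) for part in mac.replace("-", ":").split(":")]
    if len(octets) != 6 or any(not 0 <= o <= 0xFF for o in octets):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")

    # Flip the universal/local bit of the first octet.
    octets[0] ^= 0x02
    # Insert ff:fe in the middle to expand 48 bits to a 64-bit identifier.
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]

    interface_id = int.from_bytes(bytes(eui64), "big")
    prefix = int(ipaddress.IPv6Address("fe80::"))
    return ipaddress.IPv6Address(prefix | interface_id)


# Example MAC address chosen purely for illustration.
print(link_local_from_mac("52:54:00:12:34:56"))  # fe80::5054:ff:fe12:3456
```

Because every interface can derive such an address without any central allocation, a routing daemon can form adjacencies with its directly connected neighbours as soon as a new node is plugged in, which is what makes this style of addressing a natural fit for a self-provisioning mesh.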

The Ocado Technology blog states that the mesh network could potentially allow developers to remove the data centres, the network, and other machines running around the warehouse, leaving only the computing nodes and fibre optics. The self-provisioning element offers flexibility: additional nodes can simply be connected to the network, and within a few minutes the new resource is incorporated into the cluster. Kubernetes itself manages the orchestration of containers, and also handles re-scheduling in the event of an individual node failing.
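The re-scheduling behaviour can be illustrated with a toy model. The sketch below is not how the Kubernetes scheduler is implemented (in a real cluster, controllers create replacement pods and the scheduler places them using a much richer set of rules); it only shows the basic idea of moving workloads off a failed node onto the least-loaded survivors. All pod and node names are hypothetical.

```python
def reschedule(placements, failed_node, healthy_nodes):
    """Return new pod -> node placements after `failed_node` is lost.

    Toy model: each displaced pod goes to the healthy node that
    currently runs the fewest pods (ties broken by node name).
    """
    if not healthy_nodes:
        raise ValueError("no healthy nodes left to schedule onto")

    # Count the pods already running on each healthy node.
    load = {node: 0 for node in healthy_nodes}
    for node in placements.values():
        if node in load:
            load[node] += 1

    updated = dict(placements)
    displaced = sorted(p for p, n in placements.items() if n == failed_node)
    for pod in displaced:
        target = min(load, key=lambda n: (load[n], n))
        updated[pod] = target
        load[target] += 1
    return updated


# Hypothetical cluster state before node-b fails.
before = {
    "web-1": "node-a",
    "web-2": "node-b",
    "db-1": "node-b",
    "cache-1": "node-c",
}
after = reschedule(before, "node-b", ["node-a", "node-c"])
print(after)
```

In this example the two pods from the failed node are spread across the two remaining nodes, so no single survivor absorbs the whole displaced workload.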

Kubernetes was initially introduced at Ocado Technology as a container management system for the Code for Life project. Code for Life is a non-profit initiative that delivers free, open-source games designed to help teachers deliver the new computing curriculum and introduce children to coding. The Rapid Router game, designed for primary schools, has more than 90,000 users internationally, and that number is growing. With such a large user base, Code for Life needed a system capable of continuously processing and managing large sets of data so that the game runs smoothly.

The Code for Life website and Rapid Router were already hosted on the Google Cloud Platform, which supports Kubernetes. This meant Kubernetes was "the obvious choice" when looking for a container management system to run many students' code in a cluster deployed onto virtual machines in the cloud. Mike Bryant, IT team leader at Ocado, was involved in implementing this system for Code for Life and realised its potential for use outside of a cloud platform, specifically for streamlining the data system at Ocado Technology.

The blog post describes the scale involved: "Our largest grocery CFC in Erith spans 563,000 sq ft and would require 400 of these nodes randomly dotted around the warehouse and wired together to create the mesh. The apps deployed on the nodes could then be strategically placed near other apps they would often communicate with for optimal speed and performance. A node could be any computing equipment typically found in our warehouse, ranging from dedicated servers or Intel NUCs to workstations in pick aisles or PCs used to display engineering-related information on overhead displays."
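The idea of placing an app close to the apps it talks to can be sketched as a shortest-path problem over the mesh. The example below is an illustration only: the source does not describe how placement would actually be decided, and the five-node topology, node names, and helper functions are all hypothetical. It counts hops with a breadth-first search and picks the candidate node with the fewest total hops to the app's peers.

```python
from collections import deque


def hop_counts(mesh, source):
    """Breadth-first search: hops from `source` to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in mesh[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def best_node_for(mesh, peer_nodes, candidates):
    """Pick the candidate with the fewest total hops to the peer nodes."""
    def total_hops(candidate):
        dist = hop_counts(mesh, candidate)
        # Unreachable peers count as infinitely far away.
        return sum(dist.get(peer, float("inf")) for peer in peer_nodes)

    # Sort first so that ties are broken by node name.
    return min(sorted(candidates), key=total_hops)


# Hypothetical partial mesh: a ring n1-n2-n4-n3 with n5 attached to n4.
mesh = {
    "n1": ["n2", "n3"],
    "n2": ["n1", "n4"],
    "n3": ["n1", "n4"],
    "n4": ["n2", "n3", "n5"],
    "n5": ["n4"],
}

# The new app mostly talks to apps running on n4 and n5.
print(best_node_for(mesh, peer_nodes=["n4", "n5"], candidates=["n1", "n2", "n3"]))  # n2
```

In a real Kubernetes cluster, a preference like this would more likely be expressed declaratively (for instance with affinity rules) than computed by hand, but the underlying trade-off is the same.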

If the prototype Kubermesh system is proven, it could potentially eliminate the need for the data centres and the network routers entirely, considerably downscaling the existing OSP system, saving time spent on maintenance, and cutting costs and energy consumption. It is also proposed that the resulting Kubermesh nodes could run all of the other elements of the warehouse independently, from display screens to pick stations (assuming that all of the corresponding applications can be run successfully within containers).

Additional details on the Kubermesh project can be found in the Ocado Technology blog post "Creating a distributed data centre architecture using Kubernetes and containers", and the Apache 2.0-licensed prototype source code can be found in the Kubermesh GitHub repository.

This article was updated on 7th June 2017 to clarify the usage of CoreOS's matchbox during server provisioning, and also to indicate that installation of Kubermesh on OpenStack-managed infrastructure is on the roadmap.
