How JD.com Moved to Kubernetes from OpenStack
JD.com, one of China’s largest e-commerce companies, recently shared its experience of adopting Kubernetes to move from an OpenStack-managed IaaS setup to an infrastructure based on application containers. The move, which also included an in-house networking component, improved resource utilization by 30%.
JD.com’s deployment infrastructure went through two phases before the company adopted application containers: physical machines (2004 - 2014) and operating system (OS) containers (2014 - 2016). In the first phase, bare-metal machines were managed by hand. The problems with this setup included long go-live times (around a week from allocation to the application coming online), lack of isolation, resource underutilization and inflexible scheduling. Machine failures led to application migrations that took hours, and there was no autoscaling. The engineering team built in-house tools for common tasks like log collection, automated deployment, compilation and packaging, and resource monitoring.
The second phase of JD.com’s infrastructure saw container adoption. OS containers were used, which means that their existing applications and deployment architecture were forklifted onto containers. Containers were just smaller, faster versions of their previous physical machines and it was not a full-fledged adoption of container philosophy.
However, container adoption in the second phase brought numerous advantages in terms of time and resources. OpenStack was used as the orchestration layer, with the nova Docker driver for container management. The team adopted Docker as the container platform and added some new features to it. All applications moved to containers, which cut the time needed to fulfil a computing resource request from about a week to a few minutes. The average deployment density of applications and physical machine utilization both increased threefold. The team also built unified APIs to perform deployments. This was internally called JDOS (JD Datacenter Operating System) 1.0.
The OpenStack-based setup had between 4,000 and 10,000 compute nodes in a single cluster. The JD.com team was running close to 150,000 containers by November 2016. This helped them handle high traffic during two online promotions of the site, including one on 11 Nov 2016, which saw around 30 million fulfilled orders.
After the second phase, the move to containers paved the way for the engineering team to change their deployment architecture to use the container as the unit of deployment. The team called this JDOS 2.0. The focus of this approach was not just infrastructure management, but container management with application awareness. The design had two abstractions: System and Application. A "System" consists of several "Applications", and each Application has several pods that provide the same service. A System corresponds to a Kubernetes namespace.
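The System and Application abstractions can be pictured with a small data model. The following is only an illustrative sketch, not JD.com's code: the class names, the `namespace` property, the `pod_count` helper and the sample names ("shop", "cart", "checkout") are all assumptions made for this example, including the assumption that a namespace simply reuses the System name.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Application:
    """A group of pods that all provide the same service."""
    name: str
    pods: List[str] = field(default_factory=list)


@dataclass
class System:
    """A collection of Applications; corresponds to one Kubernetes namespace."""
    name: str
    applications: Dict[str, Application] = field(default_factory=dict)

    @property
    def namespace(self) -> str:
        # Assumption for this sketch: the namespace reuses the System name.
        return self.name

    def add_application(self, app: Application) -> None:
        # Applications are keyed by name within their System.
        self.applications[app.name] = app

    def pod_count(self) -> int:
        # Total number of pods across every Application in this System.
        return sum(len(app.pods) for app in self.applications.values())


# Hypothetical example: one System holding two Applications.
shop = System(name="shop")
shop.add_application(Application("cart", pods=["cart-0", "cart-1"]))
shop.add_application(Application("checkout", pods=["checkout-0"]))
print(shop.namespace, shop.pod_count())
```

Grouping Applications under a System in this way mirrors how a namespace scopes the resources that belong together.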
The components of the deployment pipeline, as well as the DevOps tools, were containerized and deployed on the Kubernetes-managed platform. These included Gitlab, Jenkins, Logstash, Harbor, Elasticsearch and Prometheus. A deployment consisted of source code and a Dockerfile being pushed to the repository, followed by a Jenkins build. Jenkins was configured in master-slave mode, with one slave node for building and packaging the application and a similar node for building the container image. Harbor, an open source Docker registry, was used to store the created images.
Image courtesy: http://blog.kubernetes.io/2017/02/inside-jd-com-shift-to-kubernetes-from-openstack.html
For better integration between Kubelets and OpenStack Neutron, JD.com developed their own solution called Cane, based on the Container Networking Interface standard. Cane takes care of notifying Neutron Load-Balancing-as-a-Service (LBaaS) when a Kubernetes load balancer is created, deleted or modified. A component called Hades inside Cane provides an internal DNS resolution service for the Pods.
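The notification flow described above can be sketched as a small event handler. This is a hedged illustration only: Cane's actual design and interfaces are not described in the source, so `LoadBalancerEvent`, `FakeLBaaS` and `handle_event` are hypothetical names, and `FakeLBaaS` is an in-memory stand-in rather than the real Neutron API.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class LoadBalancerEvent:
    """Hypothetical change notification for a Kubernetes load balancer."""
    kind: str          # "created", "modified" or "deleted"
    namespace: str
    name: str
    port: int = 0


class FakeLBaaS:
    """In-memory stand-in for an LBaaS backend (not the Neutron API)."""

    def __init__(self) -> None:
        self.balancers: Dict[Tuple[str, str], int] = {}

    def upsert(self, key: Tuple[str, str], port: int) -> None:
        # Create the balancer, or overwrite its port if it already exists.
        self.balancers[key] = port

    def remove(self, key: Tuple[str, str]) -> None:
        # Deleting an unknown balancer is treated as a no-op.
        self.balancers.pop(key, None)


def handle_event(lbaas: FakeLBaaS, event: LoadBalancerEvent) -> None:
    """Forward one load balancer change to the LBaaS backend."""
    key = (event.namespace, event.name)
    if event.kind in ("created", "modified"):
        lbaas.upsert(key, event.port)
    elif event.kind == "deleted":
        lbaas.remove(key)
    else:
        raise ValueError(f"unknown event kind: {event.kind}")


# Hypothetical sequence of changes to replay against the stand-in backend.
lbaas = FakeLBaaS()
events = [
    LoadBalancerEvent("created", "shop", "cart-lb", 80),
    LoadBalancerEvent("modified", "shop", "cart-lb", 8080),
    LoadBalancerEvent("created", "shop", "checkout-lb", 443),
    LoadBalancerEvent("deleted", "shop", "checkout-lb"),
]
for event in events:
    handle_event(lbaas, event)
print(lbaas.balancers)
```

Treating "created" and "modified" as the same upsert keeps the handler idempotent, so replaying an event leaves the backend in the same state.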