Building Modern Transportation System with KubeEdge: How We Made It

37:56

Summary

Kevin Wang and Huan Wei discuss the benefits and challenges of adopting cloud-native technologies, cloud collaborative architecture with KubeEdge inside, and real-world use cases.

Bio

Kevin Wang is Lead of Cloud Native Open Source Team @Huawei. Huan Wei is Chief Architect at HarmonyCloud.

About the conference

QCon Plus is a virtual conference for senior software engineers and architects that covers the trends, best practices, and solutions leveraged by the world's most innovative software organizations.

Transcript

Wang: We are going to share how we built a modern transportation system with KubeEdge inside. I'm Kevin Wang. I currently lead the cloud native open source team, and I'm a co-founder of the KubeEdge project and a CNCF ambassador.

Wei: This is Wei Huan from HarmonyCloud. I'm the Chief Architect. I'm also leading edge computing.

Modern Transportation System

Wang: With the development of edge computing, transportation is also upgrading its whole system. To build a modern transportation system, we think a lot of edge computing, as well as intelligent capabilities, is definitely needed. Regarding the relationship with the cloud, we see the edge as an extension of the cloud. We can benefit from the mature and rich ecosystem in the cloud today, and easily start moving some lightweight applications down to the edge, which improves efficiency and network latency. With that in mind, there will be a lot of heterogeneous nodes that need to be managed in a simple way, with the network connection between cloud and edge loosely coupled, especially when some of the edge is located in a private network or behind a firewall. We definitely need a bidirectional communication mechanism between cloud and edge.

Regarding the overall architecture, there are typically two types of edge computing in the transportation system. The first is the fixed location: for example, we want to provide services at the roadside or on the road for the vehicles. The other model is that we regard the vehicles themselves as the edge, so the location of the edge node may keep changing. That also brings a lot of challenges for the network connection and for cloud-edge communication.

The Fixed Location: Highway Electronic System

First of all, let's go through the fixed location. KubeEdge is already adopted in a lot of end user environments. I'm going to share how we built the whole system for highway electronic tolling. In China, the legacy tolling system relies a lot on people; it's a labor-intensive job. The monolithic systems are isolated from each other: for example, weighing the cars, identifying the car plates, and collecting the tolling information. As a result, vehicles may easily get stuck at the toll gate. Trucks especially need more time to slow down and get through, and that often causes traffic jams. One special thing in China is that around the boundary between provinces there are typically two toll stations next to each other, because tolling information that belongs to one province always goes to the corresponding province. That means we need to handle tolling on the open road, when a vehicle is passing from one province to another. Also, the overall deployment is highly distributed. There are around 10,000 toll stations in China, across 34 provinces, cities, and autonomous regions.

Overall Architecture

Looking at the high-level logical architecture, the center of the system is the Highway Monitoring and Response Center. It is responsible for managing the whole system, including, for example, the tolling rules, monitoring the overall traffic statistics, and tracking vehicle identification, such as car plates, and most of the licensed drivers' information. Make sure people are doing two downloads. At each toll station there are around 10 booths on average, which means that end to end there are about 100,000 toll booths to be managed in the whole system, at large scale. We want to automate the applications and make them able to communicate with each other in a microservices way. With that, a lot of manual work can be saved and the efficiency of the whole system can be improved. However, the underlying network is very complicated. The tolling information and the statistics need to go to the response center over a kind of dedicated network. The application deployment, upgrading, and lifecycle management, however, need to be managed from the public cloud. On the edge side, the toll stations and the gantries are located in a private network; they don't have public network access.

Environment at Edge

Looking at the edge in detail, and picking one toll station as an example, the hardware consists of different types, including x86 industrial computers and Arm64 and Arm32 edge servers. The leaf devices also come in multiple types, including cameras, the main barrier gates, and a lot of sensors, such as smoke sensors and water sensors. For the existing toll stations, upgrading is quite straightforward: we just add some controllers inside the toll booths to automate them. For some of the new toll stations, we don't need toll booths anymore. We can directly set up gantries, put the cameras and sensors on top of them, and place edge servers beside the gantry. With that, vehicles, including cars and trucks, don't need to slow down when they pass through; they can just go straight ahead. It's quite an efficient model for the traffic. Again, the challenge from a network perspective is that the connection between the toll station, that is the edge, and the cloud is not guaranteed. The packet loss rate is very high, and the bandwidth is very limited. It can go down to 3 megabytes per second, so heavy data exchange may have an impact on the network.

Why Kubernetes

How do we manage such a system, with a massive infrastructure and highly distributed, massive applications? In the cloud we can definitely take Kubernetes into a [inaudible 00:08:52]. It's already the de facto standard for managing containerized applications. The core concepts, including containers, Deployment, pod, ReplicaSet, DaemonSet, services, labels, and nodes, simplify the whole system design and cover a lot of well-known application setups. For example, an HA deployment such as active-standby can be easily mapped onto the Kubernetes core concepts. Scheduling is also quite easy to configure through the Kubernetes APIs, to pin some of the applications onto some of the nodes. However, we know that Kubernetes is designed for the data center: the connection between components is assumed to be very stable, and the bandwidth is assumed to be very rich. In the edge computing scenario, an edge node may have very limited resources and will frequently go offline, so edge autonomy is especially important. We don't want the applications to be impacted. We want them to be able to recover from local information when they are disconnected from the cloud or hit by some disaster.
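Editor's illustration: the talk gives no code for "recovering from local information". The following is a minimal sketch of that idea only, assuming a JSON file as the local store. The names `LocalStateStore` and `resolve_desired_state` are hypothetical and are not KubeEdge APIs; how KubeEdge actually persists metadata at the edge is not described in the talk.

```python
import json
import os
import tempfile


class LocalStateStore:
    """Keeps the last desired state received from the cloud (hypothetical helper)."""

    def __init__(self, path):
        self.path = path

    def save(self, desired_state):
        # Write to a temp file and rename it, so a crash never leaves a half-written cache.
        directory = os.path.dirname(self.path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as handle:
            json.dump(desired_state, handle)
        os.replace(tmp_path, self.path)

    def load(self):
        # Nothing cached yet: behave as if no workloads were assigned.
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as handle:
            return json.load(handle)


def resolve_desired_state(store, fetch_from_cloud):
    """Prefer fresh state from the cloud; fall back to the local cache when offline."""
    try:
        state = fetch_from_cloud()
    except ConnectionError:
        return store.load(), "local-cache"
    store.save(state)
    return state, "cloud"
```

With this shape, a node that restarts while the uplink is down can still bring its workloads back from the cached state, which is the behavior the paragraph above asks for.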

KubeEdge Architecture

That's why KubeEdge provides unique value against this background. Taking the overall architecture, users in the cloud can benefit from the full power of vanilla Kubernetes. With the KubeEdge plugins installed, it's easy to deploy applications from the cloud to the edge with the same experience people have with Kubernetes today. KubeEdge also automatically builds the connections between cloud and edge. In particular, it simplifies the setup for going through a private network or passing through a firewall. It doesn't matter where your edge is located or what kind of network environment you have; it's quite straightforward with KubeEdge. On the edge, EdgeCore is an all-in-one process that runs the container lifecycle management as well as the service management, and OCI container runtimes are supported. EdgeMesh simplifies the service communication between pods. KubeEdge also provides the mapper framework to simplify and decouple the interaction with the devices. From the application developer's perspective, you don't need to worry about the details of the devices in a particular environment. Applications just need to talk to the standard topics defined by KubeEdge. The device protocol plugins, implemented as mappers, automatically convert the standard message into the real message that the device understands.
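Editor's illustration: the mapper idea described above (one protocol-neutral message, translated per device protocol) can be sketched as follows. This is not the KubeEdge mapper framework or its message format; `StandardCommand`, `RegisterWriteMapper`, `TextLineMapper`, and both device protocols are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class StandardCommand:
    """Protocol-neutral message an application would publish (hypothetical shape)."""
    device: str
    property_name: str
    value: int


class RegisterWriteMapper:
    """Translates standard commands into register writes for a made-up protocol."""

    def __init__(self, register_map):
        # register_map: property name -> register address on the device
        self.register_map = register_map

    def translate(self, command):
        address = self.register_map[command.property_name]
        return {"op": "write_register", "address": address, "value": command.value}


class TextLineMapper:
    """Translates the same commands into a text line for another made-up protocol."""

    def translate(self, command):
        return f"SET {command.property_name}={command.value}\n"


def dispatch(mappers, command):
    """Route a command to the mapper registered for the target device."""
    return mappers[command.device].translate(command)
```

The application only ever builds `StandardCommand` objects; swapping a barrier gate for one that speaks a different protocol means registering a different mapper, not changing the application.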

KubeEdge in the Tolling System

Mapping KubeEdge onto the tolling system, in the cloud there is Kubernetes with the cloud part of KubeEdge. On the edge, each edge server is regarded as one edge node that is managed from the cloud, and on the edge you can run containerized applications as pods. KubeEdge also simplifies the communication between the cloud and the edge, so you can manage the applications on the edge just like you manage a Kubernetes cluster in the cloud. Edge autonomy, low resource consumption, and simplified device communication are also introduced with KubeEdge. With this, having applications talk to the devices is quite simple and standard in this tolling system.

Large Scale Edge Node Management

Looking at the overall deployment, the whole system has 100,000 edge nodes. To simplify the overall management and to decouple the upgrading of the control plane, multiple Kubernetes clusters are deployed, dividing the underlying edge nodes into different groups. To avoid the impact of network traffic, the node registration, node status, and pod status update algorithms are optimized. Another very important thing is container image delivery. We know that bandwidth is very limited. At each province-level data center, which is actually an edge data center, we have a container registry mirror to save bandwidth between the province center and the central cloud.
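Editor's illustration: two ideas in the paragraph above are splitting edge nodes across several clusters and pulling images from a province-level mirror. A minimal sketch, assuming nodes are grouped by province (the talk does not say how the groups are chosen) and using hypothetical registry host names:

```python
def group_nodes_by_cluster(nodes, cluster_for_province):
    """Assign each (node_name, province) pair to the cluster serving that province."""
    groups = {}
    for node_name, province in nodes:
        cluster = cluster_for_province[province]
        groups.setdefault(cluster, []).append(node_name)
    return groups


def mirror_image_reference(image, province, mirrors, central_registry):
    """Rewrite an image reference to the province's mirror when one is configured."""
    mirror = mirrors.get(province)
    prefix = central_registry + "/"
    if mirror is None or not image.startswith(prefix):
        # No mirror for this province, or the image is not hosted centrally: leave it as is.
        return image
    return mirror + "/" + image[len(prefix):]
```

For example, with `mirrors = {"guangdong": "registry.gd.example"}`, the image `registry.central.example/tolling/plate:1.2` would be pulled as `registry.gd.example/tolling/plate:1.2` inside that province, so each image layer crosses the link to the central cloud once per province rather than once per node.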

The Movable: Vehicle Cloud Collaboration Platform

Wei: Next I'll introduce the movable edge. Currently, we are building a very large-scale vehicle-cloud collaboration platform for our client, which is the largest vehicle manufacturer in China. In our scenario, every vehicle registers to a specific Kubernetes cluster when it is onboarded, and the vehicle becomes an edge node that can be managed and orchestrated by the private Kubernetes cluster.

Background, Ideas, and Motivation (From K8s Engineer's Perspective)

First, from a Kubernetes engineer's perspective, we asked whether we could realize software-defined vehicles if the vehicle has enough compute resources. If we can install a container runtime like Docker on the vehicle, then all vehicles can be managed by a Kubernetes cluster. This is the background and the motivation.

Definition: Nodes and Applications in Vehicle

In our scenario, we have some definitions for nodes and apps. Here, nodes means vehicles for commercial use only. The nodes are actually the computing main boards, which have enough CPU and memory and run a Linux operating system. We have already containerized a lot of apps. The types of apps include autopilot, smart cabin, machine learning, and entertainment. We will support more types in the future.

Challenges of Building "Internet of Vehicles"

There are lots of challenges when building such an Internet of Vehicles. The Kubernetes design is based on mechanisms like list-watch and status sync, which means it often performs status sync, such as node status sync and pod status sync. In Internet of Vehicles scenarios, the vehicles are moving most of the time. If a car goes into a long tunnel, it will be in a very weak network environment. In this scenario the vehicle will often go offline and come back online, and Kubernetes will force something like node eviction and pod eviction on the vehicle. So we need features like edge autonomy, and we also need to handle difficulties like the weak network connection between the cloud and edge.
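Editor's illustration: the eviction problem described above comes from treating a missed heartbeat as a failed node. The sketch below is a simplified model, not the Kubernetes node lifecycle controller; the function name and all numbers are illustrative assumptions. It shows how the same heartbeat gap leads to eviction under a short grace period and to no action under a long one.

```python
def nodes_to_evict(last_heartbeat, now, grace_seconds):
    """Return the nodes whose last heartbeat is older than the grace period."""
    return sorted(
        name
        for name, seen_at in last_heartbeat.items()
        if now - seen_at > grace_seconds
    )


# A vehicle that entered a tunnel 300 seconds ago, and one that reported 10 seconds ago.
heartbeats = {"vehicle-a": 1000.0, "vehicle-b": 1290.0}

# Data-center style tolerance: the vehicle in the tunnel is treated as failed.
short_grace = nodes_to_evict(heartbeats, now=1300.0, grace_seconds=60)

# Edge style tolerance: the same gap is accepted and the workloads stay in place.
long_grace = nodes_to_evict(heartbeats, now=1300.0, grace_seconds=3600)
```

Here `short_grace` contains `vehicle-a` and `long_grace` is empty, which is why a platform for moving vehicles needs both a more tolerant control plane and edge nodes that keep running on their own while disconnected.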

Building Vehicle-Cloud Collaboration Platform

Finally, we have built a vehicle-cloud collaboration platform. In this picture, we can see three cars, physical cars. In the cloud, we install the KubeEdge cloud component, called CloudCore, alongside the Kubernetes cluster. At the edge, we install KubeEdge in each vehicle, together with a container runtime like Docker. We also develop a containerized application called a mapper. The mapper connects to all the devices in the vehicle, and it reports the device status and data to the cloud. The mapper also receives control commands from the cloud. This is a very typical data flow in cloud-edge collaboration.

Benefits from KubeEdge

By using KubeEdge, we get many benefits. First, because KubeEdge integrates seamlessly with Kubernetes, we get almost all the features from Kubernetes, which is a very large set. The communication mechanism between the cloud and edge is not list-watch, it's like a web [inaudible 00:19:41]. In this way it can support more nodes than native Kubernetes. KubeEdge is also very flexible. It supports things like customized endpoint rules and customized channels, which means we can design and define our own communication rules between the cloud and edge. Also, from my point of view, KubeEdge is a really mature project. We have already used KubeEdge to help our clients with lots of use cases. KubeEdge also has a very active community. We often discuss things in the Kubernetes community, and we often get a very quick response from the community.

Vehicle-Cloud Collaboration: Legacy Proposal vs. New Proposal

In this way, our vehicle-cloud collaboration proposal is a little different from the legacy proposal. As far as I can see, most legacy vehicle-cloud collaboration proposals can only support functionality like data sync, meaning data sync from the cloud to the vehicle or from the vehicle to the cloud. They cannot support features like status sync, and in particular they don't support features like orchestration or scheduling. In our proposal, we use Kubernetes and KubeEdge, so we have all the features from Kubernetes, like scheduling and orchestration. That means we can send commands from the cloud, or from our apps, to a pack of vehicles, to a lot of [inaudible 00:21:44].

Vehicle Scale and Clusters Scale (Already Built and Estimated)

Finally, I'll show some statistics. Currently, we are planning to build at least 5 clusters with Kubernetes and KubeEdge. We expect a total of more than 500,000 vehicles to be onboarded by the end of 2025, because our client currently onboards more than 200,000 new vehicles each year. These are vehicles with a Linux operating system. Each of our clusters, running Kubernetes and KubeEdge, will support at least 100,000 vehicles. Currently, we have already realized this function. I think KubeEdge plays a very important role in our proposal.
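Editor's illustration: the figures in the paragraph above are consistent with each other, which a short calculation makes explicit. The helper name `ceil_div` is ours; the numbers are the ones quoted in the talk.

```python
def ceil_div(total, per_unit):
    """Integer ceiling division: the smallest count of units that covers the total."""
    return (total + per_unit - 1) // per_unit


TARGET_VEHICLES = 500_000        # expected total by the end of 2025
VEHICLES_PER_CLUSTER = 100_000   # stated minimum capacity of one cluster
NEW_VEHICLES_PER_YEAR = 200_000  # stated onboarding rate

clusters_needed = ceil_div(TARGET_VEHICLES, VEHICLES_PER_CLUSTER)
years_of_onboarding = ceil_div(TARGET_VEHICLES, NEW_VEHICLES_PER_YEAR)
```

`clusters_needed` evaluates to 5, matching the "at least 5 clusters" plan, and `years_of_onboarding` evaluates to 3, meaning the target is reached during the third year at the stated rate.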

Takeaways

Wang: Just to recap, Kubernetes provides the de facto standard application definition, which is very helpful for unifying application management between cloud and edge. With KubeEdge, we have seamless cloud and edge collaboration, and edge autonomy helps a lot to simplify the cloud and edge architecture in the modern transportation system. Also, on the edge we need to save a lot of resource consumption and simplify device communication. One thing I would like to highlight is that in both the fixed and the movable location models, KubeEdge nodes are selected as the edge. It's easier to get deeper collaboration with the part in the cloud, it takes fewer resources on the edge, and overall it's easier to get a highly distributed architecture.

In the highway electronic tolling system, I want to highlight that the application deployment cycle was optimized from monthly to daily. In terms of scalability, the system is currently managing 100,000 edge nodes with 500,000 pods in the production environment. The most challenging part of the network is the high packet loss rate, with very limited bandwidth. In the vehicle-cloud collaboration platform, scalability is even more challenging. There are 100,000 vehicles in each cluster. The network model is again very different compared to the data center: vehicles go offline quite often and stay offline for longer periods. The most exciting thing is that with this design, we are making the Internet of Vehicles cloud native.

Questions and Answers

Fedorov: I noted that in the presentation you mentioned sizing your KubeEdge deployment at about 100,000 nodes per cluster. What's driving that limit? What's the reason? Is it possible to run more than 100,000 edge nodes on KubeEdge, or are there technical limitations? If yes, what are they?

Wei: I think it's the three of Kubernetes: scalability, extensibility, and compressibility. I think there are three key points. The first point is that, to support more than 100,000 edge nodes in one Kubernetes cluster, we enhanced the scheduling logic from Kubernetes. Each deployment to a vehicle does not pass through the scheduler logic; it is directly assigned to a specific vehicle. The second point is that we leverage the edge autonomy feature of KubeEdge, which means all node status and pod status change before reporting. At the same time, the Kubernetes node lifecycle controller needs to be modified to reduce the lifecycle control of the edge nodes. The third point, I think, is the keep-alive between edge nodes and the cloud, which depends on the bottleneck of the cloud component called CloudCore, that is, how many edge nodes can be connected to each CloudCore. Currently, we just use four CloudCore instances, and each CloudCore supports about 30,000 edge nodes. In the near future, I see the Kubernetes community will support the [inaudible 00:28:43] CloudCore. That means we should be able to support more vehicles in each cluster in the future.
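Editor's illustration: two points in this answer can be sketched. In stock Kubernetes, a Pod whose `spec.nodeName` is already set is not placed by the scheduler, which is one way to assign a workload directly to a node; whether the speakers' platform does it this way is an assumption. The CloudCore count is plain arithmetic on the figures quoted above. The function names and the image name used below are hypothetical.

```python
def pod_pinned_to_node(name, image, node_name):
    """Build a Pod manifest with spec.nodeName set, so no scheduler placement is needed."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "nodeName": node_name,
            "containers": [{"name": name, "image": image}],
        },
    }


def cloudcore_instances_needed(edge_nodes, nodes_per_cloudcore):
    """Smallest number of CloudCore instances that can hold all edge-node connections."""
    return (edge_nodes + nodes_per_cloudcore - 1) // nodes_per_cloudcore


manifest = pod_pinned_to_node("smart-cabin", "registry.example/smart-cabin:1.0", "vehicle-0001")
instances = cloudcore_instances_needed(100_000, 30_000)
```

`instances` evaluates to 4, which matches the four CloudCore instances mentioned in the answer: three would cover only about 90,000 connections.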

Fedorov: At this scale, when you have that many edge nodes that you're managing remotely, I'm curious what workflows or visualizations you use to keep track of them. I can't imagine operating a system like that.

Wei: In our system we just use a UI, built in languages like Node.js, Java, and Golang. We use the UI just like we would use a cloud PaaS. We also use some automation workflow tools, such as Kubernetes and Jenkins with master and slave nodes, to help the CI/CD workflow. We use EFK to store logs, and we use Prometheus to store and visualize metric data. Yes, they are the common tools we use when we build a Kubernetes cluster, because Kubernetes can be seamlessly integrated with [inaudible 00:30:09].

Fedorov: I think first, it's probably a question to Kevin, because you mentioned that you need to deal with bad connectivity to the tolling stations. Maybe you can share a little bit more about the specifics of the problem. On one side, if it's a fixed location, there is an ability to provide decent connectivity, or is that not necessarily the case?

Wang: In that system, the toll stations are around the boundaries of the provinces, which means these locations are quite far away from the cities. The existing network around the toll stations is very limited, and it's quite expensive to set up an optical fiber network. With that background, and from test data, we found that network connectivity is very limited.

Fedorov: What's your current security model? How do you support, if you do support, the trust-none approach with various devices and the communication between KubeEdge and the backend? What's your general security assessment for building deployments like that?

Wang: Currently, on the edge, we only allow applications to talk to each other inside one edge site. That basically means inside one toll station, because within that scope there is a clearer trust boundary. For the connection between the edge and the cloud, we limit access at the node level. Basically, the nodes have different access and different privileges from each other. Even if a node is under, for example, physical attack, it will only affect the node under attack itself. It will not affect the others.

Fedorov: The functionality to manage the security access policies, is it built into KubeEdge, or is that something that you configure and manage separately, outside of KubeEdge, as its own system?

Wang: Basically, the current security relies on the token and the certificates used between the cloud and edge. When setting up, we need to use a different certificate and different key files for each edge node to make sure they have different access. In the longer term, we are working to automate that process so everyone can easily enforce that security.

Fedorov: What applications do you typically run? Let's expand this to both the static use case and the car use case. Can you give specific types of applications and the types of logic that they perform?

Wei: I think all the apps that need quick development iteration can be used in the system. Apps like the autopilot, for example, always need to be trained in the edge cloud and then deployed to the vehicle. Also, apps like machine learning, and the [inaudible 00:35:08] for study and for entertainment, all these kinds of apps can be deployed.

Fedorov: Kevin, in the static use case, how do you break down the specific functionality of collecting the tolls? Is it a bunch of micro apps, with each one doing its own part of the logic, or is it more of a single monolithic application?

Wang: We are adopting the microservice model. Basically, we are dividing the monolith into different subsystems, like the car plate identification and the toll information collection. Each application has several components, for example, the main process, as well as the health check and the logging and monitoring plugins.

Fedorov: Are you planning any integrations with WeChat, or any other third parties as part of the KubeEdge?

Wei: Yes, I think so. In the country, our clients is syncing lots of apps like this, because we need to build a very large scale of the community, and this function will be open to all these third party developers.

Recorded at:

Sep 22, 2022
