Linux Foundation's Project EVE: a Cloud-Native Edge Computing Platform

Summary

Roman Shaposhnik covers design and implementation of a novel Edge Computing platform created at ZEDEDA Inc. and later used as a founding project for the Linux Foundation's LF Edge initiative. He focuses on this new, special purpose, open source operating environment that aims to run securely on billions of ARM and x86 devices. He covers the unique challenges that EVE has to tackle.

Bio

Roman Shaposhnik is an open source software expert, currently serving on the board of directors for both The Apache Software Foundation and LF Edge. He is a co-founder and the VP of product and strategy for ZEDEDA. Throughout his career, he has held technical leadership roles at several iconic companies, including Sun Microsystems, Yahoo!, Cloudera and Pivotal Software.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Shaposhnik: I think through the introduction, it's pretty clear who I am. If you're interested in talking to me about some of the other things that I do in open source, feel free to do that. I happen to be very involved in the Apache Software Foundation and the Linux Foundation.

Edge Computing is 'Cloud-Native' IOT

Today, we will be talking about edge computing. Let's start by defining the term: what is edge computing? I think we started a long time ago with IoT, the Internet of Things. Then Cisco introduced the term fog computing, which was a telco-ish view of IoT. Edge computing, to me, is very simple. It is basically cloud native IoT. It is when the small devices, I call them computers outside of data centers, start to be treated by developers in a very cloud native way. People say, "We've been doing it for years. What's different?" The difference is all of the APIs and all of the things that we take for granted in the cloud, and even in the private data center, today. That actually took time to develop. We didn't start with Kubernetes, and Docker, and orchestration tools, and mesh networks. We started with individual machines. We started with individual rackable servers. That's basically what IoT still is: individual machines. The whole hope is that we can make it much better and much more exciting by applying some of the cloud native paradigms like liquid software, pipeline delivery, CI/CD, DevOps, that type of thing, but with the software running outside of your data center.

Edge Isn't Your Gramps' Embedded and/or IoT

When I talk about edge, let's actually be very specific, because there are different types of edge. I will cover the edge I will not be talking about. Very specifically, let's talk about the edge that's very interesting to me, and I think it should be interesting to all of you. These are the type of devices that some people called deep edge, some people call enterprise edge. These are basically computers that are attached to some physical object. That physical object could be a moving vehicle. It could be a big turbine generating electricity. It could be a construction site. The point being is that something is happening in the real world and you either need to capture data about that something, or you need to drive the process of that something. Manufacturing is a really good example. You have your pipeline. You're manufacturing your product. You need to control that process. You have a computer that is typically called industrial PC attached to it. Same deal with a construction site, or even your local McDonald's. In McDonald's, you want to orchestrate the experience of your customers. You have a little computer that's attached to the cash register. You have a little computer that's attached to the display, and all of that needs to be orchestrated.

What I'm not talking about is two things. First, I'm not talking about Raspberry Pis. There's definitely a lot of excitement about Raspberry Pis. It's interesting because if you think about the original motivation for the Raspberry Pi, it was to give underprivileged kids access to computing. It was basically to replace your personal laptop or desktop with a very inexpensive device. The fact that Raspberry Pis now find their way into pretty much every single personal IoT project is almost a byproduct of how they designed the thing. I have yet to see Raspberry Pis being used for business. Most of the time they stop at the level of you personally doing something, or maybe you doing something with your friends at your hackerspace. Today, we'll not be talking about any of that. The reason is that, just like with container orchestration and Docker, you don't really need those tools unless you actually do some level of production. You don't really need those tools if you're just tinkering. You don't need Kubernetes to run your application if you're just writing an application for yourself. You only need Kubernetes if that is something that actually generates some business. So we will not be talking about Raspberry Pis. Second, we'll not be talking about telco edge, the edge of the network, all of that.

Connected Device - Data at the Edge

Even this slice of the edge computing alone, given various estimations, represents a huge total addressable market. The biggest reason for that is the size of the data. These computers are connected to something that is in the real world. The data originates in the real world. The previous presentation today about self-driving vehicle from Uber is a perfect example of that. There's so much data that the vehicle is gathering, even if it was legally allowed, it is completely impossible to transfer all of that data to the big cloud in the sky for any processing. You have to orchestrate that behavior on the edge. As practitioners, we actually have to figure out how to do that. I was a little bit underwhelmed that Uber is focusing more on the machine learning. I understand why, but I'm an infrastructure guy. Today, I will be talking to you about infrastructure, how to make those types of applications easily deployable.

The good news is the total addressable market. The bad news is that it's a little bit like building the airplane while it's in flight. I think it would be fair to say that edge computing today is where cloud computing was in 2006. In 2006, Amazon was starting to introduce EC2. Everybody was saying, it's crazy, it will never work. People at Netflix started doing microservices. Everybody said it's crazy, it will never work. The rest is history. Edge computing is a little bit of that. My goal today is to give you enough understanding of the space, of the challenges in this space, but also of the opportunities in this space. I also want to explain a little bit of the vocabulary of this space so you can orient yourself. I cannot give you the tools. I cannot really give you something that will make you immediately productive at your workplace, the same way that I could if I talked about Kubernetes, or Kafka, or any other tool that's fairly mature. Edge computing is just happening in front of our eyes. To me, that's what makes it exciting.

Are You Ready to Live On The Edge?

In a way, when I say cloud native, to me, edge computing represents basically one final cloud that we're building, because we've built a lot of the public clouds. There's Google. There is Microsoft. There is obviously Amazon. All of these are essentially in the business of getting all of the applications that don't have to have any physicality attached to them. What we're trying to do is build a distributed cloud, from the API perspective, that will be executing on equipment that doesn't belong to the same people who run public clouds. Edge computing is where ownership belongs to somebody else, not the infrastructure provider. From any other perspective, it's just the cloud. People always ask me, "If edge is just another cloud, can we actually reuse all of the software that we developed for the cloud and run it on these small computers?"

It used to be a challenge even to do that, because those computers used to be really small. The good news now is that the whole space of IoT bifurcated. The only constraint that you have from now on is power budget. It might still be the case that you have to count every single milliamp. If you're in that type of business, you're essentially doing snowflakes and bespoke things all the time. There's really no commonality that I can give you, because everything has to be super tightly integrated when you're in a very constrained power budget. For everything else, where power is not a problem, silicon cost used to be a problem, but that's not the case anymore. Thanks to economies of scale, you can get Raspberry Pi class devices for a couple dozen bucks. It actually costs more to encase them in a way that would make them weatherproof than to produce the silicon.

The computers are actually pretty powerful. These are the type of computers we used to have in our data centers five years ago. Five years ago, public cloud existed. Five years ago, Kubernetes already existed. Docker definitely existed. The temptation is to take that software and run it at the edge. There have been numerous attempts to rub some Kubernetes on it because, obviously, that's what we do. We try to reuse as much as possible. Pretty much every attempt at reusing the implementation that I know of failed. I can talk in greater detail about why that is. The APIs are still very useful. If you're taking the implementation that Kubernetes gives you today, though, that will not work, for two reasons. First of all, it will not work because of network issues. All of those devices happen to be offline more than they are online. Kubernetes is not happy about that type of situation. Second of all, and this is where you need to start appreciating why edge is different, in the data center the game that Kubernetes and all of these orchestration technologies play is essentially a game of workload consolidation. You're trying to run as many containers on as few servers as possible. The scalability requirements that we're building the Kubernetes-like platforms for are essentially not that many servers and tons of containers and applications. On the edge, it's exactly the reverse. On the edge, you have maybe half a dozen applications on each box, because the boxes are OK, but they still have only 4 or 8 gigs of memory. It's not like your rackable server, but you have a lot of them.
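To make the first point concrete, here is a minimal sketch of a device-side agent that treats being offline as the normal case. It is only an illustration under stated assumptions: `EdgeAgent`, `make_flaky_controller`, and the application name `scada` are hypothetical names invented for this example, not part of Kubernetes or project EVE.

```python
from typing import Callable, Dict, List, Optional


class EdgeAgent:
    """Hypothetical device-side agent that tolerates long stretches offline."""

    def __init__(self, fetch_config: Callable[[], Dict[str, str]]) -> None:
        self._fetch_config = fetch_config   # controller call; may raise ConnectionError
        self.desired: Dict[str, str] = {}   # last configuration received (app -> version)
        self.missed_polls = 0               # consecutive polls that failed

    def poll(self) -> Dict[str, str]:
        """Refresh the configuration, or fall back to the cached one when offline."""
        try:
            self.desired = self._fetch_config()
            self.missed_polls = 0
        except ConnectionError:
            # Offline is expected on the edge: keep running the last known configuration.
            self.missed_polls += 1
        return self.desired


def make_flaky_controller(
    responses: List[Optional[Dict[str, str]]]
) -> Callable[[], Dict[str, str]]:
    """Simulated controller link: each item is a config dict, or None for 'offline'."""
    remaining = iter(responses)

    def fetch() -> Dict[str, str]:
        item = next(remaining)
        if item is None:
            raise ConnectionError("controller unreachable")
        return item

    return fetch


# One successful poll, two offline polls, then an updated configuration.
agent = EdgeAgent(make_flaky_controller([{"scada": "1.0"}, None, None, {"scada": "1.1"}]))
history = [dict(agent.poll()) for _ in range(4)]
```

In this sketch the device keeps serving version `1.0` through both failed polls and only switches once the controller is reachable again. A real agent would also need persistence across reboots and backoff between polls, which are left out here.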

Here's one data point that was given to us by one of our biggest customers. There's an industrial company called Siemens. That industrial company is in the business of managing and supporting industrial PCs that are attached to all things. Today, they have a challenge of managing 10 million of those industrial PCs. By various estimations, total number of servers inside of all of the Amazon data centers is single digit millions. That gives you a feel for what scale we should actually be building this for.

Finally, the economics of the edge are not the same as in the data center. All of these challenges make you think: we can reuse some of the principles that made cloud so successful and so developer friendly nowadays, but we actually have to come up with slightly different implementations. My thesis is that edge computing will be this really interesting, weird mix of traditional data center requirements and mobile requirements. Because the original edge computing is this [a mobile phone]. Actually, the original edge computing, I would argue, is Microsoft Xbox. With the phone, we really got our first taste of what an edge computing-like platform could look like. All of the things that made it so, the platforms like Android or iOS, the mobile device management approaches, the cloud, the Google Play Store or Google services, all of that will actually find its way into the edge. We have to think about what it will look like. We also need to think about traditional data center architectures, like operating systems, hypervisors, all of that. I will try to outline and map out how the Linux Foundation is trying to approach this space.

Challenges at the Edge

Edge is actually pretty diverse, not just in terms of the ownership, but also in terms of the hardware and applications. Today, let's take industrial PCs. Pretty much all of them are running Windows. They're all x86 based hardware running Windows. When I say Windows, I actually mean Windows XP. Yes, it exists. A lot of SCADA applications are still based on Windows XP. If you show up as a developer and start razzle-dazzling these customers with your cloud native microservices-based architectures, the first question that they're going to ask you is, "It's all great. This is the new stuff. What about my old stuff? I want to keep running my old stuff. Can you give me a platform that would be able to support my old stuff, while I am slowly rebuilding it in this new next-generation architecture?" That becomes one of the fundamental requirements.

Scale we already talked about: the geographic aspect of it, the deployments, and the maintenance. Security is also interesting. Edge computing, unlike the data center, is much closer to this [a mobile device]. Edge computing is physical, which means you cannot really rely on physical security to protect it. It's not like the data center, where there is a guy holding a machine gun at the door. You cannot put that guy in front of every single edge computing device. You basically have to build your platform very similarly to how iOS and Android protect all of your personal data. That's not something that data center people are even thinking about, because in a data center you have your physical security and you have your network security, and you are done with that. On the perimeter you pay a lot of attention to it, but within the data center, not so much.

Also, interestingly enough, what I like about edge is that edge is probably the hardest one to really succumb to a vendor lock-in. Because the diversity is such that not a single vendor like a big cloud provider can actually handle it all. Edge is driven a lot by system integrator companies, SIs. SIs are typically pretty vertical. There may be an SI that is specializing in industrial, in retail, this and that. That diversity is actually good news for us as developers because we will not see the same concentration of power like we're seeing in the public cloud today, so I think it's good for us.

Microsoft Xbox

Before I go further, I want to pitch another talk that was just made publicly available, because it overlaps with a lot of what I will be covering here. This is the first time ever that the Microsoft Xbox team talked about how they developed the platform for Xbox. That was done about a month ago, maybe two months ago, first time ever. A lot of the same principles apply, which makes me happy because we thought about them independently. The tricks that they played are really fascinating. The challenges they faced are very similar to the edge. If you want to hear from somebody who can claim that they successfully developed an edge platform, listen to those guys. I'm talking about a platform that's still being developed. Mine can still fail, theirs is pretty successful.

From The People Who Brought You CNCF

Let's switch gears a little bit and talk about how the Linux Foundation got involved in all of this. I shouldn't be the one to tell you that the Cloud Native Computing Foundation has been super successful. In a way, I would say that Kubernetes was the first Google project that was successful precisely because of CNCF. I love Google, but they have a tendency of just throwing their open-source projects over the wall and basically saying, "If you like it, use it, if you don't, not our problem." Kubernetes was the first one where they actively tried to build a community. The fact that they went and donated it to the Linux Foundation, and that it was the anchor tenant for the Cloud Native Computing Foundation, I think made all the difference. Obviously, the Linux Foundation itself was pretty happy about this outcome. They would like to do more of it.

The thought process went exactly like what I was talking about. When I say inside of data centers, I mean public cloud or your private data center. It doesn't matter. It's just a computer inside of a data center. For all of that, there's basically a forum of technologists that can decide what common set of best practices we all need to apply to the space to be more productive and more effective. That's CNCF, the Cloud Native Computing Foundation. For all of the computers outside of data centers, it feels like we at least need to provide that type of forum, even if we don't have an anchor tenant like Kubernetes yet. We need to give people a chance to talk among themselves, because otherwise there is really no way for them to synchronize on how the technology gets developed. That's LF EDGE.

LF EDGE

The Linux Foundation Edge initiative was announced not that long ago, actually this year, in January or February. My company, ZEDEDA, ended up being one of the founding members. We donated our project. There are a lot of companies in the space that are now part of LF EDGE, so if you're interested, you can go to the lfedge.org website. The membership is pretty vast at this point. These are the premier members. There are also tons of general members. A lot of good discussions are already happening within LF EDGE.

To give you a complete picture, what does LF EDGE cover? LF EDGE basically covers all of the computers outside of data centers. It starts with what we consider to be partial edge. A partial edge would be a quasi data center. It's not quite a data center, but it looks almost like a data center if you squint. A good example of that would be a telco central office, a telco CO. It's not really built to the same specification that a telco data center or a hyperscale data center would be built for, but a lot of technologies still apply. That's definitely in scope for LF EDGE. Then we basically go to telco access points. These are already physical devices. We're talking base stations. We're talking 5G deployments. These are all of the things in the CD infrastructure, or any infrastructure that would have to run some compute on them. That's definitely in scope for LF EDGE. Both of these are pretty dominated by telcos today, for good reason, because they're probably the best example of that type of an edge computing.

Then there are two other examples of edge. One that I will spend a lot of time talking about, we call it, for now, enterprise edge. This is basically all of those industrial PCs, IoT gateways. An example of the enterprise edge would be also a self-driving vehicle. Uber or Tesla building it would be also an example. Finally, there's obviously consumer edge. This is all of your washers, and dryers, and your refrigerators, all of that is in scope for LF EDGE. Every single one of these areas basically has a project that was donated by one of the founding companies. HomeEdge is from Samsung, which is not surprising because they're making all of these devices that you buy. Enterprise edge is us, ZEDEDA, and a few big enterprise companies like Dell, those types of guys. There's project Akraino that's dominated by telcos.

Interestingly enough, I have a friend from Dell, Jason Shepherd, who keeps joking that this edge thing is very similar to how this country was settled. It feels like we're now running away from the big hyperscale cloud providers, just like in the good old days people were running away from big businesses on the East Coast. The only place for us to actually build this exciting technology now is on the edge, because everything else is dominated, and you have to join Google or Facebook to have a play in there. Go West, young man, go Edge.

These are the projects. I will be specifically talking about one of them, Edge Virtualization Engine. Check out the rest on the Linux Foundation website. I think you will find it very useful. Edge Virtualization Engine is what was donated by my company, ZEDEDA. We're actually working very closely with Fledge. Fledge is a middleware that runs on top of the project EVE. EVE stands for Edge Virtualization Engine.

Edge Requirements

Specifically, what requirements does EVE try to address? We approach these boxes essentially from the ground up. We feel that we have to take control pretty much from the BIOS level up. I will talk about why that is important, because a lot of the technology that you would find at the BIOS and board management level in the data center simply doesn't exist on the edge. For those of you who know BMCs and iLOs, those things are not present on the edge for obvious reasons, because the control plane is not really to be had on the edge. Who are you going to talk to even if you have a BMC? That creates an interesting challenge for how you can cut down on BIOS, and things like that. We feel that we need to start supporting hardware from the ground up. The hardware at the same time has to be zero touch. The experience of deploying an edge computing device should be as similar as possible to buying a mobile device. You get a device with Android pre-installed. You turn it on, and you can run any application that is compatible with the Android platform. So, zero touch deployment.

We also feel that we need to run legacy applications. The legacy applications would include Windows XP. For Windows XP, you actually have to make sure that the application can access a floppy drive. That's a requirement. You also need to run real-time operating systems for control processes. You need to do hard partitioning of the hardware to guarantee the real-time SLAs of these applications. You need to build it at IoT scale, and what that really means is that it needs to be at the same scale that all of the services that support your mobile devices operate at. When you talk about edge computing, just building a service, a control plane, in a single data center is not good enough, because your customers will be all over the place, sometimes even in Antarctica, or in the middle of the ocean. That also happens. You have to figure that one out. The platform has to be built with zero trust, absolutely zero trust, because we all know the stories of the hacks that happened at the Iranian uranium enrichment facilities. The attack vector was very simple. It was a physical attack vector. Those things will keep happening unless we secure the platforms and make them as trustworthy as possible.

Finally, and that's where all of you come in, those platforms have to be made cloud native, in the sense of what APIs we give to developers to actually build applications on top of them. If you look at the state of the industry today, I already scared you at least a little bit with my Windows XP story, but Windows XP is actually a good story. The rest of the industry is still stuck in the embedded mindset. It's not a good embedded mindset. It's not like using Yocto or something. It's using some god-awful embedded operating system that the company purchased 12, 15, 20 years ago, where people cannot even use a modern GCC to compile the binary. That's the development experience in edge and IoT today. I think it is only if we allow the same developers who built the cloud to develop for these platforms that edge computing will actually take off. Right now we are artificially restricting the number of innovative people that can come to the platform by not allowing the same tools that made cloud as successful as it is today.

App Deployment Is But the Tip of The Iceberg

I talked a lot about various things that we plan to tackle. As developers, when I talk about cloud native, people tend to really just focus on and assume app deployments. They're like, "Give me app deployments, and I'm done." The trouble is, app deployments, the way we think about them in a data center, are just the tip of the iceberg on the edge. My favorite example that I give to everyone is this: even if you assume virtualization, on the edge you have to solve the following problem. Suppose you decided on Docker containers, and now there is one Docker container that needs to drive a certain process, and another Docker container that needs to get a certain set of data. The process and the data happen to be connected to a single GPIO. This is a single physical device that has a pinout. Now you're in the business of making sure that one container gets these two pins, and the other container gets those two pins. It's not something that would even come up as a problem in a data center, because in a data center all of your IO is basically restricted to networking, maybe a little bit of GPU. That's about it. Edge is all about IO. All of that data that we're trying to get access to and unlock is data that we can only access through reasonable IO.
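The pin-splitting problem above can be sketched as a small bookkeeping model. This is a hedged illustration only: `GpioBank`, `PinConflictError`, and the workload names are hypothetical, and the sketch tracks ownership in memory rather than touching any real GPIO interface or reflecting how EVE actually assigns IO.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class PinConflictError(Exception):
    """Raised when a requested pin is already owned by another workload."""


@dataclass
class GpioBank:
    """Hypothetical model of one physical GPIO header shared by several containers."""

    pin_count: int
    owners: Dict[int, str] = field(default_factory=dict)  # pin number -> workload name

    def assign(self, workload: str, pins: List[int]) -> None:
        """Give `workload` exclusive ownership of `pins`, all or nothing."""
        for pin in pins:
            if not 0 <= pin < self.pin_count:
                raise ValueError(f"pin {pin} does not exist on this bank")
            holder = self.owners.get(pin)
            if holder is not None and holder != workload:
                raise PinConflictError(f"pin {pin} already belongs to {holder}")
        # Every pin passed validation, so commit the whole assignment.
        for pin in pins:
            self.owners[pin] = workload

    def pins_of(self, workload: str) -> List[int]:
        """Return the sorted pins currently owned by `workload`."""
        return sorted(pin for pin, owner in self.owners.items() if owner == workload)


bank = GpioBank(pin_count=8)
bank.assign("process-driver", [0, 1])   # container that drives the process
bank.assign("data-collector", [2, 3])   # container that reads the data
```

A later request such as `bank.assign("data-collector", [1, 4])` raises `PinConflictError` and leaves pin 4 unassigned, because validation happens before any pin is committed. Enforcing the split at the device level, so a container physically cannot reach the other container's pins, is the harder plumbing the talk is pointing at.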

A Complete Edge 'Cloudification' Proposal

There are a lot of interesting plumbing challenges that need to be solved first, before we can even start deploying our Docker containers. Docker containers are great. I think the thesis that we have at LF EDGE, at least within project EVE, is basically very similar to what you would see in a data center, but with a certain set of specific details attached to it. We feel that the edge needs to be treated exactly like you treat your Kubernetes cluster. The physical edge nodes, like your pods, will be out there. There will be a controller sitting typically in the cloud, or it can sit on-prem, either one. All of these devices will talk to the controller just like your pods talk to the Kubernetes controller. Then somebody deploying the applications would talk to the controller, typically through a Kubernetes-like API. It is very much guaranteed to be a Kubernetes-like API. I think the API itself is great. That's very familiar to all of you. The question is, how do we build the layer that actually makes it all possible? That's where project EVE comes in.
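As a rough sketch of the declarative, Kubernetes-like model described here, the function below compares what the controller wants with what a device is running and derives the steps to converge. The function name `reconcile`, the action names, and the application names are assumptions made for illustration; this is not the EVE or Kubernetes API.

```python
from typing import Dict, List, Tuple


def reconcile(desired: Dict[str, str], running: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return ordered (action, app) steps that move `running` toward `desired`.

    Both maps go from application name to version. Stops are emitted first so
    that a small box frees memory before anything new is started.
    """
    steps: List[Tuple[str, str]] = []
    for app in sorted(running):
        if app not in desired:
            steps.append(("stop", app))
    for app in sorted(desired):
        if app not in running:
            steps.append(("start", app))
        elif running[app] != desired[app]:
            steps.append(("update", app))
    return steps


# State declared through the controller versus what one device reports.
desired_state = {"scada-legacy": "xp-sp3", "telemetry": "2.1", "vision": "0.9"}
running_state = {"scada-legacy": "xp-sp3", "telemetry": "2.0", "old-logger": "1.4"}
plan = reconcile(desired_state, running_state)
```

Here `plan` contains a stop for `old-logger`, an update for `telemetry`, and a start for `vision`, while the unchanged legacy application is left alone. The person deploying applications only edits the desired state; each device works out its own steps.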

Edge Infrastructure Challenges Solved with Edge Virtualization

If I were to go through EVE's architecture, high level view, very quickly. It all starts with the hardware. Actually, it starts with the physical devices that you attach to the hardware. Then there needs to be some operating system that would allow you to do all of the above. That operating system needs to be open source. It needs to be Android of the edge type of an offering. That operating system will talk to the control plane. The control plane will sit in the cloud. On top of that offering of an operating system, you would be running your applications just like you do today in a data center, so a very typical, very familiar architecture.

Typically, your applications will talk to the big clouds in the sky from time to time, because that's where the data ends up anyway. You need to help them do that. Because a lot of times, people will talk to me and say, "I'm deploying my edge application today using Docker." I'm like, "That's great." They're like, "Now we need to make sure that the traffic flows into this particular Amazon VPC. How can we do that?" It just so happens that now you have to read a lot of documentation, because there's strongSwan involved, there's IPsec. It's not really configured by default. It's like, how can we actually connect the big cloud in the sky with this last cloud that we're building called edge computing? That has to come out of the box. These are essentially the requirements. That's the high-level architecture. I will deep dive into one specific component, which is EVE today.

4 Pillars of Complete Edge 'Cloudification'

What we're trying to accomplish is this: at the open-source layer, we need to standardize on two components. One is the runtime itself. The other one is the notion of an application. The application format we're now trying to standardize is what we call standard edge containers. The runtime is project EVE. At the top you have catalogs, and you have control planes. That's where companies can innovate and monetize. I would expect a lot of big cloud providers to join LF EDGE and start building their controller offerings. Just like Amazon today gives you a lot of managed services, that will be one of the services that they would give you.

EVE's Architecture

Deep diving into project EVE. EVE today is based on the type-1 hypervisor, currently Xen. We actually just integrated patches for ACRN. ACRN is Intel's type-1 hypervisor. It's a pretty simple layered cake, very traditional virtualization architecture. I will explain why virtualization is involved. It's hardware, a hypervisor, then there's a bunch of microservices that are running on that hypervisor. Finally, you get to run your containers.

EVE Is Going To Be For the Edge What Android Is For Mobile

That is to say that we're building the very same architecture that Android had to build for mobile. The biggest difference is that Android built it in 2003. They essentially answered the same questions that we're answering, just in a different way, because those were different times. The hardware was different. The questions are still the same. How can you do application and operating system sandboxing, because you don't want your applications to affect the operating system and vice versa? How do you do application bundling? How do you do application deployment? What hardware do you support? We are answering them with something closer to a traditional virtualization play. Android did it through sandboxing on top of the JVM, because it made sense at the time. At the end of the day, I think Android also had this idea in mind that mobile platforms will only be successful if we invite all of the developers to actually develop for them. At the time, developing for mobile was painful. It was that type of embedded development experience: god-awful compilers, tool chains from the '80s. One of the key pieces of innovation in Android was, let's actually pick a language that everybody understands and can program in, called Java. We're essentially doing the same, but we're saying the language nowadays doesn't matter, because we have this technology called the Docker container. The language can be anything. It's the same idea of opening it up to the largest number of people who can actually bring their workloads to the platform.

EVE: A Post-, Post-Modern OS

EVE happens to be a post-, post-modern operating system. I say that as someone who has built a couple of operating systems. I used to work at Sun Microsystems for a long time. I've built a couple of those. I used to hack on Plan 9. I spent a bit of time doing that. All throughout my career, an operating system wanted to be a point of aggregation for anything that you do, hence packaging and shared libraries. An operating system wanted to be that point, that skeleton on which you hang everything. What happened a few years ago, with the help of virtualization and technologies like unikernels, is that we no longer view an operating system as that central aggregation point. An operating system these days is basically just enough operating system to run my Docker engine. I don't actually update my operating system, hence CoreOS. I don't really care about my operating system that much. I care about it running a certain type of workload. That's about it. That's what I mean by post-, post-modern operating system. It is an operating system in support of a certain type of workload. In the case of EVE, that workload happens to be edge containers.

Inside of EVE, there are a lot of moving parts. I will be talking about a few of those today. If you're interested, we actually have really good documentation, which I'm proud of, because most of the open source projects lack that aspect of it. Go to our GitHub if you want to read some of the other stuff, so it's lf-edge/eve, and click on the docs folder. There's the whole design and implementation of EVE that would be available to you. Let's quickly cover a few interesting bits and pieces. Here, I'm doing this hopefully to explain to you that what we're building is legit, but also maybe generate some interest so you can help us build it. If anything like that sounds interesting to you just talk to me after the presentation, we can figure out what pull requests and GitHub issues I can assign to you.

LF EDGE's EVE Deep Dive

EVE was inspired by a few operating systems that I had privilege to be associated with, one is Qubes OS. How many of you do know about Qubes OS? That's surprisingly few. You absolutely should check out Qubes OS. Qubes OS is the only operating system that Edward Snowden trusts. That's what he's running on his laptop, because that is the only one that he trusts. When he was escaping, his whole journey was Qubes OS that was running on his laptop. It's not perfect, but it's probably the best in terms of security thinking that I have seen in a long while.

Then there is Chrome OS. It's basically this idea that you can take an operating system and make it available on devices that you don't really manage. SmartOS was like Chrome OS or CoreOS, but derived from Solaris. EVE today is based on the type-1 hypervisor. People always ask me, why type-1? Why is KVM not allowed? The answer is simple. It's that requirement for the real-time workloads. Yes, patches for the real-time Linux kernel exist. They are really tricky. If you're talking about a pretty heterogeneous set of hardware, it's actually really tricky to maintain this single view of guaranteeing that your scheduler in the Linux kernel would really be real-time. We use type-1 hypervisors, Xen and ACRN, our choice today. We're running containers. We're running VMs. We're running unikernels. Basically, everything gets partitioned into its own domain by the hypervisor but those domains can be super lightweight. With projects like Firecracker, that becomes faster and faster and pretty much indistinguishable from just starting a container.

Dom0, basically where all of the microservices run, that is based on LinuxKit. LinuxKit is one of the most exciting projects in building specialized Linux-based distributions that I found in the last five years. It came out of Docker. It basically came out of Docker trying to build Docker Desktop. LinuxKit is how Docker manages that VM that happens to give you all of the Docker Desktop services. It's also based on Alpine Linux. We get a lot of Alpine Linux dependencies.

We're driving towards unikernel architecture. Every single instance of a service will be running in its own domain. All of our stuff is implemented in Go. One of the really interesting projects that we're looking at is called AtmanOS, which basically allows you to do this, see that line, GOOS equals Xen, and you just do, go build. AtmanOS figured out that you can create very little infrastructure to allow binary run without an operating system, because it so happens that Go is actually pretty good about sandboxing you. Go needs a few services from an operating system, like memory management, scheduling, and that's about it. All of those services are provided directly by the hypervisor. You can actually do, go build, with GOOS Xen and have a binary that's a unikernel.

Edge Containers

Finally, we're actually trying to standardize edge containers, which is pretty exciting. We are trying to truly extend the OCI specification. There have been a few areas in the OCI that we're looking at. Image specification itself doesn't require much of a change. The biggest focus that we have is on registry support. We don't actually need runtime specification because OCI had this problem that they needed to integrate with other tools. Remember, classical operating system, when all the classical operating systems were black box execution engine. We don't need to integrate with anything but ourselves, hence runtime specification is not really needed. Good news is that there are actually a lot of the parallel efforts of extending the OCI into supporting different types of containers. Two that I would mention are Kata Containers, which are more traditional OCI, but also Singularity Containers, which came more from HPC and giving you access to hardware. Weaveworks is doing some of that same thing. Check them out. Obviously, Firecracker is pretty cool as a container execution environment that also gives you isolation of the hypervisor.

The top three goals that we have for edge containers are these. We basically allow you not only file-system-level composition, which is what a traditional container gives you. You can basically compose layers. It happens to be a glorified tarball. You do everything at the level of the file system. You add this file, you remove that file. We're also allowing you block-level composition. You can basically compose block-level devices, which allows you then to manage disks, VMs, unikernels, this and that. We allow you hardware mapping. You can basically associate how the hardware maps to a given container, not at the runtime level, but at the container level itself.
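As a rough illustration of the file-system-level composition described above, here is a minimal Python sketch. It is not how EVE or Docker actually store layers: the `apply_layers` function, the dictionary representation of a layer, and the use of `None` to mark a removed file are all assumptions made for this example.

```python
from typing import Dict, List, Optional

# A layer maps a file path to its content; None marks a file removed by
# that layer. (Hypothetical representation, for illustration only.)
Layer = Dict[str, Optional[bytes]]


def apply_layers(layers: List[Layer]) -> Dict[str, bytes]:
    """Compose layers bottom-up into the final file-system view."""
    view: Dict[str, bytes] = {}
    for layer in layers:
        for path, content in layer.items():
            if content is None:
                view.pop(path, None)  # "you remove that file"
            else:
                view[path] = content  # "you add this file"
    return view


base = {"/etc/hostname": b"edge-node\n", "/bin/app": b"v1"}
update = {"/bin/app": b"v2", "/etc/hostname": None}
rootfs = apply_layers([base, update])
print(sorted(rootfs))
```

Block-level composition would follow the same idea, with disk images rather than individual files as the unit being composed.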

We still feel that the registry is the best thing that ever happened to Docker. The fact that you can produce a container is not interesting enough. The fact that you can share that container with everybody else, that is interesting. We feel that the registry basically has to take onto an ownership of managing as many artifacts as possible, which seems to be the trajectory of OCI anyway. Things like Helm charts and all the other things that you need for orchestration, I would love for them to exist in the registry. Because that becomes my single choke point for any deployment that then happens throughout my enterprise.

EVE's Networking Is Intent Based

EVE's networking is intent based. You will find that very familiar from any type of networking architecture that exists in VMware or any virtualization product, with a couple of exceptions. One is cloud network, where, literally, the intent is, connect me to that cloud. I don't care how. I'm willing to give you my credentials but I need my traffic to flow into the Google, into Amazon, into Microsoft Cloud. Just make it happen. The way we do make it happen is each container or each VM, because everything is virtualized, basically gets a virtualized NIC, network interface card. What happens on the other side of that NIC? Think of it as basically one glorified sidecar, but instead of using a sidecar that has to communicate through the operating system, we communicate through the hypervisor. Basically, the VM is none the wiser of what happens to the traffic. All of that is configured by the system, which allows us really interesting tricks, like networking that Windows XP into the Amazon cloud. Otherwise, it would be impossible. You can install IPsec on Windows XP but it's super tricky. Windows XP just communicates over the virtualized NIC and the traffic happens to flow through IPsec and to the Amazon cloud, and that Windows XP instance is none the wiser.

Another cool thing that we do networking-wise is called mesh network. It is basically based on the standard called LISP, which is RFC 6830. It allows you to have a flat IPv6 overlay namespace where anything can see anything else. That IPv6 is a true overlay. It doesn't change if you move the device. What it allows you to do is basically bypass all of the NAT boxes, and all of the things that may be in between this edge device and that edge device, so that they can directly communicate with each other. Think about it as one gigantic Skype or peer-to-peer system that allows everything to basically have a service mesh that is based on IPv6 instead of some interesting service discovery. That's networking.
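To make the "address doesn't change if you move the device" point concrete, here is a toy Python sketch of the identifier/locator split that LISP is built on. In LISP terms the stable overlay address is the EID and the current underlay address is the RLOC. The `MapServer` class and the addresses below are hypothetical, and this is not a LISP implementation, only a lookup table that shows the idea.

```python
import ipaddress
from typing import Dict


class MapServer:
    """Toy EID-to-RLOC mapping table (hypothetical; not a LISP implementation)."""

    def __init__(self) -> None:
        self._table: Dict[ipaddress.IPv6Address, str] = {}

    def register(self, eid: str, rloc: str) -> None:
        # The EID is the stable overlay identity; the RLOC is wherever the
        # device is currently reachable on the underlay network.
        self._table[ipaddress.IPv6Address(eid)] = rloc

    def resolve(self, eid: str) -> str:
        return self._table[ipaddress.IPv6Address(eid)]


ms = MapServer()
ms.register("fd00::1", "203.0.113.10")  # device comes up on one network
before = ms.resolve("fd00::1")
ms.register("fd00::1", "198.51.100.7")  # device moves; overlay address is unchanged
after = ms.resolve("fd00::1")
```

Peers keep addressing the device as `fd00::1` throughout; only the mapping behind it changes.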

EVE's Trust Model - Zero Trust

On trust, we're basically building everything through the root-of-trust that's rooted at the hardware element. On Intel most of the time it happens to be TPM. TPMs exist in pretty much every single system I've seen. Yet nobody but Microsoft seems to be using them. Why? Because the developer support still sucks. On Linux, it actually takes a lot of time to enable TPM and configure TPM. We're virtualizing the TPM. We use it internally, but then the applications, the edge containers get the virtualized view of the TPM. We also deal with a lot of the crap that exists today in the modern x86 based system. Because a lot of people don't realize it but there is a lot of processors and software that runs on your x86 system that you don't know about. Your operating system, even your hypervisor is not the only piece of software. We're trying to either disable it or make it manageable. Our management starts from the BIOS level up. Thanks to Qubes for pioneering this. Everything runs in its own domain. We're even disaggregating device drivers. If you have a device driver for Bluetooth and it gets compromised, since it's running in its own domain, that will not compromise the rest of the system. Stuff like that.
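The talk does not go into how a hardware root of trust records what has booted, so the following is only a general illustration, not a description of EVE's code. TPMs accumulate measurements with an "extend" operation, where the new register value is a hash of the old value concatenated with the digest of the next component. The component names and the `measure_chain` helper are hypothetical.

```python
import hashlib
from typing import Iterable


def extend(pcr: bytes, measurement: bytes) -> bytes:
    """TPM-style extend: new = SHA-256(old || SHA-256(measurement))."""
    return hashlib.sha256(pcr + hashlib.sha256(measurement).digest()).digest()


def measure_chain(components: Iterable[bytes]) -> bytes:
    """Fold a boot sequence into one register value (hypothetical helper)."""
    pcr = bytes(32)  # the register starts at all zeros
    for blob in components:
        pcr = extend(pcr, blob)
    return pcr


good = measure_chain([b"firmware", b"hypervisor", b"dom0"])
tampered = measure_chain([b"firmware", b"hypervisor-patched", b"dom0"])
print(good.hex() == tampered.hex())
```

Because each value depends on everything measured before it, changing or reordering any component yields a different final value, which is what lets a remote party check what actually booted.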

EVE's Software Update Model

EVE's software update model is super easy for applications. It's your traditional cloud native deployment. You push to the cloud and the application happens to run. If you don't like it, you push the next version. You can do canary deployments. You can do all of the stuff that you expect to see from Kubernetes. EVE itself needs to be updated. That's where ideas from Chrome OS and CoreOS kick in. It's pretty similar to what happens on your cell phone. It's dual partitioned with multiple levels of fallback, lots of burn-in testing that we do. We're trying to avoid the need for physical contact with edge nodes as much as possible, which means that a lot of things that would have you press a key would have to be simulated by us. That's a whole tricky area of how to do that. That's something that we also do in EVE. We are really big fans of the open-source BIOS reimplementation from coreboot, and especially u-root on top of coreboot. That allows us to basically have a complete open-source stack on everything from the BIOS level up.
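The dual-partition update with fallback can be sketched roughly as follows. This is a simplified Python model under stated assumptions, not EVE's actual update logic: the `DualPartition` class, the slot names, and the `health_check` callback (standing in for the burn-in testing mentioned above) are all hypothetical.

```python
from typing import Callable, Dict


class DualPartition:
    """Toy A/B updater (hypothetical names; not EVE's actual update code)."""

    def __init__(self, current_version: str) -> None:
        self.slots: Dict[str, str] = {"A": current_version, "B": ""}
        self.active = "A"

    def _inactive(self) -> str:
        return "B" if self.active == "A" else "A"

    def update(self, new_version: str, health_check: Callable[[str], bool]) -> bool:
        """Write to the inactive slot and commit only if the image is healthy."""
        target = self._inactive()
        self.slots[target] = new_version
        if health_check(new_version):
            self.active = target  # commit: the new image becomes the default
            return True
        return False  # fall back: the previous slot stays active


node = DualPartition("1.0")
ok = node.update("1.1", health_check=lambda v: True)
bad = node.update("1.2", health_check=lambda v: False)
running = node.slots[node.active]
```

The point of the design is that a failed update never touches the partition the node is currently running from, so nobody has to visit the device to recover it.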

Hardware-protected vTPM 2.0

The most interesting work that we're doing with TPM, and I have to plug it because I get excited about it every single time, is that we're trying to basically do a hardware-protected vTPM, something that hasn't been done before, even in the data center. There's a group of us who are doing it, and if you're interested you can contact any one of us. TrenchBoot is the name of the project. There's Dave Smith and LF EDGE in general.

Demo

EVE itself is actually super easy to develop. That's the demo that I wanted to give, because it's not a QCon without a demo. EVE is based on LinuxKit. There is a little makefile infrastructure that allows you to do all of the traditional operating system developer things. Basically, typing make run would allow you to manage the operating system, run the operating system. The only reason I'm mentioning this is because people get afraid a lot of times if I talk about operating system development and design, because there's a little bit of a stigma. It's like, "I need a real device. I need some JTAG connector. I need a serial port to debug it." No, with EVE, you can actually debug it all in the comfort of your terminal window on macOS.

The entire build system is Docker based. Basically, all of the artifacts in EVE get packaged as Docker containers. It's actually super easy to develop within a single artifact. Because we're developing edge containers in parallel, we are planning to start using that for the unikernel development as well, which might, interestingly enough, bifurcate and be its own project. Because, I think when it comes to unikernels, developers still don't really have the tools. There's a few available like UniK and a few others. There's not, really, that same level of usefulness of the tools that Docker Desktop just gives me. We're looking into that as well.

Key Takeaways

Edge computing today is where public cloud was in '06. Sorry, I cannot give you ready-made tools, but I can invite you to actually build the tools with me and us at Linux Foundation. Edge computing is one final cloud that's left. I think it's the cloud that will never ever be taken away from us. By us, I mean people who actually run the actual physical hardware. Because you could tell, I'm an infrastructure guy. It sucked when people stopped buying servers and operating systems, and now everything just moved to the cloud. My refuge is edge. Edge computing is a huge total addressable market. As a founder of a startup company, I can assure you that there is tremendous amount of VC activity in the space. It's a good place to be if you're trying to build a company. Kubernetes as an implementation is dead, but long live Kubernetes as an API. That stays with us. Edge computing is a lot of fun. Just help us build either EVE, a super exciting project, or there are a few projects to pick in the LF EDGE in general.

Questions and Answers

Participant 1: We see the clouds, AWS, Azure, and all that. Is there L2 connectivity? Are you using, for example, the AWS Direct Connect APIs, and for Azure, ExpressRoute? That's what you're doing?

Shaposhnik: Yes, exactly.

Participant 1: I belong to Aconex and we are delving into a similar thing, we already allow people to connect to the cloud. We'll look deeper into this.

Shaposhnik: Absolutely. That's exactly right. That's why I'm saying it's a different approach to running an operating system because I see a lot of companies trying to still integrate with Linux, which is great. There is a lot of business in that. What we're saying is Linux itself doesn't matter anymore. It's the Docker container that matters. We're extending it into the edge container. Docker container is an edge container. It almost doesn't matter what an operating system is. We're replacing all layers of it with this very built for purpose engine. While it's still absolutely a valid approach to still say, "I need Yocto," or some traditional Linux distribution that integrates with that. I think my only call to action would be, let's build tools that would be applicable in both scenarios. That way we can help each other grow.

Participant 2: I know in your presentation you mentioned that edge is going to be more diverse. What's your opinion on cloud providers extending to the edge through projects like Azure Sphere and Azure IoT Edge?

Shaposhnik: They will be doing it, no question about it. I think they will come from the cloud side. Remember that long range of what's edge and what's not edge. They will basically start addressing the issues at the CO, the central office. They will start addressing the issues at maybe the MEC access points. I don't see them completely flipping and basically running on the deep edge. The reason for that is, business-wise, they're not set up to do that. The only company that I see that potentially can do that is Microsoft. Because if you want to run on the deep edge, you need to develop and foster your ecosystem, the same way that Microsoft developed and fostered the ecosystem that made every single PC run Windows. Amazon and ecosystem don't go together in the same sentence. Google is just confused. If anybody tackles it, that would be Microsoft, but they are distracted by so much low-hanging fruit in front of them, just moving their traditional customers into the cloud, that I just don't see them applying effort in that space. It may happen in five years, but for now, running this company, at least I don't see any of that happening.

Participant 3: What about drivers for sensors on these edge devices? It seems EVE abstracts the OS away from you, but in industrial, for instance, you need to detect things, so you need peripherals.

Shaposhnik: Correct. What about drivers? Because it's a hypervisor-based architecture, we can just assign the hardware directly to you. If you want to have that Windows XP based VM drive your hardware, we can do that. That's not interesting, because we need software abstractions that will make it easier for developers to basically not think about it. That is a very nascent chunk of work. How do you provide software abstractions for a lot of things that we took for granted, like there's a file in /dev someplace, and I do something with it through ioctl? Now we're flipping it back and saying, "If I'm running a Docker container, what would be the most natural abstraction to a particular hardware resource?" A lot of times, surprisingly, to me, that abstraction happens to be a network socket. We can manage the driver on the other side of the hypervisor. Again, we will still run the driver in its own domain. To all of the containers that want to use it, we will basically present a nice software abstraction such as a network socket.
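As a rough sketch of what "the abstraction is a network socket" could look like from the application's side, the Python example below has a stand-in driver write readings to one end of a socket while the application reads them from the other. Everything here is an assumption for illustration: `socket.socketpair()` stands in for whatever channel the hypervisor would provide, and the newline-delimited JSON format, the `temp0` sensor name, and both function names are hypothetical.

```python
import json
import socket
import threading
from typing import List


def driver_domain(conn: socket.socket, readings: List[float]) -> None:
    """Stands in for a driver running in its own domain (hypothetical)."""
    with conn:
        for value in readings:
            line = json.dumps({"sensor": "temp0", "value": value}) + "\n"
            conn.sendall(line.encode())


def read_sensor(conn: socket.socket) -> List[float]:
    """What the container sees: newline-delimited JSON on a plain socket."""
    with conn, conn.makefile("r") as stream:
        return [json.loads(line)["value"] for line in stream]


driver_end, app_end = socket.socketpair()
t = threading.Thread(target=driver_domain, args=(driver_end, [21.5, 21.7, 22.0]))
t.start()
values = read_sensor(app_end)  # reads until the driver side closes
t.join()
```

The application code never opens a device file or issues a device-specific call; it only needs to know how to read from a socket.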


Recorded at:

May 27, 2020

Roman Shaposhnik
