Real-World Examples of FaaS

Summary

John Graham-Cumming talks about Cloudflare Workers, a new architecture launched by Cloudflare. They launched it over a year ago, bringing the ability to run JavaScript and then any WASM-targeting language on their 165+ locations around the world. He looks at real-world examples, not proofs-of-concept or textbook ideas, of what's being built using this new architecture.

Bio

John Graham-Cumming is CTO of Cloudflare and is a computer programmer and author. His open source POPFile program won a Jolt Productivity Award in 2004. He is the author of a travel book for scientists published in 2009 called The Geek Atlas and has written articles for The Times, The Guardian, The Sunday Times, The San Francisco Chronicle, New Scientist and other publications.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Graham-Cumming: Who knows of Cloudflare here? Who's used Cloudflare Workers? A small number of people. I'm going to do a little bit of an introduction to what Cloudflare Workers is, but try not to be too sales-pitchy because I'm technical, but I do need you to know what it is. Personally, I actually hate this term Functions as a Service because I think that when you see some of the examples of what's being built on our platform, they are much larger than functions. But that's what people seem to be calling it at the moment.

The Story So Far

So if you think about the story of, well, part of the story of computing, it's something like: I buy hardware from vendors, I pay for machines and I have a lot of people managing it. At some point, you get infrastructure from people like DigitalOcean. I rent that kind of stuff. And then I can just run some code in a platform I'm familiar with. With Heroku, for example, Rails was directly available. Then of course within Amazon there's Lambda and Lambda@Edge, where you deploy some functions everywhere around the world where they have locations.

It’s been this gradual thing. The idea, I think, is that you're paying slightly differently. In the beginning, you're paying by machine and you're managing all that stuff, and you have to have people. And ultimately, you're getting down to dollars per request. So you're paying by the request, because everything has moved over to HTTP, HTTPS, and you're just paying for the unit of work you want. You don't want to have to reserve everything. What's interesting about this to me is if you say, "Well, this is the story so far," I actually say this is back to the future. Because if you go back to 1936, Alonzo Church, this is the paper with the Lambda calculus in it, is really just talking about programs being functions and having a very robust mathematical foundation for it.

Then there's a real interesting paper by Wheeler in '52, which is the use of subroutines in programs. Now, if you've not read that, what's striking about it is there is, in a sense, the discovery of a subroutine. Previously it was like, we'll write everything, and it was like, "Hey, we could package things up." Actually in the paper, he talks about two types of subroutine, one that does a single thing, like print something out. In fact, I think he gives an example of calculate Sine of X. Then there's another subroutine, which is more complex which might rely on other subroutines, which in his case is integrating a function, because he's thinking very mathematically. But essentially, this idea that you could package stuff up into usable units.

And he says, "This could be super useful." He's like, "If we could have these subroutines, then it'll be extremely, extremely useful but where are we going to put them?" And so if you read this it seems super anachronistic. He's saying, "Well, maybe they should be stored on punch tape, or are they going to be in some auxiliary store on the machine? Well, there's a subroutine over here. I think I'll use that subroutine." And actually, if you go even further back than this, you go back to Babbage. Babbage has this notion of needing data which isn't present in his machine and having a bell that summons an assistant to go get a punch card. It tells the assistant which punch card to go get, and the assistant goes and gets it and loads it. So there's always been this question: where does this stuff run? Where does it exist? A perennial computing problem.

Interestingly enough, he sort of hints at this future. He says, "Usually, it'll be found that it is not possible to write the subroutine such that they may be put into arbitrary positions in the store." He's talking about machines here where you couldn't have your program organized how you wanted it. "Although in certain machines, this is now possible." So he's saying in certain machines, it was possible to have arbitrary location subroutines and have these things available. The whole paper talks about how you might call subroutines and how you might save state and stuff that we find completely obvious.

But I think what's interesting about this paper is that it's wrestling with something fundamental: we would like to have compute functionality available to us to do functions anywhere, wherever it is, and as fast as possible. The reason he wants it in store is he wants it to be quick. He doesn't want to have to load punch tape, and he doesn't want Babbage's bell summoning somebody.

In a way, nothing's changed since '52. It's just that we've got all this compute power. What this paper does not talk about is going out and buying machines, renting hardware, and all that sort of stuff. Partly because it was hard to do that, because the machines were huge, and you couldn't just go rent them, but also partly because what they're thinking about here is just the fundamentals of computing. And I think what's interesting about Functions as a Service, if we want to talk about that, is trying to get back to this sense of, I can just write a program, it'll just run anywhere. I don't have to think about the underlying, "Well, how many machines have I got? Have I allocated enough virtual machines? What availability zone am I in? Why am I even thinking about this? I'm just a programmer, I just want to write code."

Also I just loved this. He says, "This will be really cool if we have subroutines. What should we be thinking about if we're going to start writing subroutines?" We should think about "simplicity of use, correctness of code and accuracy of description." By description here he means the documentation, saying, "This is what this thing does." And: "All complexity should, if possible, be buried out of sight." I mean, this is 1952. I think this should be a slogan for anybody writing code right now. Which is like, please don't tell me about the complexity of this. Think about the computing platforms where I have to manage hardware, or I have to manage virtual machines, or I have to do all this kind of stuff.

Although, I think, as engineers, we love to get into all the detail of how it all works, fundamentally, why? I'd just like to write some code and run it somewhere. So, we're developers. We would like to just write code. Just, I don't want to have to do anything.

The Next Revolution Will Be Network-Based Serverless

We think there's a revolution going on, which we're going to refer to as network-based serverless or network-based Functions as a Service, but the idea is you start to forget about where the code runs. You have it all over the world near where end users are, and it just magically gets distributed and executed and scaled without you thinking about it. So Cloudflare, if you've not heard of Cloudflare, started out as a very large caching/CDN and security service. We've built out a very large infrastructure around the world, and we've built a platform for running functions written initially in JavaScript, and then any arbitrary language that supports WebAssembly, so in particular Rust, because that's got the best WebAssembly support at the moment, and it runs across our entire network globally.

What we have seen is, so when we put this into production, we said, "Well, the reason we're going to do this is that if you give programmers the ability to write code, they will write something interesting." We had no idea what the interesting thing is they'll write. We had some ideas what might get written, but write arbitrary code on our platform and off you go. So we have now, it's been well over a year, a very large number. In fact, it's quite surprising, but almost a double digit percentage of requests that go through Cloudflare actually execute code written by customers. So it's suddenly blown up like a crazy amount. There are some nice things on there.

So just to think about this, one of the ways I've been thinking about this is, I explain to people, this is a new place to put part of your application. If you think about an application you're building today, you have a server somewhere. It could be you decided to buy servers, it could be you went to Amazon, DigitalOcean or somebody like that. But fundamentally, there's some server somewhere. And then there's the application itself. It might be running in a browser, it might be an app on a phone, but there's these two extremes, and the internet is in the middle. Your application might be in the internet because it might be in three Amazon availability zones, but fundamentally there's some server somewhere and there's an application.

The server is great because it's secure, so you can keep secrets on there, you can keep customer data, API keys, all sorts of stuff. It's extremely fast to update. You can have it in your CI pipeline and you can just update that code and you can change the backend. But it tends to be long latency to the user. So anything interactive is quite slow. So you have that sort of characteristic.

Then on the browser side, you have the opposite. It's extremely low latency to the user, maybe microseconds, so you get very interactive features. It's relatively slow to update. So if it's an app, you're waiting for an app update cycle. If it's in a web browser, you may be waiting, trying to force the user to refresh or coming up with some scheme to get the latest code in there. So it's relatively slow to update, and also, you tend to get different versions running out there because people don't update. It's fundamentally pretty damn insecure, especially if it's in a browser where somebody can run an arbitrary plugin which decides, "I don't want to run that bit of JavaScript a customer has, or I'm going to inspect this." So it's a very different world.

The middle bit, which is putting code into something like this network-based thing, is a mixture of the two. It has the security. You can put an API key or secret in there because this is code that's running in our service. It's not running on the end user's device. It's very fast to update. If you make a change in our code, and if I get a chance or have time I'll do some live coding, but when you hit save on that code, it's roughly 15 seconds to be globally deployed. Australia is quite far away, it tends to be the long pole, but roughly 15 seconds. I think our P99 is 15 seconds. I think the P95 is 10 seconds. So the idea is you can be altering that thing. But it's low latency to the user. We've built out this very large network. So you're relatively close. You're not as close as the device, but you can do a lot of interactive stuff pretty close to the end user.

A couple of things that are interesting. Why do we have to build out such a large network? Well, this is UK actually, I think, but internet connection speeds, bandwidth have gone up tremendously over a long period. This is 2011 to 2017, and it continues to go up. We're going to get LTE and 5G and all sorts of amazing things. So bandwidth goes up, so we can move a lot of content around. But we have not made the same progress with the speed of light over the same period. I stopped updating this graph in 2016 because I held out little hope that the speed of light would change over time.

A lot of application performance is actually around optimizing for speed of light problems. You try to move things close to end users, you try not to transfer so much stuff, you try to not have 70 API calls back and forth. This is going to be a problem for dog-walking drones. I keep showing this slide to people and I hope, well, somebody actually came up to me and said, "That's a really good idea. I should do that." And I was like, "No, you shouldn't. No, you shouldn't." But the idea is this, you can imagine a world in which you're too busy to walk your dog so you have a drone that walks your dog, and the drone is going outside and that's great. The drone could be using local information to figure out its route, GPS, and stuff like that. But of course, you're going to meet other dogs with drones. So there's going to be some coordination. A lot of that can be done locally.

But what if there's that one dog that your dog hates and they always argue? Then you need some information about, "Oh, that dog is here and this dog is here," and you're going to start coordinating that information. The interesting thing is, where do you do that? Do you do it on a server, which is on the other side of the world? So, you're in New Zealand walking your dog, and the application has been built by all the cool kids in Reston, Virginia, and unfortunately that ends up with a dogfight because the latency is so long back and forth. "Hey, is that a bad dog? Yes, that's a bad dog." But it's too late, they've been bitten.

So you want to move some of the application locally. Fundamentally, you're making a decision about where your application runs. Hopefully, you move it to Australia or hopefully you move it to Wellington and it's even closer. If you're using one of the large scale providers, you're really stuck with this "What zone am I in? Where am I deploying my application in the configuration phase?" You really don't want to care about that. You really want to care about how close you are to the end user.

This is Cloudflare's network or some variant of it, lots and lots of locations. The dark area is people. That's where the people live in the world. This isn't some nuclear war, this is actually 10 milliseconds from every Cloudflare location. We're right now building out in West Africa because there's a lot of work to do in West Africa, there are a lot of people there doing things. There's more in Southeast Asia and South America. But the idea is, by the end of this year, that everybody is within about 10 milliseconds of where the code is running. So we're pretty close, but there are some areas to work on. 166 locations. This graph is actually slightly out of date. It just gives you an idea; the fundamental idea of the architecture is 165 locations around the world, or 166, whatever the hell it is today. Then if you look at the large cloud providers, they have a small number of zones around the world. So you're sort of playing games about latency.

Just to give you an example. In our office in San Francisco, we have this wall of lava lamps, and there's a camera looking at the lava lamps, and that's feeding a random number generator. We use that for random number seeds because that's moving randomly. That's available as an API internally and externally. Actually, if you want some lava random numbers, you can have them. The Mega Millions had an absolutely huge prize not very long ago, a couple of months ago. So we made this site, Lava Millions, where you could get lava lamp-generated Mega Millions numbers. It makes sense to people who play the lottery. That's cool and everything.

But what's interesting is this website is entirely a piece of JavaScript running on our edge. There is no back end for this. It calls an API, it makes an API call asynchronously, gets some random data, turns it into numbers and serves the page. I think it took about 30 minutes to make the website and deploy it globally, and it's running in all of those locations around the world. That's a silly example but I can show you the code if you want to see that later.

Inject Security Headers on the Fly

What are people actually doing with this? So we deployed this. I think we announced it in September, a year ago, and people have started building things on it. So securityheaders.io. A very nice site which will add all of the good security headers to your website. You have the website go through Cloudflare and as the requests come through, let's make sure it's got all of the new security features in there, because not everybody can actually modify their backend so easily. This is a service, you would say, “Do this,” and it'll get done.

So what does that look like? I'll give these slides away if you want to read the code, but this is the actual code to give you an idea. This is obviously JavaScript. This basically, on the right, is the function which runs on a request. A request comes in for that website and there's an event listener defined at the top left, which adds this headers thing, and basically it goes through and the first line is fetch. So it literally goes and gets the web page from the origin or from cache, wherever it needs to get it from, and then it can modify the response. So it's actually going to go through, pull out the headers, add a list of security headers. It'll add Content-Security-Policy, XSS protection, Strict-Transport-Security, all those things into it, and remove certain things. In this case they've chosen to change the Server header, and remove a bunch of others - they don't want X-Powered-By or whatever appearing. That all goes through, then right at the end, it returns the response body to the person who requested it. So that's something you can keep in GitHub and deploy through an action in your CI platform, or just write in our IDE and push out. And that will just execute on every request coming through us in all those locations around the world. So it's very simple to do that. All those builds.
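The slide itself is not reproduced in the transcript, so the following is only a minimal sketch of the pattern described: fetch the page, copy its headers, add security headers, and strip the ones you don't want. The header values, the `addSecurityHeaders` name and the injectable `fetchFn` parameter are assumptions made for illustration; the sketch removes the `Server` header rather than rewriting it.

```javascript
// Headers to add. The values are illustrative assumptions, not the ones on the slide.
const SECURITY_HEADERS = {
  "Content-Security-Policy": "upgrade-insecure-requests",
  "Strict-Transport-Security": "max-age=31536000",
  "X-XSS-Protection": "1; mode=block",
  "X-Content-Type-Options": "nosniff",
};

// Headers that reveal details of the backend and are stripped from the response.
const REMOVE_HEADERS = ["X-Powered-By", "Server"];

// Fetch the page from the origin (or cache), then rewrite its headers.
// fetchFn is injectable so the function can be exercised without a network.
async function addSecurityHeaders(request, fetchFn = fetch) {
  const response = await fetchFn(request);

  // Copy the headers: the ones on a fetched response may be immutable.
  const headers = new Headers(response.headers);
  for (const [name, value] of Object.entries(SECURITY_HEADERS)) {
    headers.set(name, value);
  }
  for (const name of REMOVE_HEADERS) {
    headers.delete(name);
  }

  // Pass the original body through untouched.
  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers,
  });
}

// Service Worker style registration; skipped when no such global exists.
if (typeof addEventListener === "function") {
  addEventListener("fetch", (event) => {
    event.respondWith(addSecurityHeaders(event.request));
  });
}
```

Keeping the logic in a plain function that takes the request and a fetch function makes it easy to test locally before deploying.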

Identifying Pwned Passwords

Troy Hunt has this lovely Pwned Passwords site where you can test in real time if someone has a bad password. So if someone is using a password that's been leaked and is probably not a good one. The API is interesting because what it does is, you hash the password, you pass part of the hash to him, you get back a list of things that have that part of the hash in it, then you check yourself and say, "Is that particular hash in this?" You don't actually reveal to him what the actual password was that the person typed in. That's provided as an API, which is running with us. So this is actual code that basically is looking at a POST request, so somebody actually posting in a password. It then does the appropriate hash, submits it. If you look at the await, there's an await crypto subtle, so SHA-256 there and then it's actually going to make an API call.

This is the thing we're seeing lots of people do, which is in line, in the middle of a request, make an API call to something else. Now, it could be synchronously, so in this case it could be, I'll go out and I'll call this and I'll wait for the result. But it could also be asynchronously. It could be logging. So it'd be like I’ll return the response to the user and I'll do some logging while I'm still executing. So what this does is it literally calls his API, gets back the list of hashes that match, checks to see if this particular hash appears in there, and it just adds a header in this case, saying whether the password was hacked, which will then be sent back to the real origin server, and the real origin server can then make a decision. "Should I warn the user in some way? Should I send them an email, warn them?" So, oh, Salesforce. Who doesn't love Salesforce? Here we go. Me. So, this is real code people are using to protect websites. Let's see.
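As a rough sketch of the hash-prefix lookup described above: the public Pwned Passwords range endpoint takes the first five hex characters of a SHA-1 hash and returns lines of `SUFFIX:COUNT`, so this sketch uses SHA-1 even though the transcript mentions SHA-256. The function names and the injectable `fetchFn` are assumptions for illustration, not the code from the slide.

```javascript
// Web Crypto, as in Workers; the fallback only matters on older Node.js versions.
const subtle = (globalThis.crypto ?? require("node:crypto").webcrypto).subtle;

// Hash text with SHA-1 and return it as upper-case hex.
async function sha1Hex(text) {
  const digest = await subtle.digest("SHA-1", new TextEncoder().encode(text));
  return [...new Uint8Array(digest)]
    .map((byte) => byte.toString(16).padStart(2, "0"))
    .join("")
    .toUpperCase();
}

// Returns true if the password's hash appears in the leaked set.
// Only the first five characters of the hash ever leave this function.
async function isPwned(password, fetchFn = fetch) {
  const hash = await sha1Hex(password);
  const prefix = hash.slice(0, 5);
  const suffix = hash.slice(5);

  const response = await fetchFn(`https://api.pwnedpasswords.com/range/${prefix}`);
  const body = await response.text();

  // Each line of the response is "SUFFIX:COUNT"; compare suffixes locally.
  return body
    .split("\n")
    .some((line) => line.split(":")[0].trim().toUpperCase() === suffix);
}
```

A worker using this would read the password from the POST body, call `isPwned`, and set a header of its own choosing (for example a hypothetical `X-Password-Pwned: YES`) on the request it forwards to the origin.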

Another thing people are doing - and this is very common - is taking authorization or authentication, pushing it away from their server and running it inside the edge of the network. So validating JWTs, extremely [inaudible 00:17:52] interesting example that does involve drones funnily enough using that standard. So the JWT is there. It's been turned into some Base64 kind of thing. Then you can just validate it on the edge in the code. And we have example code of how to do this. So what it's allowing people to do is not have any of the authorization and authentication logic on their server. The server becomes very simple. Once the request comes from Cloudflare, they already know it's authenticated and authorized correctly.
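To make the idea concrete, here is a minimal sketch of validating a JWT at the edge with Web Crypto. It assumes an HS256 token signed with a shared secret and a `Bearer` token in the `Authorization` header; the talk does not say which algorithm or header customers actually use, and the function names are hypothetical.

```javascript
// Web Crypto, as in Workers; the fallback only matters on older Node.js versions.
const subtle = (globalThis.crypto ?? require("node:crypto").webcrypto).subtle;

// Decode base64url (the alphabet JWTs use) into bytes.
function base64UrlDecode(text) {
  const b64 = text.replace(/-/g, "+").replace(/_/g, "/");
  const padded = b64 + "=".repeat((4 - (b64.length % 4)) % 4);
  return Uint8Array.from(atob(padded), (ch) => ch.charCodeAt(0));
}

// Verify an HS256 token. Returns the payload object, or null if invalid.
async function verifyJwt(token, secret, nowSeconds = Math.floor(Date.now() / 1000)) {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [headerB64, payloadB64, signatureB64] = parts;

  let header, payload, signature;
  try {
    const decoder = new TextDecoder();
    header = JSON.parse(decoder.decode(base64UrlDecode(headerB64)));
    payload = JSON.parse(decoder.decode(base64UrlDecode(payloadB64)));
    signature = base64UrlDecode(signatureB64);
  } catch {
    return null; // not valid base64url or JSON
  }

  // Only accept the algorithm we expect; never trust "none".
  if (header.alg !== "HS256") return null;

  const encoder = new TextEncoder();
  const key = await subtle.importKey(
    "raw",
    encoder.encode(secret),
    { name: "HMAC", hash: "SHA-256" },
    false,
    ["verify"]
  );
  const signed = encoder.encode(`${headerB64}.${payloadB64}`);
  const valid = await subtle.verify("HMAC", key, signature, signed);
  if (!valid) return null;

  // Reject expired tokens when an exp claim is present.
  if (typeof payload.exp === "number" && payload.exp <= nowSeconds) return null;
  return payload;
}

// Gate a request: only authenticated requests reach the origin.
async function requireJwt(request, secret, fetchFn = fetch) {
  const auth = request.headers.get("Authorization") || "";
  const token = auth.startsWith("Bearer ") ? auth.slice(7) : "";
  const claims = await verifyJwt(token, secret);
  if (!claims) return new Response("Unauthorized", { status: 401 });
  return fetchFn(request);
}
```

With this in front of it, the origin only ever sees requests whose token has already been checked, which is the simplification the talk describes.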

Online Maps

A couple of other examples. There's a thing called Factorio. Has anybody played Factorio? It's like a factory-making game where you make these incredible factories, and there are these incredibly detailed maps. You have arbitrary levels of zoom where you can zoom into these things, and they're utterly, utterly huge. There's a guy who runs this thing called Factorio Maps where they have these beautiful images. He set it up and people loved it. And then Amazon said, "We love it too. Here is the bill for all of the traffic you're moving around." And he said, "I have to do something about this."

And so what he did was, obviously, he started to use Cloudflare, and he's got these different levels of what they call leaflets. So the little bits of these tiles of this thing as he's zooming in. So he's going through the different layers. And what he did was he decided to do a couple of things. One is, he would use Google and Backblaze, and then he would play them off against each other. What he actually does here is he says, "Well, this image could be in Google, it could be in Backblaze. I'll try and get both of them. Whichever is fastest, I'll serve to the end user." So he uses Promise.race, and he literally races the two backends on our side, gets the fastest, and then he can return that to the end user. And it actually turns out Backblaze is extremely cheap and has a deal with Cloudflare, where egress doesn't cost anything, so his actual thing could run.

But this is kind of interesting. If you want to coordinate multiple backend services and actually, say, maybe which one comes back fastest or maybe something is optional, you don't want to actually include, that can be done. He also sends logs to StatHat. So actually once he's returned images to the end user, he then makes a call and he pushes out to StatHat to say, "Well, this particular thing happened."
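The racing-and-logging pattern described here can be sketched as follows. The backend hostnames, the stats URL and the function names are all placeholders invented for the example; only `Promise.race` and the idea of logging after the response has been chosen come from the talk.

```javascript
// Hypothetical backend and logging endpoints, for illustration only.
const BACKENDS = ["https://tiles-a.example.com", "https://tiles-b.example.com"];
const STATS_URL = "https://stats.example.com/log";

// Ask every backend for the same object and take whichever settles first.
function fetchFastest(path, fetchFn = fetch) {
  return Promise.race(BACKENDS.map((base) => fetchFn(base + path)));
}

// Serve the winning response, then log without delaying the user.
async function handleTile(event, fetchFn = fetch) {
  const path = new URL(event.request.url).pathname;
  const response = await fetchFastest(path, fetchFn);

  // waitUntil keeps the worker alive for the logging call after we return.
  event.waitUntil(
    fetchFn(STATS_URL, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ path, status: response.status }),
    })
  );
  return response;
}
```

One design point worth knowing: `Promise.race` settles with the first promise to settle, including a rejection, so a backend that fails quickly would win the race. `Promise.any` ignores rejections and may be a better fit where it is available.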

We have a few people who are doing this kind of stuff. They hide something in their database or something like, "Shh. This is a secret," and then they look for it leaking. They insert JavaScript code onto our edge, which is literally looking at the response body coming back, so we fetch the response body from them. They then just look for that secret, look for a credit card number, look for something like that, and then they can warn about it. And this particular customer which I've sanitized, it's not actually, "Shh. This is a secret," something more complicated than that. What they do is they call PagerDuty inside the worker. So if data starts to come out on a request, they actually make a call out to PagerDuty and say, "We're leaking data. This is the detail of the request." All that is happening asynchronously inside. The end user has been served actually, in this case, a 403. So they actually say, "Oh, somebody has leaked. I'll give the end user a 403. I won't give them what they actually got, and I will call PagerDuty." But from a performance perspective, the 403 was served immediately and the PagerDuty happens afterwards, but all within the worker.
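A minimal sketch of that canary check might look like the following. The alert URL is a placeholder rather than PagerDuty's real API, the canary string is the sanitized one from the talk, and the function name is hypothetical. For simplicity the sketch buffers the whole body before searching it.

```javascript
// The planted secret to look for, and a hypothetical alerting endpoint.
const CANARY = "Shh. This is a secret";
const ALERT_URL = "https://alerts.example.com/incident";

// Inspect the origin's response; block it and raise an alert if the canary leaks.
async function guardResponse(event, fetchFn = fetch) {
  const response = await fetchFn(event.request);

  // Read a clone so the original body can still be returned untouched.
  const text = await response.clone().text();
  if (!text.includes(CANARY)) return response;

  // The alert runs after the 403 has been returned to the user.
  event.waitUntil(
    fetchFn(ALERT_URL, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        summary: "Canary string found in response body",
        url: event.request.url,
      }),
    })
  );
  return new Response("Forbidden", { status: 403 });
}
```

The same shape works for other patterns, such as a regular expression that matches credit card numbers, by swapping the `includes` check.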

Geo-Targeting

Geo-targeting is super common. People want to serve content for particular regions by language, or block content in certain regions or certain countries depending on what they're doing. That's extremely common, as is custom load balancing. Obviously, we provide load balancing but people have their own rules they want to implement, so they write it in JavaScript. One particularly interesting one I liked is this one. We add a header saying what country the request came from because we look it up for you and tell you as part of our service. So what this one does is it gets the country, and then it tries to see if there's a localized version of the page that you're looking for. So it actually modifies the URL and says, "The thing you want /US or /UK," so whatever your format would be for localized pages. It then requests that from your server. And if you get a 404, it says, "It hasn't been localized into Spanish, let's say. We'll serve the English language page." And if it has, then it will serve the country specific page and then put it in cache. So it becomes cached. This way they don't have to update. "Oh, we've now done simplified Chinese. We have to update." No, it's fixed for them automatically. So, again, these tend to be very small bits of code that people are writing.
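The try-localized-then-fall-back logic can be sketched like this. The country comes from the `CF-IPCountry` request header that Cloudflare adds; the `/<country>/path` URL layout and the function name are assumptions, since the talk says the format is whatever the site uses.

```javascript
// Try a country-specific version of the page; fall back to the default on 404.
async function fetchLocalized(request, fetchFn = fetch) {
  const country = (request.headers.get("CF-IPCountry") || "").toLowerCase();

  if (country) {
    // Assumed layout: /pricing becomes /es/pricing for a visitor from Spain.
    const localizedUrl = new URL(request.url);
    localizedUrl.pathname = `/${country}${localizedUrl.pathname}`;

    const localized = await fetchFn(
      new Request(localizedUrl.toString(), {
        method: request.method,
        headers: request.headers,
      })
    );
    // Anything other than "not found" means a localized page exists.
    if (localized.status !== 404) return localized;
  }

  // No country known, or no localized page yet: serve the default page.
  return fetchFn(request);
}
```

Because the existence check happens per request, a newly published localized page starts being served without any change to the worker. Caching of the chosen page is left to the platform and is not shown here.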

There's an older technology called Edge Side Includes from Akamai, which allows you to do this kind of thing. So you have a web page passing through you, and you have this ESI code to include another thing. What it does is actually on the CDN side: it would go through, parse the page that it's about to serve, find that, make the sub-request and insert it, literally building the page dynamically by inserting it on the CDN side. People want to move away from that style but they don't want to be forced to actually rewrite everything. So basically, we can just take that and do exactly the same thing on our side. So we said, well, if this is returning text/html, then we will stream back the body from us, looking for the ESI include thing, so this is just streaming the body.

To find it, in this case, we're just using a regex to pull out their thing and make a sub-request, because now in the edge we can make that sub-request. What's interesting here is that because we're extremely well connected to the internet around the world, these sub-requests happen exceedingly quickly. If they’re on Cloudflare, then they'll happen in the same machine, and actually, very often in the same process space. You get an incredibly fast response for anything that's shared. So this does that and then it does the insertion and makes the web page.

The sub-request is just like this: it makes a sub-request and gets it. In this case you wait for the response because you need the text. I think what's interesting about these examples is they're all JavaScript you can imagine writing, and they're not rocket science. "Oh, yes, I can do that, then I can put it at scale," and suddenly you've got a site.
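As an illustration of the approach described, here is a small sketch that expands `<esi:include src="..."/>` tags with a regular expression. The talk describes streaming the body; to keep the sketch short it buffers the whole page instead, and it handles only the simple self-closing form of the tag. The function names are hypothetical.

```javascript
// Matches the simple self-closing form: <esi:include src="/fragment"/>
const ESI_INCLUDE = /<esi:include\s+src="([^"]+)"\s*\/>/g;

// Replace every include tag in the HTML with the fragment it points to.
async function expandEsi(html, baseUrl, fetchFn = fetch) {
  const tags = [...html.matchAll(ESI_INCLUDE)];

  // Make all the sub-requests in parallel and wait for their text.
  const fragments = await Promise.all(
    tags.map(async (match) => {
      const response = await fetchFn(new URL(match[1], baseUrl).toString());
      return response.ok ? response.text() : "";
    })
  );

  // A replacer function avoids special handling of "$" in fragment text.
  let index = 0;
  return html.replace(ESI_INCLUDE, () => fragments[index++]);
}

// Only HTML responses are rewritten; everything else passes straight through.
async function handleEsi(request, fetchFn = fetch) {
  const response = await fetchFn(request);
  const type = response.headers.get("Content-Type") || "";
  if (!type.includes("text/html")) return response;

  const html = await expandEsi(await response.text(), request.url, fetchFn);
  const headers = new Headers(response.headers);
  headers.delete("Content-Length"); // the body length has changed
  return new Response(html, { status: response.status, headers });
}
```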

DroneDeploy

DroneDeploy is a real company that deploys drones. This is a field of solar panels at a power generation station. And they're looking at the temperature of the panels and the surrounding land to look for faulty panels. If they're faulty, they're not turning light into electricity. They tend to get hotter and then you can actually spot them. So they operate drones in the field for industrial purposes, and for the web application that this thing is talking to, they want to authenticate users and they're doing it by verifying a JWT. This is their actual code. They're actually going to Google Storage for the images and the content and they're just doing it on the edge, because then it's simple for them to add a layer of authentication there. They're adding it on top of something that doesn't necessarily have authentication. So it could be on top of an S3 bucket and they say, "Well, we'll add our own authentication on top of it."

Discord

The JWT stuff is not really rocket science. I want to talk just a little bit about Discord. So Discord, obviously, is extremely popular with gamers and many other people. They are a customer of ours, and we found out from a Hacker News comment, [inaudible 00:25:26] worked it, and they said, "Yes, we've actually moved our entire website inside a Cloudflare Worker.” So they took what was their marketing site. If you go to discordapp.com, there's no origin server. It's all running in our machines around the world entirely from the edge. And then they've started to move bits of logic from their servers and from their application into the edge. I'm told there's more and more of this stuff. So locale detection, their pre-rendering things, all of their feature flags are actually in us. So if you're a developer, you can set something and you can get the latest version of Discord.

They are a good example of a customer I can talk about who's done the initial Geo-targeting authentication, little bits of code, and is now saying, “Actually I could move more of my app here for performance reasons closer to the end user and I can update it quickly and it's secure."

Generate PDF

This is kind of a fun one. Somebody internally did this but for a good reason, which is generating a PDF entirely in the edge. This is a silly PDF generator, which generates a PDF containing the URI. So if you add whatever you want after pdf.obez.uk, you're going to get a PDF that contains our logo and the URI you typed in, as a demonstration of it. But this is the entire code to generate this PDF. So it requires jsPDF, which is a JavaScript PDF generator, and then it just makes a PDF. It includes an image, sets some text, and then it just returns the response. Here's a PDF.

Now, this sort of stuff is getting more and more common as well. I think I have some examples. So what are some real things other people are doing? One customer is watermarking PDFs in our Worker. So they are a customer that sells PDFs, and if you're authenticated and you buy a PDF, they actually modify the PDF as it passes through with your watermark. This was sold to John Graham-Cumming. This is his customer number, etc. So it's modified on the edge, so they don't have to do it on their side. Signing binaries is super common. This binary was signed for this particular person. So add metadata, sign it on the edge, that's it. A/B testing, decision making on the edge, is huge. Anyone who's used A/B testing knows that there's this awful performance problem, which is you go to a webpage and there's a big pause while the A/B testing thinks, "Oh, should I do this or should I do that?"

We're seeing a lot of people move that out into the edge. So they know the information from the visitor. They'll make a decision about what to serve for them. If they're using React, some of them are actually doing the pre-render on the actual edge and then the whole page is just, boom. They're getting a huge performance increase. We've seen a lot of people move away from using Varnish and VCL because they've reached the limitations of the language and they wanted to use something more flexible. So they're saying, "Well, there are some limitations on what I can do. JavaScript clearly allows me to do whatever I like."
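The talk does not show how customers make the A/B decision, so the following is only one plausible sketch: hash a visitor ID from a cookie into a stable bucket and fetch the matching variant, so the choice is made before the page is served rather than in the browser. The cookie name, the `/a/` and `/b/` URL layout and the function names are all assumptions.

```javascript
// Small FNV-1a style hash over UTF-16 code units; not cryptographic.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// The same visitor always lands in the same bucket.
function chooseVariant(visitorId, percentB = 50) {
  return fnv1a(visitorId) % 100 < percentB ? "B" : "A";
}

// Pick the variant at the edge and fetch the matching page.
async function serveVariant(request, fetchFn = fetch) {
  const cookie = request.headers.get("Cookie") || "";
  const match = cookie.match(/(?:^|;\s*)visitor=([^;]+)/);
  // New visitors get a random ID; it only needs to be stable, not secret.
  const visitorId = match ? match[1] : Math.random().toString(36).slice(2);

  const url = new URL(request.url);
  url.pathname = `/${chooseVariant(visitorId).toLowerCase()}${url.pathname}`;
  const response = await fetchFn(url.toString());

  const headers = new Headers(response.headers);
  if (!match) headers.append("Set-Cookie", `visitor=${visitorId}; Path=/`);
  return new Response(response.body, { status: response.status, headers });
}
```

Deriving the bucket from a hash rather than storing it means no lookup is needed on later requests; the cookie alone determines the variant.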

Things to Think about

I think there are a few things to think about, and we don't necessarily know the answer to these. We're seeing people like Discord move things into the middle, into the network. I don't know what's going to end up there. What are the right things to put in this place, rather than on the server? Some of the people internally would say, "Everything moves. Don't pay Amazon any money. Give it all to Cloudflare." I understand their vision, but at some point, what do you move? Now some of it may be just your data locality. How much data do you actually need to do the process you want to do? Maybe you don't want to have that spread around the world. Maybe it's impossible to do.

But I think there's an interesting question to ask about these architecture tiers. And then what languages are going to dominate in this world? Obviously, we provide JavaScript, but we provide WebAssembly as well. In particular, Rust seems to be really pushing WebAssembly support. So we have a large number of people internally doing Rust work. And I think we're going to see Rust really emerge as a language for these serverless applications, more than some of the others. But clearly anything that targets WebAssembly is something we can support and run.

I want to point you to two presentations that are going on. Kenton Varda, who is the architect of this product, is talking today at 2:55 about how we do the sandboxing, how this actually works from a technical perspective around the world. And then Ashley Williams is talking about WebAssembly and the implications for the web, and that's tomorrow. She also works for us on these kinds of things.

That's where I am. I'm going to tempt fate by trying to live code something. If you remember the Lava Millions website, this is the Lava Millions website. That ran out of a datacenter near us. So when I hit that, it actually made a back-end API call to our lava lamp random number generator, got some random data, and came up with some random Mega Millions numbers. If you write these down, these are really good random numbers. So please feel free to take them. If I flip over to the Workers interface, this is our little IDE for writing Workers code. This is obviously the website on the right here, and then this is the actual code that generates it. I think you'll have a little bit of difficulty seeing it, but this is actually running on our server. In this case, I'm connected in London. So I'm in London.
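The demo's source is not shown in the transcript, so the following is only a sketch of the step just described: turning random bytes from a back-end source into lottery numbers. It assumes a game format of five distinct numbers from 1 to 70 plus one Mega Ball from 1 to 25 (passed as parameters so they can be changed), and the `ByteSource` type is a hypothetical stand-in for data returned by the random number API.

```typescript
// Hypothetical sketch: ByteSource stands in for bytes fetched from the back-end API.
type ByteSource = () => number; // returns the next random byte, 0..255

// Uniform integer in [1, max] for max <= 256.
// Rejection sampling discards bytes that would introduce modulo bias.
function uniformFromBytes(next: ByteSource, max: number): number {
  const limit = 256 - (256 % max); // largest multiple of max that is <= 256
  for (;;) {
    const b = next();
    if (b < limit) return (b % max) + 1;
  }
}

// Assumed format: `count` distinct numbers in 1..whiteMax plus one in 1..megaMax.
function pickNumbers(next: ByteSource, count = 5, whiteMax = 70, megaMax = 25) {
  const white: number[] = [];
  while (white.length < count) {
    const n = uniformFromBytes(next, whiteMax);
    if (white.indexOf(n) === -1) white.push(n); // skip duplicates
  }
  white.sort((a, b) => a - b);
  const megaBall = uniformFromBytes(next, megaMax);
  return { white, megaBall };
}
```

The rejection step matters because 256 is not a multiple of 70 or 25; taking a plain remainder of each byte would make the low numbers slightly more likely than the high ones.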

You can write code directly, you can actually type it in and deploy it to the edge. At the same time, you can use GitHub Actions, or you can use your CI platform, whatever you want to do, to deploy code through our API. But I noticed just before I came in here that when I wrote this, and you're not going to be able to see this, it says, "Here are your random Mega Millions numbers picked at random." It says random twice in a sentence, and I'm sure whoever taught me to write would say that was a bad idea. If I find the code here, "Here are your random...", this is what actually generates the page. So I just delete that and I'll just hit Deploy. Yes, deploy it. Let's go here. And it's gone. So you see now it says, "Here are your Mega..." That was deployed to 166 datacenters worldwide when I hit that Deploy button. The code is reloaded globally. So that's giving you a sense; you can do that arbitrarily for the code you want to run. And I got away with live coding. I'm going to stop right there.

Questions & Answers

Participant 1: Fascinating talk. So, considering low latency is important for Cloudflare, do I understand that you do not scale to zero, or how do you deal with cold startup?

Graham Cumming: You definitely need to go to Kenton's talk about this. But if you dial back in Cloudflare's history, Cloudflare started completely as a self-service company, and one of the fundamental pieces of infrastructure is an internal key-value store, which is globally distributed, called Quicksilver. So what I just did there was I pushed code and it was pushed globally, like that. Then on the machines, it was picked up by our service infrastructure as, "This is a new version of this thing happening." So there would have been a very small delay, milliseconds of cold start delay there, because that was the first time that code was loaded and then JITed and executed in memory.

So we don't have anything like the cold start latency of, say, a Lambda or Lambda@Edge, because the way it's architected is using V8. And Kenton can give you a really detailed look at that. But there's a great blog post by Zach internally, which looks at the cold start latency time, and it's a few milliseconds to get that stuff reloaded. So in general, we don't have people keeping things hot, because it just doesn't make any sense.
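The flow in this answer (a new version lands in a replicated store, and each machine loads and compiles it the first time a request needs it) can be modeled with a small version-keyed cache. This is only a toy model under stated assumptions, not how Quicksilver or the Workers runtime is implemented: `store` is a plain object standing in for the key-value store, `compile` is a hypothetical stand-in for parsing and JIT compilation, and all names are invented for illustration.

```typescript
// Toy model: all names are hypothetical; this is not Cloudflare's implementation.
interface ScriptRecord { version: number; source: string; }
type Handler = (input: string) => string;

function createScriptCache(
  store: Record<string, ScriptRecord>,      // stand-in for the replicated key-value store
  compile: (source: string) => Handler,     // stand-in for load + JIT
) {
  const loaded: Record<string, { version: number; handler: Handler }> = {};
  let coldStarts = 0;

  return {
    // Run the named script, compiling it only if this machine has not yet
    // loaded the version currently present in the store.
    handle(name: string, input: string): string {
      const record = store[name];
      if (!record) throw new Error("no script named " + name);
      let entry = loaded[name];
      if (!entry || entry.version !== record.version) {
        entry = { version: record.version, handler: compile(record.source) };
        loaded[name] = entry;
        coldStarts++; // the small one-time delay is paid here
      }
      return entry.handler(input);
    },
    coldStartCount(): number {
      return coldStarts;
    },
  };
}
```

In this model only the first request after a deploy pays the compile cost; every later request reuses the handler already in memory, which is why there is little reason to keep anything artificially "hot".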

Participant 2: Thank you for a great talk. One question regarding security. Now, all the Cloudflare Workers need to read the traffic in clear text, did I get that correctly? The HTTPS encryption is decoded, then the Cloudflare service gets the contents in plain text, and then it's encoded again to the backend. So how do you mitigate the risk that some data leakage is happening?

Graham Cumming: So the question is, if Cloudflare is terminating HTTPS and then redoing HTTPS, how do we make sure that our service is secure, basically? Whether there are Workers in there or not, that's fundamentally our business, because we do WAF and DDoS mitigation, all that kind of stuff. A lot of effort has gone into how we secure the private keys which are on those machines. Basically, they're kept encrypted at rest, and then they're moved to the machines still encrypted, and they can only be decrypted through these embedded keys in the software, and there's a secondary key. There's a whole infrastructure around how we do that to keep the keys secure within the machines.

Because fundamentally our whole business rests on this: there are 14 million website users, most of them having HTTPS certificates from us. A core technology is how do you do that? How do you secure it? And then there's the physical security, software stack security. How do you silently sign all the software going up and down the stack? If you want to give me a card or something afterwards, I'll point you to a couple of blog posts by Nick Sullivan, who runs our crypto research group, about how that infrastructure is secured, because it is fundamentally a very important part of what we do.

Participant 3: Have you noticed any measurable improvements in performance by using Rust on your platform?

Graham Cumming: Compared to JavaScript? Honestly, no. Honestly, I mean, there was a paper the other day about WebAssembly performance versus native code, and the hand-waving number was about 80%, because it depends what the application is and all that kind of stuff. The WebAssembly stuff is extremely well optimized, and we're completely optimistic that WebAssembly just becomes the default platform for this stuff. So, yes, there's a little bit of cost, and it depends what you're doing. I'm not saying you should use this for large-scale machine learning training of models. That's crazy, right? Use something else for this. But for inline request processing, serving applications from the edge, it's very high performance. It's really good. It's not causing us trouble.

Participant 4: What are the constraints that you have on your execution environment, like number of concurrent requests, or time?

Graham Cumming: Depends on how much you pay us. So we do constrain memory and CPU time, not wall clock time. I'd have to look it up, but I believe by default, on one of our free accounts, there is 50 milliseconds of CPU time included. Then for the larger customers obviously, we will relax that constraint. We don't constrain wall clock time. In particular, if your piece of code goes off and makes an API call to somebody else, we stop counting then, while the I/O is happening. You're not involved. That means that the actual runtime can be much longer. It's really just how much time you're running doing the processing on the CPU, which for some applications is interesting. So for the PDF generation, that one's fairly quick. Signing a large binary, we'd have to look at it. But I can tell you that the number of customers for whom we've had to relax that constraint is less than 10. Most of this stuff can run within tens of milliseconds.
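The difference between CPU time and wall clock time in this answer can be illustrated with a small accounting sketch. It assumes a request can be described as a list of segments, each either CPU work or waiting on I/O; the `Segment` type and `summarize` function are hypothetical names, and the 50 millisecond figure used in the usage note is simply the number quoted above.

```typescript
// Hypothetical sketch of CPU-time accounting; not the platform's real metering code.
type Segment = { kind: "cpu" | "io"; ms: number };

function summarize(segments: Segment[], cpuLimitMs: number) {
  let cpuMs = 0;
  let wallMs = 0;
  for (const s of segments) {
    wallMs += s.ms;                      // wall clock time always advances
    if (s.kind === "cpu") cpuMs += s.ms; // time spent waiting on I/O is not charged
  }
  return { cpuMs, wallMs, withinLimit: cpuMs <= cpuLimitMs };
}
```

For example, `summarize([{ kind: "cpu", ms: 5 }, { kind: "io", ms: 300 }, { kind: "cpu", ms: 10 }], 50)` reports 15 ms of CPU time against 315 ms of wall clock time, so a request that waits a long time on an external API call still fits comfortably inside a 50 ms CPU budget.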


Recorded at:

Aug 18, 2019

John Graham-Cumming
