
Building Resilient Serverless Systems

Summary

John Chapin explains how to use serverless technologies and an infrastructure-as-code approach to architect, build, and operate large-scale systems that are resilient to vendor failures, even while taking advantage of fully managed vendor services and platforms. He leads an end-to-end demo of the resilience of a well-architected serverless system in the face of massive simulated failure.

Bio

John Chapin is a co-founder of Symphonia, an expert Serverless and Cloud Technology Consultancy based in NYC. He has over 15 years of experience as a technical executive and senior engineer. He was previously VP Engineering, Core Services & Data Science at Intent Media, where he helped teams transform how they delivered business value through serverless technology and Agile practices.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Chapin: I do basically cloud consulting now, I have a background in data engineering and data science. I came to Serverless through solving some really interesting large-scale data-engineering problems. That's a little bit of what brought me to Serverless.

What we're going to cover today: we're going to have a little Serverless refresher, and I'm going to call out some attributes of Serverless, some of the benefits, and some of the drawbacks that are going to resonate as we get further along in the talk. We'll talk a little bit about resiliency, and this is a theme you'll be hearing again from all the other speakers in Nicki's [Watt] track today. I have a demo, I've put together a new demo for this talk. We'll give that a shot and hope the demo gods bless us. This is the biggest room I think I've given a demo in, so I'm looking forward to that. Then, maybe we'll have a little bit of time at the end for discussion and questions, but if we don't have time, I will stick around up here as long as they'll let me to answer questions, or otherwise, I'll be out in the hallway.

What Is Serverless?

Let's just start off, this is my view of what Serverless is. When people ask me this question, I'm happy to be able to point them at a free report that we wrote for O'Reilly. You can go download this, there's a link down here, and it should be linked if you want to grab the PDF slides later on. We describe Serverless as two sides of a coin. One side is functions as a service, which is things like AWS Lambda, Auth0 Webtasks, Azure Functions, Google Cloud Functions. These are services that run your business logic: you give them a little bit of code and they take care of running it for you. The other side of that coin is back-end as a service. These are things like databases or messaging systems that we use from our applications, maybe we use them from Lambdas, but they're largely managed and run by a cloud vendor.

Serverless is functions as a service plus backends as a service. As for the attributes of these things, we call out a bunch more attributes in the report, but here are the ones I wanted to mention today. If something is Serverless, that means you're not managing hosts or processes explicitly. It's generally self-provisioning and auto-scaling. What that means is, when the load changes, the service itself or things under the covers that are managed by the vendor are handling adding more resources and directing load to those new resources to handle whatever your increase in scale might be.

The costs are based on precise usage. It's actually something that I'm really happy that changed recently with Dynamo, for DynamoDB, you used to have to provision in advance or do some coarsely-grained scaling. Now, we just get to pay for DynamoDB by the request which makes it much like Lambda in that, again, we're paying in this very fine increment based on the precise usage of that service.

I'll go into a little more detail about how this works on AWS, but we get implicit high availability. What I mean by that is these services that we're using, they're surviving the loss of a hardware component or the failure of a software component on the back-end. They're built to survive these failures, we don't need to worry about, "Ok, is that server up? Ok, if that server's up, then I can run Lambda. If not, I have to back off," that's all taken care of for us.

What are the benefits we get from this? I was tempted to try my hand at mapping but there's already been a lot of Wardley maps at this conference, so I'm not going to step into that as well. These are the benefits of the cloud amplified a little bit more. Reduced total cost of ownership: we're spending less time mucking about in the operational side and maintaining a bunch of things that really aren't relevant to our applications or our users. AWS calls this "undifferentiated heavy lifting." We get a ton of flexibility in scaling, and with that, we get a ton of flexibility in cost as well. If somebody's not using our service, we're generally not paying for it, or we're paying very little for it.

We also get shorter lead time, so we can move faster, experiments are cheap to perform. We can spin up an experiment that'll handle a billion requests a day, if we don't like the results of that experiment, we can tear the whole thing down the next day, if we want to, we didn't have a bunch of lead time ordering hardware or getting things provisioned or whatnot.

With that, every time we take the step down the cloud path, we give up a little bit of control. When we owned the hardware, when we were installing the operating system and when we were installing the RPM packages and putting our application on top of that and configuring everything, we had full control, but again, we were spending a lot of time doing those things. The further we move down towards these vendor-managed services, we give up that control. We have limited configuration options, we have fewer opportunities for optimization, and we're hands-off with regard to a lot of the issue resolution. If we hand off all of our operations to the Lambda Service team, if Lambda has an outage, there's really nothing we can do in the moment to fix that, we have to rely on them to address that.

Feel free to go download that report, we go into a ton more detail on all of those things and it's a free download. With that in mind, I'm going to move on and talk a little bit about resiliency.

Resiliency

Werner Vogels, who's the CTO of Amazon, has this great post on his blog, "10 Lessons from 10 Years of AWS." One of the points he makes there is that failures are a given and everything will eventually fail over time. We're not in this world where we can confidently say, "Oh, we're going to get to zero failures, we're going to build systems that don't fail at all." What he's saying is that the systems we're building are so big and so complex that failure is just a statistical inevitability. Some part of the system will be in a failure mode almost all the time and we need to handle that.

Like he said, systems will fail, at scale, systems will fail a lot. We need to embrace failure as a natural occurrence. He's talking about this, I think, more from the perspective of a vendor running these services, but we're going to take this to heart as Serverless application developers as well. To the extent possible, we need to limit the blast radius of failures. We'll see some of the isolation mechanisms that AWS uses to limit that blast radius when things do go wrong because things will go wrong. We want to keep operating and we want to recover quickly. When we talk about resiliency, I think the dictionary definition of resiliency is talking about bouncing back from failure, recovering from failure quickly.

I had this slide in here, I used to make a joke about this being a webcam view of us-east-1, but what I actually wanted to point out is that, statistically, the whole house isn't on fire. To turn it around, this dog is saying, "This is fine. This is actually fine. This is the world we live in." Some part of it's always going to be on fire, some part of it's always going to be failing. We're going to trust that our vendors are making some good decisions about how they're isolating the various services that we're relying on, and we're going to architect our systems to the extent possible to deal with failure scenarios. This is the state of the world, and we're ok with it.

Failures in Serverless Land

Let's move further in, let's talk about failures in Serverless land. I think I've beaten this point to death, that Serverless is all about using these vendor-managed services. We essentially have two classes of failure, there's the application failure, so we did something wrong in our logic, we screwed up and our application is failing for some reason. That's our problem but also that's our resolution. We get to own the resolution for that class of failure. For pretty much everything else, it's still our problem, we may still be down, from our customers’ perspective, but the resolution's out of our hands.

What happens when those vendor-managed services fail or when the services used by those services fail? Because what we're actually seeing now - I'm talking about all of this from the perspective of AWS, that's where most of my work is, I think it's extensible to the other vendors as well - what we're seeing now is that, for example, the Serverless Application Repository was built using all of the other AWS Serverless components. If one of those upstream components fails, you have a slew of other failures related to it. For example, when S3 goes down, you start to see a lot of other things affected as well because those other services are relying on that upstream service like S3.

Mitigation through Architecture

We're going to mitigate these failures through architecture, because that's where we have a little bit of control, that's where we can actually do something. What we're going to do, we're going to plan for failure, just like Werner said, we're going to architect and build our applications to be resilient. We're going to take advantage of the vendor-designed isolation mechanisms. The next slide, I'll go over what that looks like on AWS. We're also going to take advantage of the vendor-services that are designed to work across regions or globally to be on top of those isolation mechanisms, to do coordination and things like that.

What I’m setting up here today isn't new, it’s not unique guidance. AWS has actually been pushing this for years, they tell you how to do this. I was joking with Nicki earlier that I could just boil this presentation down to me just pointing at this PDF and saying, "Go read that and do that thing."

There's a lot of information there. We're going to delve into the Serverless side, but this has a lot more information on how to do this thing with some of the other services, things like EC2 and whatnot. Let's talk about AWS isolation mechanisms because that theme's going to crop up once we get to the demo.

I've just taken a slice of AWS availability zones and regions. At the end of this slide, there's a link to a talk by James Hamilton who's a VP at AWS, responsible for a lot of the data-center design and global networking and whatnot. He gave a talk in, I think, it was 2016. That's fantastic, he goes into a ton of detail on exactly how these things are networked together and how faults are isolated. I totally encourage you to check that out, that talk blew my mind, it was so good.

AWS Isolation Mechanisms

This is a slice of AWS availability zones and regions. What we see here is we have these three groupings. These three groupings are regions, and I think these AZ counts are actually accurate. eu-west-2, that's the London region, and within that London region, there are three availability zones. What an availability zone is, in AWS speak, is an isolated data center, so a separate building that has all the servers and whatnot in it. It has separate power, it has separate network connectivity. If one of those goes away, if that building is taken offline for whatever reason, none of the other availability zones are affected. Within this region, each availability zone is networked to all of the other availability zones so there's lots of backup network links.

Something that James Hamilton goes into in his talk is he's also talking about transport sites as well. Those are the links that AWS has, both to its global network and to all the interconnect stuff that's going on. Within a region, the reason these things are grouped into regions is they want to have close proximity of these data centers to each other for handling failovers and whatnot. Within a region, you're also guaranteed to be in one country, essentially in one jurisdiction, so that's important for a lot of us building those systems. At the time of James's talk, 2 years ago, they had 14 regions going to 18, and now, I think they're up to 21 and there's several more coming out.

That's essentially how they isolate workloads, and this is physical isolation too, that's the other thing I wanted to point out here. Say I have something running in eu-west-2a and that goes away, but I also had something in eu-west-2b. The thing in eu-west-2b is still running and I can use some of these services to shift load from one availability zone to the other. What we're going to talk about, in the next couple of slides, is how we can use some of AWS's global services to actually shift load and shift traffic between entire regions.

Serverless Resiliency on AWS

I talked earlier about how we get implicit high availability, so services like Lambda are running across multiple availability zones in a region. The way you tell that you have this implicit high availability with an AWS service is, if you get to choose an availability zone, then you don't have this regional high availability. If you have to choose an availability zone, that means you're having to choose a specific data center. With EC2 and things like that, we choose availability zones, therefore we are responsible for configuring and setting up that high availability within a region.

With Serverless, that's handled for us, we're just addressing Lambda or we're just addressing DynamoDB, or S3, or SNS, or all these other things at a regional level. Whatever's going on beyond that is AWS's problem. When we're talking then about global high availability, what we're talking about there is services running across, or applications running across multiple regions. That means I have an application and I've got it running both in eu-west-2, here in London, and I've also got it running in us-west-2 out in Portland, Oregon, in the U.S. That decision though, that mechanism for doing that is through architecture. We don't get that for free, we need to design that, we need to design our applications to take advantage of that.

Something I want to point out there is that Serverless cost model is a huge advantage. We talked about an attribute of Serverless is that you pay for very granular costs, you only pay if you're actually using the thing. If I have an entire infrastructure set up in one region that I'm using, ok, of course I'm paying for that. If I have that infrastructure duplicated in another region, what we're going to see in the demo architecture, other than maybe the cost of transferring data so it's available in that second region, I'm probably not paying anything for it. That's my entire resiliency, my backup architecture, but the other node in my global system is essentially free until I need to use it, until I need to take advantage of it. I think that's a huge advantage, we don't have to run the entire system several times just to get this global high availability.
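To make the standby-region cost argument concrete, here is a minimal arithmetic sketch. The prices and the `UnitPrices`, `RegionLoad`, and `regionCost` names are hypothetical placeholders rather than real AWS pricing, and the model assumes a region is billed only for the requests it serves and the writes replicated into it.

```typescript
// Hypothetical per-unit prices; real AWS pricing differs by service and region.
interface UnitPrices {
  perMillionRequests: number;         // API calls plus function invocations
  perMillionReplicatedWrites: number; // cross-region replication into this region
}

interface RegionLoad {
  requests: number;         // requests served by this region in the billing period
  replicatedWrites: number; // writes replicated into this region in the period
}

// Pay-per-use cost: zero traffic means zero request charges.
function regionCost(load: RegionLoad, prices: UnitPrices): number {
  const million = 1000000;
  return (
    (load.requests / million) * prices.perMillionRequests +
    (load.replicatedWrites / million) * prices.perMillionReplicatedWrites
  );
}

const assumedPrices: UnitPrices = { perMillionRequests: 4, perMillionReplicatedWrites: 2 };

// The active region serves all traffic; the standby only receives replicated data.
const activeCost = regionCost({ requests: 10000000, replicatedWrites: 0 }, assumedPrices);
const standbyCost = regionCost({ requests: 0, replicatedWrites: 1000000 }, assumedPrices);
// With these assumed numbers, activeCost is 40 and standbyCost is 2.
```

Under these assumptions the duplicate region costs only its replication traffic until a failover sends it real requests.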

Other attributes of Serverless, or emergent qualities in Serverless systems, that are helpful: they typically are event-driven. What that means is there's often very little data in-flight when you have to do these transitions. When I talk about a transition, we'll see that we're going to shift traffic basically from one region to another. That data is often persisted to reliable stores, we're not holding on to it on a local disk or something like that. It's going into a data store that has its own high guarantees about resiliency.

Something we do in all of our workshops and tutorials and whatnot is we use a continuous deployment methodology. The first thing we do in a workshop is we have everybody set up a continuous deployment pipeline on AWS. This came in super handy because we actually gave a workshop and, during that workshop, S3 in us-east-1 went down. I think for a lot of workshops, it’s "Ok, well, we just need to wait for this thing to come back up." What we were able to do instead was just tell everybody, "Take your CloudFormation template that was your deployment pipeline, point that at us-west-2, deploy it,” and 5 minutes later, we're all back up and running while S3 is going through whatever is happening in us-east-1. A hugely powerful approach. That makes it very easy to stand up backup infrastructure all over the place or deploy to new regions if you need new capabilities.

One more thing I wanted to mention before the demo that didn't make it into this slide. With this approach that we're going to be using to get global high availability, what we're also going to get is much better performance, from the perspective of our users, because we're going to take advantage of something called DynamoDB global tables, which are essentially replicated DynamoDB tables where you can write to any member of that global table. I can write to the table in eu-west-2, here in London, I could write to the table in Portland, Oregon, and that data is replicated bidirectionally. What that means is we can set up our routing such that users in London hit the London region and get super-fast response time from London. Users in the U.S. hit the U.S. region and get super-fast response time from the U.S. We'll see how this works. I'm going to go through an overview of the demo, and then we'll actually give it a shot.

Demo

What we're basically building is a global highly available API, and I took this a few steps further. This used to just be a REST API demo, which was great for a small room and I could just do it using cURL, but I wanted to put together something a little more interesting for you all to take home with you today. This is a little bit of REST API, and it's got some WebSockets, because I wanted to use the brand new support for WebSockets that's in API Gateway. I also wrote a little front-end in Elm so you can see how bad my front-end UX skills are. What this essentially is, is a chat application, a globally-distributed highly-scalable chat application. If I'm here in London, I could hit the back-end services here in London and write my messages to the database. Those messages are then replicated to any other regions that this application's running in.

This will help explain a little bit better. What we're doing, we're starting from the left-hand side, we're using a service called Route 53. This is Amazon's globally-available DNS service. What we can do there is we can tell Route 53, "Depending on where a user is located, direct them to one of two" or this could be one of N API Gateways, even given the same domain name. Our domain name that we're going to be using is api.qcon.symphonia.io and we've got another domain name for the WebSockets side because we need two separate API Gateways to handle that.

I say, "Connect me to api.qcon.symphonia.io," I'm sitting here in London, Route 53 will point me to the API Gateways that are in the London region. We've got some Lambda functions that handle the various WebSocket connections, and messages, and whatnot. We've also got a Lambda function for a health check and one to provide us the history of recent messages.

Those messages are going to get written to this DynamoDB table, this is a part of a global DynamoDB table, so those messages are all going to get replicated over to us-west-2. If I go back to the beginning then, if I'm a user sitting in the United States, or on the West Coast of the United States for sure, when I ask for api.qcon.symphonia.io, it's going to route me to the API Gateways in us-west-2. I'm going to interact with those API Gateways, I'm going to write my messages, those messages are going to get replicated. No matter where I'm sitting, the data is going to be there in that region. We'll talk a little bit more about the idiosyncrasies of that bidirectional replication. No matter where I'm sitting, that data is going to be there. If one of those regions goes down, I've set up a health check, this is how we're going to simulate a regional AWS failure. I hope you didn’t get your hopes up that AWS was actually going to take a region for me. We might get lucky, but I hope not.

We've got a health check there, so what we'll do is we'll basically kill the health check, simulate that region going down, and see that Route 53 is handling, redirecting all of our users to whatever region is surviving in this passing health check. What we'll see is, here in London, I'm going to then get routed to us-west, all my data is going to be there, I can still interact with that, I can write new messages into the system. When I fail back over, it's all back up and available.
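The routing behavior described above can be sketched as a small selection function: among the regions whose health check is passing, send the user to the one with the lowest latency. This is an illustrative model only, not Route 53's actual implementation; the `RegionEndpoint` type, the `pickEndpoint` function, and the latency figures are assumptions made for the example, and returning `undefined` when nothing is healthy is a simplification.

```typescript
interface RegionEndpoint {
  region: string;
  latencyMs: number; // assumed latency from this user to the region
  healthy: boolean;  // result of the most recent health check
}

// Return the healthy endpoint with the lowest latency, or undefined if none is healthy.
function pickEndpoint(endpoints: RegionEndpoint[]): RegionEndpoint | undefined {
  let best: RegionEndpoint | undefined;
  for (const candidate of endpoints) {
    if (!candidate.healthy) continue;
    if (best === undefined || candidate.latencyMs < best.latencyMs) best = candidate;
  }
  return best;
}

// A user sitting in London, with made-up latencies to each region.
const fromLondon: RegionEndpoint[] = [
  { region: "eu-west-2", latencyMs: 10, healthy: true },
  { region: "us-west-2", latencyMs: 140, healthy: true },
];

// Normal operation: the closest region wins.
const normalChoice = pickEndpoint(fromLondon);

// Simulated regional failure: the eu-west-2 health check is failing.
const outageChoice = pickEndpoint(
  fromLondon.map((e) => (e.region === "eu-west-2" ? { ...e, healthy: false } : e))
);
```

In this model `normalChoice` is the eu-west-2 endpoint and `outageChoice` is the us-west-2 endpoint, which mirrors the failover the demo sets out to show.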

I'm going to alter the eu-west-2 health check, we're going to return an HTTP error status. Unfortunately, Route 53 health checks don't support running health checks against WebSocket endpoints, so we could either develop some of that logic ourselves or we can just link the WebSocket API Gateway to the health check from the REST API Gateway. We're going to observe our requests getting routed appropriately and we're going to observe that the DynamoDB writes are propagated to and from.

These are my health checks on the Route 53 side, my intention was to deploy this, using CloudFront, out to everybody so we could all play with the chat on the big screen, but I'm not sure I trust you all so I didn't do that, maybe some other time. If people download the GitHub repo, you can run this locally and it will attach to the actual API. We might have somebody jump in here at some point, be good, watch your neighbors.

I also had this grand vision of building this logo that had the green Q, and then, looked like the Slack logo from there, and calling it Quack. This is using WebSockets, we're connecting back to that API. If I say, "Hello, London.", I've got two instances of this little web app running. Again, we're connecting back to this API running in eu-west-2. The messages coming back are either being queried out of a DynamoDB table when the application starts up to grab the last bit of history, so you could populate that history if you wanted to, or they're being read off of the Kinesis stream that's receiving notifications from updates to the regional DynamoDB tables. What I've added to the beginning is the regional indicator there, it’s just what region the Lambda that's servicing that request is running in, so we know where this data is coming from. You can go check out the GitHub repo if you need to validate that.

This data is coming from eu-west-2, I'm going to jump over to the DynamoDB console tables. This is the London region, I have this Serverless application messages table, I've got my message in there, "Hello, London." I'm going to just jump over to Oregon, and just show that this message has been replicated to the Oregon region. It's taking forever because that's pretty much on the other side of the planet. I'm in the Oregon region now and I have that same data there. Great, our bidirectional replication. I've only proven one-way replication here, but we'll see it both ways in a minute.
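As a rough mental model of the bidirectional replication shown here, the sketch below keeps two in-memory replicas and copies items both ways, resolving conflicts by keeping the most recent write (a last-writer-wins rule, which is how DynamoDB global tables are documented to reconcile concurrent updates). Everything in it, including the `ChatItem` shape, the `Replica` type, and the `syncReplicas` function, is a hypothetical stand-in and not the service's real replication mechanism.

```typescript
interface ChatItem {
  id: string;
  text: string;
  updatedAt: number; // write timestamp in milliseconds (assumed comparable across regions)
}

// One regional table, modeled as a map from item id to item.
type Replica = Map<string, ChatItem>;

// Apply a write to a replica, keeping whichever version has the later timestamp.
function applyWrite(replica: Replica, item: ChatItem): void {
  const existing = replica.get(item.id);
  if (existing === undefined || item.updatedAt > existing.updatedAt) {
    replica.set(item.id, item);
  }
}

// Replicate in both directions so each replica ends up with the latest version of every item.
function syncReplicas(a: Replica, b: Replica): void {
  for (const item of Array.from(a.values())) applyWrite(b, item);
  for (const item of Array.from(b.values())) applyWrite(a, item);
}

const londonTable: Replica = new Map();
const oregonTable: Replica = new Map();

// One message written in each region, then replicated both ways.
applyWrite(londonTable, { id: "m1", text: "Hello, London.", updatedAt: 1 });
applyWrite(oregonTable, { id: "m2", text: "Hello, Portland.", updatedAt: 2 });
syncReplicas(londonTable, oregonTable);
```

After the sync, both maps hold both messages, which is the property the demo relies on when traffic moves from one region to the other.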

What I'm going to do is I'm going to just quickly change our health check code. We're going to return a 500 error for our health check now. I'm going to just show you all of my Bash history. I'm going to deploy that to eu-west-2, that's going to take a minute. I was playing around, this was my opportunity to play a little bit with TypeScript on the back-end. If you want to see a packed together example of Lambda using TypeScript and Webpack for a back-end thing, this is a good place to look.
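A health check endpoint of the kind being changed here might look roughly like the sketch below. The actual code in the demo repository is not shown in the talk, so the `healthCheck` function and its `simulateFailure` switch are assumptions; the `{ statusCode, body }` return shape follows the Lambda proxy response format that API Gateway expects. The helper reflects the documented rule that Route 53 HTTP health checks treat 2xx and 3xx status codes as passing.

```typescript
interface HealthResult {
  statusCode: number;
  body: string;
}

// Hypothetical health check: returns 200 normally, or 500 to simulate a regional failure.
function healthCheck(region: string, simulateFailure: boolean): HealthResult {
  if (simulateFailure) {
    return { statusCode: 500, body: JSON.stringify({ region, healthy: false }) };
  }
  return { statusCode: 200, body: JSON.stringify({ region, healthy: true }) };
}

// Route 53 HTTP health checks consider a 2xx or 3xx response to be passing.
function countsAsHealthy(statusCode: number): boolean {
  return statusCode >= 200 && statusCode < 400;
}

const beforeChange = healthCheck("eu-west-2", false);
const afterChange = healthCheck("eu-west-2", true);
```

Deploying the failing variant to a single region is enough to make that region's health check flip to unhealthy, without touching the rest of the application.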

Also, the slides are loaded into this Git repository, so that's a place for you to find the slides. This is also using the Serverless Application Model.

What I'm going to do now is flip over to our health check, in Route 53, and I may need to kill a little bit more time for this thing to pick up. This usually picks up in a minute so, but the state I'm in now is I've caused the health check to fail. What we're doing, what we're simulating is basically me unplugging eu-west-2. We see that that health check has now flipped over to unhealthy. What Route 53 is doing now, when I refresh my web app, it's going to respond to that request for api.qcon.symphonia.io and, instead of giving me the IP address for the endpoint, here at eu-west-2, it's going to give me the IP for the endpoint in us-west-2.

I'm going to just refresh this one, it should load some history. You can see now that the history data is actually coming from us-west-2. That data was replicated all the way over to us-west-2, and all I've done here is refresh the page. Actually, if I was a cleverer front-end programmer, I could handle this automatically behind the scenes, have it reopen the WebSocket connection and get pointed back to the thing, but I'm not that clever for a front-end programmer. If somebody would like to open a PR on that repository, I'd totally welcome that.
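The automatic reconnection mentioned here could be approached along the following lines: when the socket closes, reopen it against the same hostname after a backoff delay, so that a fresh DNS lookup can land on whichever region is currently healthy, and reload the history once connected. This is a sketch under assumptions, not the demo's front-end code (which is written in Elm). The `ReconnectDeps` interface is a hypothetical adapter around the browser WebSocket, the history call, and a timer, and DNS caching in the browser or operating system may delay the switch.

```typescript
interface ReconnectDeps {
  // Opens a connection to the fixed API hostname and registers callbacks (hypothetical adapter).
  connect: (onOpen: () => void, onClose: () => void) => void;
  // Reloads the recent message history after connecting or reconnecting.
  loadHistory: () => void;
  // Schedules a retry; in a browser this would wrap setTimeout.
  schedule: (fn: () => void, delayMs: number) => void;
}

// Exponential backoff: baseMs, 2x, 4x, ... capped at maxMs.
function backoffDelay(attempt: number, baseMs: number, maxMs: number): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Keep a connection open, reconnecting with backoff whenever it closes.
function keepConnected(deps: ReconnectDeps, baseMs: number = 500, maxMs: number = 30000): void {
  let attempt = 0;
  const open = (): void => {
    deps.connect(
      () => {
        attempt = 0; // a successful connection resets the backoff
        deps.loadHistory();
      },
      () => {
        const delay = backoffDelay(attempt, baseMs, maxMs);
        attempt += 1;
        deps.schedule(open, delay);
      }
    );
  };
  open();
}
```

Injecting the socket, history, and timer functions keeps the retry logic independent of the browser, so it can be exercised without a live connection.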

I'll just say, "Hello, Portland." That's gone into the database. All of this is getting routed now to us-west-2. I was telling Nicki [Watt], last night, "This demo actually works, it's boring." The data just goes where we want it to go and the whole thing works. I feel like we're getting our money's worth for these managed services.

What I can do, at this point, I'm going to fix the health check again. We're going to go back to running here in eu-west-2, here in London. I'm fixing that health check. This is just Route 53 is going to see, "Oh, that came back up online. This user who's accessing the system is closer, by latency, to London than to us-west, so I'm going to route to London instead." We're going to see that our data that we added, when we were connected to the Portland region, or to the Oregon region, is still there. Like I said, it's actually boring, it's just going to work.

What you have here is the skeleton of an interesting globally-available application. We could actually set this up across more than two regions. I didn't do that today but you can set this up, I think, pretty much with as many regions as you want to. The simpler version of this demo, I gave across three regions. What you're getting there is not only resiliency, so now you can survive the loss of one or two AWS regions, but you're getting better performance, a better experience for your users, because they're taking advantage of these services that are close to them.

Hopefully, this health check will come back to "Healthy" pretty shortly. While that's happening, some things I didn't do for this demo, I didn't deploy the chat application but there's some new capabilities in CloudFront, so the CloudFront is AWS's global CDN. There's some new capability there for doing failover within CloudFront origins. Not only can we deploy our API to many regions and have it survive those regional failures, we could actually point our CloudFront CDN at S3 buckets in multiple regions as well and have it failover from one to the other if that S3 origin is not available anymore. I give another talk on CDN, so if anybody wants to talk CDN, and CloudFront especially, come find me afterwards. We can nerd out about that.

Another thing that was just announced recently was the AWS Global Accelerator, so this is something they just announced at re:Invent. It's essentially static IPs, they do some routing magic there. I'm not a networking expert so I can't speak to exactly what they're doing, but you can point those at load balancers and basically do this same regional failover but with ELBs, elastic load balancers, or application load balancers, or just point them to Elastic IPs that you have out there in AWS as well.

We're still showing eu-west as unhealthy, I'm not going to dwell too much on that. Let me take a question.

Participant 1: I understood that WebSockets applications were a difficult combination with Serverless because it's a stateful model, it's like a long-lived connection. Is there anything you had to do to enable that with AWS Lambda?

Chapin: That's actually one of the reasons I used WebSockets for this, because we just got support in API Gateway for WebSockets. What you basically do is you set up Lambdas to handle the different events. The connection is stateful but it's still an event-based system, and so, you have a connection event. I have a Lambda that says, "Ok, a new client is connected." We have this connections table, we're keeping track of the clients that are connected. The reason we do that is so that, in other Lambdas, we can send messages back to those clients using whatever their IDs are. There's that connection event, and there's a send event, which you can set up to trigger when a message matching a certain pattern comes through. That Lambda function is then invoked with the contents of that message that came through on the WebSocket. That WebSocket's connected to API Gateway, and then, API Gateway is invoking Lambdas as necessary. We can do it in the same stateless way we're used to, API Gateway is a bridge there that lets us use this really easily with Lambdas. You could do it before with the IoT service but it was a little bit more painful, this is super easy.
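The event-per-route model described in this answer can be sketched as a single dispatch function. API Gateway WebSocket events carry a `routeKey` and a `connectionId` in `requestContext`, with `$connect` and `$disconnect` as built-in routes; everything else below is an assumption for illustration: the custom route name `send`, the in-memory `connectionsTable` standing in for the DynamoDB connections table, and the injected `SendFn` standing in for the API Gateway call that posts data to a connection. Broadcasting directly from the handler is also a simplification of the demo.

```typescript
interface SocketEvent {
  requestContext: { routeKey: string; connectionId: string };
  body?: string;
}

// Stand-in for the call that pushes data back to a connected client.
type SendFn = (connectionId: string, data: string) => void;

// In-memory stand-in for the connections table that tracks connected clients.
const connectionsTable = new Set<string>();

function handleSocketEvent(evt: SocketEvent, send: SendFn): { statusCode: number } {
  const { routeKey, connectionId } = evt.requestContext;
  switch (routeKey) {
    case "$connect": // a new client connected: remember its ID
      connectionsTable.add(connectionId);
      return { statusCode: 200 };
    case "$disconnect": // the client went away: forget its ID
      connectionsTable.delete(connectionId);
      return { statusCode: 200 };
    case "send": { // hypothetical custom route: fan the message out to every known client
      const payload = evt.body === undefined ? "" : evt.body;
      for (const id of Array.from(connectionsTable)) send(id, payload);
      return { statusCode: 200 };
    }
    default:
      return { statusCode: 400 };
  }
}
```

Each invocation is stateless; the only state is the table of connection IDs, which is what lets short-lived functions serve a long-lived connection.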

Let's see if this came back, it’s showing "Healthy" now. I'm still connected to us-west-2 but, if I refresh this, I'm now loading all of that data from eu-west-2. Those guys were able to connect to this, and depending on when they hit it, they might have gone to us-west or eu-west, doesn't matter, the data is there. That's basically the extent of the demo, please take this and play with it. I found it super interesting to work on, I was up late last night fiddling with the last bits of this.

I'm going to jump back to the presentation now. I'll leave this up probably for today or until somebody really abuses it badly, feel free to play with that. Let me go back to the slides here.

Rough Edges

What are some of the rough edges of this? The thing I like about being an AWS consultant and not an employee is I can talk candidly about some of the rough edges. One of the big ones for me: global tables are not available in CloudFormation. CloudFormation is inherently a regional service so I understand that there's a little bit of a disconnect with how they might do that. I like infrastructure as code, I would like to be able to tick a box and say, "Well, if this table has the same name as a table in another region, combine those into a global table by default."

A little bit of a note for people doing WebSockets with API Gateway. If you're using custom domains, much to my chagrin about halfway through building this demo, I discovered that there's no support in CloudFormation for custom domains with WebSocket API Gateways, so that was a pain. One of the biggest caveats of global tables in general is that all the tables have to be empty when you join them into a global table. You can't have any data in those tables. That's a huge caveat. If you have an application now that you're really happy with and you say, "You know what? I'd really like to add this global high availability. I'm going to use DynamoDB global tables," you can't just tick a box and do it, you actually have to do an entire data migration. For anybody who's done a data migration, especially of terabytes of data on DynamoDB, it's a huge pain. You end up writing an EMR job or something, you're going through all this rigmarole.

Then this last bit down here, SAM is not compatible with CloudFormation StackSets. StackSets are a way to deploy the same CloudFormation stack, the same infrastructure-as-code template, to multiple AWS regions in one shot. I was thinking to myself when I was building this demo, "Oh, that would be great. I can use a StackSet and just blast the same architecture, the same infrastructure out to as many regions as I want. It'll be super easy," but I also really like using the Serverless Application Model. The Serverless Application Model is a transform on top of CloudFormation, and it's actually not compatible with StackSets, so I wasn't able to do that, which is why you see me with a deploy script pointing at specific regions.
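A deploy script of that kind might look roughly like the Python sketch below. The demo's actual script is not shown in the transcript, so the stack name and the per-region template file names are hypothetical, and the sketch assumes each template has already been packaged against an S3 bucket in its own region. It builds one `aws cloudformation deploy` invocation per region and only runs them when asked to.

```python
import subprocess
from typing import List, Sequence

# Hypothetical values for illustration; the talk does not name them.
REGIONS = ["us-west-2", "eu-west-2"]
STACK_NAME = "resilient-demo"


def build_deploy_command(region: str, stack_name: str = STACK_NAME) -> List[str]:
    """Return the AWS CLI invocation that deploys one stack to one region."""
    return [
        "aws", "cloudformation", "deploy",
        "--region", region,
        "--stack-name", stack_name,
        # Assumes a template packaged separately for each region.
        "--template-file", f"packaged-{region}.yaml",
        "--capabilities", "CAPABILITY_IAM",
    ]


def deploy_everywhere(regions: Sequence[str], dry_run: bool = True) -> List[List[str]]:
    """Deploy the same stack to each region in turn.

    With dry_run=True (the default) nothing is executed; the commands are
    only returned so they can be inspected.
    """
    commands = [build_deploy_command(region) for region in regions]
    if not dry_run:
        for command in commands:
            # check=True stops at the first region that fails to deploy.
            subprocess.run(command, check=True)
    return commands
```

Calling `deploy_everywhere(REGIONS)` returns the two command lists without touching AWS, and passing `dry_run=False` executes them one region after another.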

Additional Approaches

I talked a little bit about these additional approaches. Here's something I actually do like: StackSets don't support SAM, but you can now do multi-region deployment via CodePipeline. CodePipeline is AWS's Jenkins-as-a-service thing. What you can do now, and my partner Mike put together a lovely demo and GitHub repo of this, is deploy an application stack to multiple regions from one pipeline. Super easy.

We talked about origin failover and Global Accelerator. I think we've got a containers-oriented talk later in the track, and so, something like Global Accelerator would work really well for that.

Here's a bunch of AWS resources, starting with James Hamilton's talk. If you take only one action based on this talk, watch James Hamilton's talk, it's so fascinating. Then, a very close second is Rick Houlihan. He's on the DynamoDB team, and he gave a talk at re:Invent called "Advanced Design Patterns for DynamoDB." That's super interesting just on its own, but he also talks about how they built the DynamoDB global tables and how they've built the replication system. That's another bit of interesting background.

Then, there's a few cases of prior or current art. AWS has been putting out some examples of various bits and pieces. I think I'm the only one that's pulled it all together yet but there's some other information out there and I definitely was using some of these resources to work through some of these problems. Talking about multi-region Serverless, talking about global tables, and talking about WebSocket APIs.

We have Symphonia, we have a bunch of other resources out there, but the one resource I want to leave you with is myself actually. I would really like to hear from people, not only what they thought about this talk, but if you have questions about Serverless architecture or just want to bounce an idea off of somebody, please get in touch. Here's a few ways to get in touch with me, I always love hearing from folks and I'm happy to help where I can.

Questions and Answers

Participant 2: I don't work in Serverless, so this may be a naive question, but your opening point was, under Serverless, you don't need to manage hosts, but then, in the presentation, you're managing [inaudible 00:42:31] directly instead, in terms of [inaudible 00:42:33] and deployment to each region. It struck me that, the model ought to be that you don't need to necessarily worry about regions and use Route 53 to route to a particular region or worry about whether or not the data's got into a particular instance of DynamoDB, it goes into the cloud, and then, the user gets the relevant amount of performance. Do you see that as a future direction for Serverless in terms you don't need to worry about these lower-level details in terms of which host, which database, where the data is going to and it just happens for you?

Chapin: I'm going to rephrase your question just a little bit to make sure I understand it and to make sure everybody else can hear it as well. I think the point you're making is that when we're taking these steps towards Serverless, we're giving up having to manage hosts and processes and whatnot. Are we not starting to see now that maybe we shouldn't even be worried about managing regions, and that we could move much further down the path where we just interact with the services globally and it just works?

I think that's certainly a direction that would be interesting because really, I mean, what I want to do is just I want to build my business logic and make my customers happy. Really, what's going on behind that, they don't care about and, to the extent I can get what I need out of whatever service I'm using, I don't care either. That being said, what I was trying to talk about here was that actually we have to manage those interactions through architecture and through the infrastructure that we're deploying. We're not in that place yet. We do still need to be cognizant of it. Maybe in the future we'll get to a place where some of this behavior is more ubiquitous, we don't have to set it up specifically.

Recorded at:

Aug 03, 2019

John Chapin
