InfoQ Homepage Articles The Future of Serverless Compute

DevOps

The Future of Serverless Compute

Leia em Português

Feb 23, 2017 21 min read

Mike Roberts

reviewed by

Manuel Pais

Write for InfoQ

Feed your curiosity. Help 550k+ global
senior developers
each month stay ahead.Get in touch

Key Takeaways

Serverless compute, or Functions-as-a-Service (FaaS), will be transformational in our industry - organizations that embrace it will have a competitive advantage because of the cost, labor and time-to-market advantages it brings
Many Serverless applications will combine FaaS functions with significant use of other vendor-provided services that provide pre-managed functionality and state management
Tooling will significantly improve, especially in the area of deployment and configuration
Patterns of good Serverless architecture will emerge - it's too soon to know what they are now
Organizations will need to embrace the ideas of 'true' DevOps and autonomous, self-sufficient, product teams to reap the full time-to-market benefits that Serverless can offer

It’s 2017 and the Serverless compute revolution is a little over two years old (do you hear the people sing?). This revolution is not coming with a bang, like Docker did, but with a steady swell. Amazon Web Services put out new Lambda features and products on a regular cadence, the other big vendors are releasing production-ready versions of their offerings one-by-one, and new open source projects are frequently joining the party.

As we approach the end of the early-adopter period, it’s an interesting exercise to put on our prediction goggles and contemplate where this movement is going next, how it’s getting there, and most importantly what changes we need from our organizations to support it. So, join me as we look at one possible future of Serverless compute.

Note to readers from the actual future! You’ll probably get a good kick out reading this. How far off was I? And how is 2020? Please send me a postcard!

A vision of Serverless capabilities

Compute

The last decade has seen the emergence, and then the meteoric rise, of cloud computing. Nine years ago, virtual public cloud servers were ‘toys’ for startups but in a relatively short time have become the de facto platform for large swaths of our industry as it considers any new deployment architecture.

Serverless compute, or Functions-as-a-Service (FaaS), is a more recent part of this massive change in how we consider ‘IT’. It is the natural evolution of our continuing desire to remove all baggage and infrastructural inventory from how we deliver applications to our customers.

A huge number of applications we develop consist of many small pieces of behavior. Each of those are given a small input set and informational context, will do some work for a few 10s or 100s of milliseconds, and finally may respond with a result and/or update the world around them. This is the sweet spot of Serverless compute.

We predict that many teams will embrace FaaS due to how easy, fast and cheap it makes deploying, managing and scaling the infrastructure necessary for this type of logic. We can structure our use of FaaS into various forms, including:

Complete back-end data pipelines consisting of a multitude of sequenced message processors
Synchronous services fronted by an HTTP API
Independent pieces of ‘glue’ code to provide custom operations logic for deployment, monitoring, etc.
Hybrid-systems consisting of traditional ‘always on’ servers directly invoking platform-scaled functions for more computationally intensive tasks.

Businesses that embrace FaaS will have a competitive advantage over those that don't because of the cost and time-to-market benefits it brings.

Managing Application State

One of the prerequisites of such a large adoption of FaaS is a solution, or set of solutions, for fast and simple approaches to state management. Serverless compute is a stateless paradigm. We cannot assume that any useful state exists in the immediate execution environment of our running functions between separate invocations. Some applications are fine with this restriction as it stands. For example, message-driven components that are purely transformational need no access to external state, and web service components that have liberal response-time requirements may be ok to connect to a remote database on each invocation. But for other applications this is insufficient.

One idea to solve this is a hybrid architecture that manages state in a different type of component than that executing our FaaS code. The most popular such hybrid is to front FaaS functions with other services provided by the cloud infrastructure. We already see this with context-specific components like API Gateway which provides HTTP routing, authorization, and throttling logic that we might typically see programmed in a typical web service, but defined instead through configuration. Amazon have also recently shown their hand in a more generic approach to state management with their Step Functions service, allowing teams to define applications in terms of configured state machines. The Step Functions service itself might not become a winner, but these kinds of codeless solutions are going to become very popular in general.

Where vendor services are insufficient, an alternative approach to a hybrid system is for teams to continue to develop long-lived components that track state. Those might be deployed within a CaaS (Containers-as-a-Service) or PaaS (Platform-as-a-Service) environment, and will work in concert with FaaS functions.

These hybrid systems combine logic in long-running components and per-request FaaS functions. An alternative is to focus all logic in FaaS functions, but to give those FaaS functions extremely fast retrieval and persistence of state beyond their immediate environment. A possible implementation of this would be to make sure that a particular FaaS function, or set of FaaS functions, have very low latency access to an external cache, like Redis. This could be provided by enabling a feature similar to Amazon’s same-zone placement groups. While such a solution would still incur more latency than memory - or disk-local state - many applications will find this solution acceptable.

The benefits of the hybrid approach are that frequently accessed state can stay in-environment with the logic using it, and that no complicated, and possibly expensive, network co-location of logic and external state are required. On the other hand, the benefits of a pure-FaaS approach are a more consistent programming model, plus a broader use of the scaling and operational benefits that Serverless brings. The current momentum suggests that the hybrid approach will win out, but we should keep our eyes open for placement group-enabled Lambdas, and the like.

Serverless collaboration services

Beyond orchestration and state management, we see the commoditization and service-ification of other components that traditionally we would develop or at least manage ourselves even within a cloud environment. For instance we may stop running our own MySQL database server on EC2 machines, and instead use Amazon’s RDS service, we may replace our self-managed Kafka message bus installation with Kinesis. Other infrastructural services include file systems and data warehouses while more application-oriented examples include authentication and speech analysis.

This trend will continue, reducing still further the amount of work we need to do to create or maintain our products. We can imagine more pre-built messaging logic (think of a kind of ‘Apache Camel as a Service’ built into Amazon Kinesis or SQS), and also further developments in generic machine learning services.

A fun idea here is that FaaS functions, due to their lightweight application model, can themselves be tightly bound to a service leading to ecosystems of FaaS functions calling services that themselves call other FaaS functions, and so on. This leads to ‘interesting’ problems with cascading errors, for which we need better monitoring tools, as discussed later in this article.

Beyond the data center

The vast majority of Serverless compute so far is on vendor platforms running in their data centers. It gives you an alternative to how you run your code but not where you run your code. An interesting new development from Amazon is to allow their customers to run Lambda functions in different locations, for instance in a CDN with Lambda@Edge, and even non-server locations, e.g. IoT devices with Greengrass. The reason for this is that Lambda is an extremely lightweight programming model that is inherently event driven and so it’s easy to use the same intellectual ideas and style of code in new locations. Lambda@Edge is a particularly interesting example since it provides an option for programmed customization in a location that never had it before.

Of course, a drawback to this is even deeper vendor lock-in! For those organizations that don’t want to use a 3rd party vendor but do want many of the benefits of Serverless compute they will be able to do this with an on-premise solution, just like Cloud Foundry has done for PaaS. Galactic Fog, IronFunctions and Fission, from Kubernetes, are early efforts in this area.

The tools and techniques we’ll need

As I wrote previously there are significant speed bumps, limitations and tradeoffs when using a Serverless approach. This is no free lunch. For the Serverless user base to grow beyond early adopters we need to fix or mitigate these concerns. Fortunately, there is good momentum in this area.

Deployment tooling

Deploying functions to Lambda using AWS’ standard tools can be complex and error-prone. Add in the use of API Gateway for Lambda functions that respond to HTTP requests and you have even more work to do for setup and configuration. The Serverless and ClaudiaJS open source projects have been pushing on deployment improvements for over a year, and AWS joined the party with SAM late in 2016. All these projects simplify the creation, configuration and deployment of Serverless applications by adding considerable automation on top of AWS’ standard tooling. But there is still plenty to do here. In the future two key activities will be heavily automated:

Initial creation of an application and/or environment (e.g. both initial production environment, and temporary testing environments)
Continuous Delivery / Deployment of multi-component applications

The first of these is important in order to more widely enable the ‘conception-to-production lead time’ advances that we’ve started seeing. Deploying a new Serverless application needs to be as easy as creating a new Github repo - fill a small number of fields, press a button, and have some system create everything you need to allow one-click deployment.

However, easy initial deployment is not sufficient. We also need good tools to support Continuous Delivery and Continuous Deployment of the type of hybrid application I mentioned earlier. This means we should be able to deploy a suite of Compute functions and CaaS / PaaS components, together with changes to any application-encapsulated services (e.g. configured http routes in an API Gateway, or a Dynamo table only used by a single ‘application’), in one click with zero downtime and trivial rollback capability. And furthermore, none of this should be intellectually complex to understand, nor need days of work to setup and maintain.

This is a tough call, but the tools I mentioned previously (together with hybrid tools like Terraform) are leading the way to solving these problems, and I fully expect them to be largely solved over the coming months and years.

This topic isn’t just about deploying code and configuring services, however. Various other operational concerns are involved. Security is a big one. Right now, getting your AWS credentials, roles and the like set up and maintained can be a hassle. AWS have a thorough security model, but we need tools to make it more developer-friendly.

In short, we need Developer UX as good as Auth0 have with their Webtask product, but for an ecosystem as vast (and as valuable) as AWS.

Monitoring, Logging and Debugging

Once our application is deployed we also need good solutions for monitoring and logging, and such tools are under active development right now by several organizations. Beyond assessing the behavior of just one component though, we also need good tools for tracing requests through an entire distributed system of multiple Serverless compute functions and supporting services. Amazon are starting to push in this area with X-Ray, but it’s very early days.

Debugging is also important. Programmers have rarely written code correctly for every scenario on first pass before now, and there’s no reason to believe that’s going to change. We rely on monitoring to assess problems in FaaS functions at development time, but that’s a stone-age tool of debugging.

When debugging traditional applications, we get a lot of support from IDEs in order to set breakpoints, step through code, etc. With modern Java-based IDEs you can attach to a remote process that’s already running, and perform these debugging operations at a distance. Since we will likely be doing a lot of development using cloud-deployed FaaS functions, expect in the future that your IDE will have similar behavior to connect to a running Serverless platform and interrogate the execution of individual functions. This will need collaboration from both tool and platform vendors, but it’s necessary if Serverless is going to gain widespread adoption. This does imply an amount of cloud-based development, but we’re likely going to need that anyway for testing...

Testing

Of all the Serverless tooling topics I’ve considered so far, the one that I think is least advanced is testing. It’s worth pointing out that Serverless does have some pretty hefty testing advantages over traditional solutions in that (a) with Serverless compute individual functions are ripe for unit testing and (b) with Serverless Services you have less code to write and therefore just simply less to test, at the unit level at least.

But this doesn’t solve the cross-component functional / integration / acceptance / ‘journey’ test problem. With Serverless compute our logic is spread out across a number of functions and services and so higher-level testing is even more important than with components using something closer to a monolithic approach. But how do we do this when we’re relying so much on execution on cloud infrastructure?

This is probably the most misty prediction for me. I suspect that what will happen is that cloud-based testing will become prevalent. This will occur partly because it will be much easier to deploy, monitor, and debug our Serverless apps than it is right now for the reasons I just described.

In other words, to run higher level tests we’ll deploy a portion of our ecosystem to the cloud and execute tests against components deployed there, rather than running against a system deployed on our own development machines. This has certain benefits:

the fidelity of executing cloud-deployed components is much higher than a local simulation.
we’ll be able to run higher-load / more-data-rich tests than we might otherwise.
testing components against production data sources (e.g. a pub-sub message bus, or a database) is much easier, although obviously we’ll need to be careful of capacity / security concerns.

This solution also has drawbacks though. First, cycle time to execute tests will likely increase due to both deployment concerns, and network latency between the test - which will still run locally - and the remotely-executing system-under-test. Second, we can’t run tests when disconnected from the internet (on a plane, etc.) Finally, since production and test environments will be so similarly deployed, we’ll also need to be very careful about not accidentally changing production when we meant to change test. If using AWS such safety may be implemented through tools like IAM roles, or using entirely different accounts for different types of environment.

Tests are not just about a binary fail-succeed - we also want to know how a test has failed. We should be able to debug the locally-running tests and the remote components they are testing, including being able to single-step a Lambda function running in AWS as it is responding to a test. And so all of the remote debugging, etc., tools I mentioned in the previous section will be needed for testing too, not just interactive development.

Note that I’m not implying from this that our development tools need to run in the cloud, nor that the tests themselves have to run in the cloud, although both of these will occur to a greater or lesser extent. I’m merely expressing that the system-under-test will only ever run in the cloud, rather than on a non-cloud environment.

Using Serverless as a test driving environment though can reap interestingly useful results. One example is ‘serverless-artillery’ - a load testing tool made up of running many AWS Lambdas in parallel to perform instant, cheap, and easy performance testing at scale.

It’s worth pointing out that we may, to some extent, dodge a bullet here. Traditional higher-level testing is actually becoming less important due to advances in techniques where we (a) test in production / use Monitoring-Driven-Development, (b) significantly reduce our mean-time-to-resolution (MTTR) and (c) embrace a Continuous Deployment mantra. For many Serverless apps extensive unit testing, extensive business-metric level production monitoring & alerting, and a dedicated approach to reducing MTTR and embracing Continuous Deployment will be a sufficient code validation strategy.

Architecture: many questions to answer

What does a well-formed Serverless Application look like? How does it evolve?

We’re seeing an increasing number of case studies of architectures where Serverless is being used effectively, but we haven’t yet seen something like a ‘pattern grouping’ for Serverless Apps. In the early 2000s we saw books like Fowler’s Patterns Of Enterprise Application Architecture, and Hohpe / Woolf’s Enterprise Integration Patterns. These books looked at a whole collection of projects and derived common architectural ideas useful across different domains.

Importantly these books looked to several elapsed years of experience of the underlying tools before making any unifying opinions. Serverless hasn’t even existed long enough as a technology to warrant such a book, but it’s getting closer and within a year or so we’ll start seeing some common, useful practices emerge (anyone that uses the term ‘best practice’ today when it comes to Serverless architecture needs be given a significant raised-eyebrow look).

Beyond application architecture (how serverless apps are built), we need to think of deployment architecture too (how serverless apps are operated). I already talked about deployment tools, but how do we use those tools? For instance:

What do terms like environments mean in this world? ‘Production’ seems less clear-cut than it used to be.
What does a side-by-side deployment of a stack and slowly moving traffic from one set of functions/service versions to a different set of functions/service versions (rolling deployment) look like?
Is there even such a thing as "blue-green" deployment in this world?
What does roll-back look like now?
How do we manage upgrading / rolling-back databases and other stateful components when we might have multiple different ‘production’ versions of code running in functions simultaneously?
What does a phoenix-server look like now when it comes to 3rd party services that you cannot burn down and redeploy for cleanliness?

Finally, what are useful migration patterns as we move from one architectural style to something that is, or includes, serverless components? In what ways can our architecture change in an evolutionary way?

Many of these yet-to-be-defined patterns (and anti-patterns) are not obvious, most clearly shown by our very nascent ideas of how best to manage state in Serverless systems. There will no doubt be some surprising and fascinating patterns that emerge.

How our organizations will change

While cost benefits are one of the drivers of Serverless, the most interesting advantage is the reduction of ‘conception-to-production lead time’. Serverless enables this reduction by giving ‘superpowers’ to the vast majority of us engineers who aren’t experts in both systems administration and distributed systems development. Those of us who are ‘merely’ skilled application developers are now able to deploy an entire MVP without having to write a single shell script, scale up a platform instance, or configure an nginx server. Earlier I mentioned that deployment tooling was still a work-in-progress, and so we don’t see this ‘simple MVP’ solution for all types of application right now. However, we do see it for simple web services, and even for other types of apps deploying a few Lambda functions is still often easier than managing operating system processes or containers.

Beyond the MVP we also see cycle-time reductions through the ability to redeploy applications without having to be concerned about chef/puppet-script maintenance, system patch levels, etc.

Serverless gives us the technical means to do this, but that’s not enough to actually realize such improvements in an organization. For that to happen companies need to grapple with, and embrace, the following.

‘True’ DevOps

DevOps has come to mean in many quarters ‘Technical Operations with the addition of techniques more often seen in development.’ While I’m all for increased automation and testing in system administration, that’s a tiny part of what Patrick Debois was thinking when he coined the term DevOps.

True DevOps instead is about a change in mindset, and a change in culture. It’s about having one team, working closely together, to develop and operate a product. It means collaboration rather than a negotiated work queue. It means developers performing support. It means tech ops folk getting involved with application architecture. It means, in other words, a merging of skill and responsibility.

Organizations won’t see the efficiency gains in Serverless if they have separated Development and Ops or ‘DevOps’ teams. If a developer codes an application but then needs someone outside of their immediate group to deploy the system, then their feedback gains are wiped out. If an operations engineer is not involved with the development of an application, they won’t be able to on-the-fly adapt a deployment model.

In other words, in the future the companies that have made the real gains from Serverless will be the ones who have embraced true DevOps.

Policy / access control changes

But even a change in team-by-team culture is not sufficient. Often times, in larger companies an enthusiastic team will come up against the brick wall of Organizational Policy. This might mean a lack of ability to deploy new systems without external approval. It might mean data access restrictions to all but existing applications. It might mean ultra-strict spending controls.

While I’m not advocating companies throw all their security and cost concerns out of the window, to make the most of Serverless they are going to need to adapt their policies to allow teams to change their operational requirements without needing team-external human approval for every single update. Access control policies need to be set up not just for the now, but what might be. Teams need to be given budgetary freedom within a certain allocation. And most of all experiments should be given as much of a free-reign sandbox as possible while still protecting the truly sensitive needs of an organization.

Access control tooling is improving, through use of IAM roles and multiple AWS accounts, as I mentioned earlier. However, it is still not simple, and is ripe for better automation. Similarly, rudimentary budget control exists for Serverless, again mostly through multiple accounts, but we need easier control of per-team execution limits, and of different execution limits for different environments.

The good news is that all of this is possible through advances in access control tooling, and we’ll see more progress in patterns of budget allocation, etc., as Serverless tools continue to advance. In fact, I think automation of access and cost controls will become ‘the new shell scripting’ - in other words when teams think of the operational concerns of their software they won't think of start/stop scripts, patch levels and disk usage - instead they'll think of precisely what data access they'll need and what budget they require. Because teams will be thinking about this so often engineers will automate the heck out of it, just like we did with deployment before.

Given this ability and rigor, in the future, even for the most data-sensitive enterprises, passionately experimental teams will use Serverless technologies to try out ideas that would never have made it past the whiteboard before, and will do so knowing that they are protected from doing any real intellectual or financial damage.

Product ownership

Another shift we’ve seen on many effective engineering teams over the last few years is a change of focus from projects to products. Structurally this often comes via less focus on project roadmaps, iterations and burndown charts, and instead more on a kanban-style process, lightweight estimates and continuous delivery. More importantly than the structural changes though are the shifts in role and mindset to more overlapping responsibilities, similarly to what we see with (true) DevOps.

For instance, it is very likely now that product owners and developers will collaborate closely on the fleshing out of new ideas - developers will prototype something, and product owners may dig into some technical data analysis, before locking down a final production design. And similarly, the spark of innovation - where a new idea or concept comes into someone’s head - could belong to anyone on the team. Many members of the team, not just one, now embrace the idea of customer affinity.

A Serverless approach offers a key benefit to those teams embracing a whole-team product mindset. When anyone on the team can come up with an idea and quickly implement a prototype for it a new mode of innovation is possible. Now Lean Startup-style experimentation becomes the default way of thinking, not something reserved for ‘hack days’, because the cost and time of doing so is massively reduced.

Another way of looking at this is that teams that don’t embrace a whole-team product mindset are likely to miss out on this key benefit. If teams aren’t encouraged to think beyond a project structure, it’s hard for them to make as much use of the accelerated delivery possibilities that Serverless brings.

Conclusion

Serverless is a relatively new concept in software architecture, but is one that is very likely to have an impact as large as other cloud computing innovations. Through technology advances, tooling improvements and shared learning in Serverless application architecture, many engineering teams will have the building blocks they need to accelerate, and even transform, how they do product development. The companies that adopt Serverless, and adapt their culture to support it, are the ones that will lead us into the future.

Acknowledgments

Thanks to the following for their input into this article: John Chapin, Chris Stevenson, Badri Janakiraman, Ben Rady, Ben Kehoe, Nat Pryce.

About the Author

Mike Roberts is an engineering leader and cofounder of Symphonia, a serverless and cloud technology consultancy. Mike is a long-time proponent of Agile and DevOps values and is excited by the role that cloud technologies have played in enabling such values for many high-functioning software teams. He sees serverless as the next technological evolution of cloud systems and as such is optimistic about their ability to help teams be awesome. Mike can be reached at mike@symphonia.io and tweets at @mikebroberts.

InfoQ Software Architects' Newsletter

Login with:

Don't have an InfoQ account?