Chris Swan on DevOps and NoOps, Plus Operations and Code Validation in a Serverless Environment

On this week’s podcast, Wes Reisz talks with Chris Swan. Chris is the CTO for the global delivery organisation at DXC Technology, and is well versed in DevOps, infrastructure, culture, and what it means to put all of these together. Today’s topics include DevOps, NoOps and what Chris calls LessOps; what operations means in a world of serverless; and where he sees configuration management, provisioning, monitoring and logging heading. The podcast wraps up with how Chris sees teams validating code in serverless deployments, using techniques such as canaries and blue-green deployments.

Key Takeaways

  • Serverless still requires ops - even if the ops aren’t focused on the technology
  • Even for a minimal function, the amount of configuration may exceed the functional code by a factor of three
  • Disruptive services often move the decimal point
  • ML is the ability to make the inferences and AI is the ability to make decisions based on those inferences

Show Notes

Your bio says you are an IT mixologist, cloud gazer and security guy. What is an IT mixologist?

  • 02:25 Many years ago I had a colleague in New York doing a (drinks) mixology course.
  • 02:35 A mixologist needs to understand what and how drinks can be mixed together - and it’s similar for IT; which parts you’re going to be able to put together.

What was the genesis of the blog post?

  • 03:10 It came from a conversation with Paul Johnston - now at AWS, but at the time CTO of a startup called Movio.
  • 03:20 I don’t think he realised that he was one of the first people doing serverless at the time, but he was starting to hit blockers around testability and things he was trying to do in the release cycle.
  • 03:35 That raised the question of how to do canary testing in a serverless environment?
  • 03:45 At the time we were talking about this, there wasn’t a good way to do this.
  • 03:50 There have been some changes announced for Lambda since then, around traffic shifting - which can be used to do blue-green deployment.
  • 04:00 There are all the practices around infrastructure as a service, but not so much for serverless.

What is NoOps?

  • 04:35 There are a bunch of people who think that it can be a thing - that you can write some code, throw it onto a platform, which magically takes care of it.
  • 04:55 At the present time it is a fantasy.
  • 05:00 There’s a whole bunch of stuff you’re still responsible for; users still have expectations of the service that need to be met.
  • 05:15 Just because we no longer have servers in a serverless environment, it doesn’t mean that the operations have gone away.
  • 05:30 All of the operations that have nothing to do with servers are still there.
  • 05:35 What seems to be crystallising in a serverless environment, from a purely lines-of-code perspective, is that you have lines of code running functionally on your serverless platform, and lines of configuration setting up the API gateways, security groups and so on.
  • 06:00 Gareth Rushgrove from Puppet has done some digging around, and there’s an approximate 3:1 ratio of configuration to code.

“You can write a function in python, but you need hundreds of lines of configuration code”
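
As a rough illustration of that ratio, here is a minimal sketch (the function and resource names are invented, not from the podcast): the handler itself is a handful of lines of Python, while the surrounding deployment description - written here as a plain Python dict standing in for the CloudFormation or framework YAML you would really use - is already several times longer.

    # handler.py - the "functional" code: a few lines of Python.
    def handler(event, context):
        """Echo a greeting back to an API Gateway request."""
        name = (event.get("queryStringParameters") or {}).get("name", "world")
        return {"statusCode": 200, "body": f"hello {name}"}

    # Everything below is operations-related configuration, sketched as a dict
    # standing in for the YAML/CloudFormation you would actually write. Even in
    # this compressed form it is roughly three times the size of the handler.
    DEPLOYMENT_CONFIG = {
        "function": {
            "name": "hello-function",            # hypothetical name
            "runtime": "python3.9",
            "handler": "handler.handler",
            "memory_mb": 128,
            "timeout_s": 10,
        },
        "iam_role": {
            "name": "hello-function-role",
            "managed_policies": ["AWSLambdaBasicExecutionRole"],
        },
        "api_gateway": {"path": "/hello", "method": "GET", "auth": "NONE"},
        "logging": {"retention_days": 14},
        "monitoring": {"error_alarm_threshold": 1},
    }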

  • 06:30 I recently read something: “I don’t see what all the fuss about serverless is; it’s all just glue” - but it’s space-filling glue.
  • 06:40 We’re seeing examples of where you can make an entire thing out of nothing but space-filling glue.
  • 06:55 You might be making trade-offs to do this, but in return you might be able to scale without some of the limitations we’ve had in the past.
  • 07:25 The promise is that you can start small and grow big, without too many constraints about the team and scale of software engineering and operations department.

How do you define operations?

  • 08:00 Charity Majors defined operations as the constellation of your organisational technical skills, practices, and cultural values around designing, building and maintaining systems, shipping software and solving problems with technology.
  • 08:15 Operations is the set of cultural artefacts that do all of the things besides writing software that makes your application or services a real thing.
  • 08:40 I normally look at the activities we perform in operations.
  • 08:45 We get software where we want it to run.
  • 08:50 We configure software so that it runs in the right way.
  • 08:55 We make sure that the thing we’re running is running in the way we’re expecting it to.
  • 09:05 As we move towards micro-services, we’re moving away from monitoring and more into an observability paradigm.
  • 09:15 Then we have logging, which is generating log events to allow for audit and forensics, or to do diagnostics.
  • 09:35 We can take all of these things and ask whether we can still do this as technologies change - whether that’s containers, or serverless - and ask whether we’re still going to do (for example) provisioning?
  • 05:05 In the case of Lambda, yes we are.
  • 10:00 You take the JavaScript function with its dependencies, copy it into an S3 bucket, and point your service at that bucket - you’ve done provisioning (see the sketch after this list).
  • 10:10 Do you have to do configuration? Well, absolutely - API gateways, security roles, access policies.
  • 10:25 Monitoring is a little bit more built-in to the platform (same for logging).
  • 10:40 All of these things present themselves as a service.
  • 10:50 Provisioning can evolve to configuring a provisioning service.
  • 10:55 Monitoring could be replaced with a service that is configured to look for monitoring events.
  • 11:05 When you squeeze at that end, everything becomes configuration management.
  • 11:20 This brings you back to NoOps, since it brings you back to code; but 3/4 of the code is operations related configuration, not functional code.
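
Here is a minimal sketch of that shift from provisioning servers to configuring a provisioning service, using boto3 against S3 and Lambda; the bucket, role and function names are invented for illustration.

    # Provisioning without servers: tell S3 and Lambda what you want.
    import boto3

    s3 = boto3.client("s3")
    lam = boto3.client("lambda")

    # 1. Put the packaged function (code plus dependencies) into an S3 bucket.
    s3.upload_file("build/function.zip", "my-deploy-bucket", "releases/function.zip")

    # 2. Point the Lambda service at that bucket - provisioning is done.
    lam.create_function(
        FunctionName="hello-function",
        Runtime="nodejs18.x",          # the podcast example is a JavaScript function
        Role="arn:aws:iam::123456789012:role/hello-function-role",  # pre-existing role
        Handler="index.handler",
        Code={"S3Bucket": "my-deploy-bucket", "S3Key": "releases/function.zip"},
    )

    # 3. The rest - API gateways, security roles, access policies - is pure
    #    configuration layered on services that already exist.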

When we started with DevOps, was there a paradigm shift?

  • 12:10 Simon Wardley says that stuff evolves from its genesis through its product phase to a utility phase.
  • 12:25 Operations co-evolve with those things.
  • 13:30 Virtualisation had its genesis with mainframes decades ago, then we had virtualisation as a product with x86 and VMware, and lastly Amazon came along and made virtualisation a service.
  • 12:50 We’ve now got virtualisation as a service, and operations have co-evolved with it - and we now call that DevOps.
  • 13:05 The Gene Kim view of DevOps borrows from lean manufacturing: flow, feedback, and continuous learning through experimentation.
  • 13:40 Serverless is a new paradigm, and we can expect to see the growth of new operational practices.
  • 13:55 The Gene Kim view stays valid throughout - it’s still going to be equally relevant with serverless.
  • 14:10 It all comes down to labels - we spend a lot of time arguing about labels in this industry.
  • 14:15 Serverless is a silly label because there are still servers; people are still arguing about what the DevOps label means.

In a serverless environment, how do you handle canaries and blue-green deployments?

  • 15:00 There are definitions of canaries and blue-green deployments on the Thoughtworks blog [https://www.thoughtworks.com/insights/blog/implementing-blue-green-deployments-aws].
  • 15:15 With a blue-green deployment, you have two versions of your code and can swing traffic from one to the other.
  • 15:25 Canarying is a more finely grained form of blue-green deployment, where you send a small proportion of your traffic to the newer version and pay attention to what is happening.
  • 15:35 If all looks like it is going well, you can ramp the traffic up - and if it isn’t, you can ramp it down again (a sketch of this ramp-up logic follows this list).
  • 16:05 You can also be more targeted about this, by only routing traffic from friends and family or advance beta testers to the newer deployment.
  • 16:35 We’ve seen great case studies from Netflix - Roy Rapoport gave a great presentation at QCon New York a few years ago. 
  • 16:45 Canarying has become an established part of how state-of-the-art organisations do robust deployment in an infrastructure as a service world.
  • 17:00 Netflix has been able to drop containers in place of VMs relatively easily using these techniques.
  • 17:15 What about load balancers? Well, they’re still there but it’s behind the magic curtain.
  • 17:25 Until there were some recent announcements from Amazon there was no way of controlling that.
  • 17:30 Amazon has now said that they will allow the load balancers to do traffic shifting to support blue-green deployments.
  • 17:35 There’s a robust set of operational practices that have been de rigueur in infrastructure as a service based organisations, and when serverless came along it wasn’t able to support that.
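
The ramp-up/ramp-down loop described above doesn’t depend on any particular platform. Here is a toy sketch in Python, where set_canary_weight and get_error_rate are placeholders for whatever routing and monitoring layer you actually have, and the step sizes and thresholds are invented.

    import time

    def run_canary(set_canary_weight, get_error_rate,
                   steps=(0.01, 0.05, 0.25, 0.5, 1.0),
                   max_error_rate=0.01, soak_seconds=300):
        """Toy canary controller: ramp traffic towards the new version in steps,
        watch an error-rate signal, and roll back if it degrades."""
        for weight in steps:
            set_canary_weight(weight)      # e.g. 1% -> 5% -> 25% ... of traffic
            time.sleep(soak_seconds)       # let real traffic exercise the canary
            if get_error_rate() > max_error_rate:
                set_canary_weight(0.0)     # ramp back down: full rollback
                return False
        return True                        # new version now takes all the traffic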

How do Amazon’s load balancers allow you to shift traffic?

  • 18:05 What it’s doing is giving control over the Amazon load balancers, so that when you release a new Lambda function you can control how much traffic is routed to it (see the sketch after this list).
  • 18:20 The details (like whether you can enable it for friends-and-family) aren’t clear yet - the documentation hasn’t yet been made available.
  • 18:35 I think if it follows the characteristic evolution of these services, it will start out as a blunt instrument that allows traffic to be shifted from one version to another.
  • 18:45 Over time I expect this configuration to grow to support more complex requirements.
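
For reference, the Lambda capability being discussed here surfaced as weighted alias routing. The following is a minimal boto3 sketch, with illustrative function and alias names and an arbitrary 10% weight; it is also one way to drive the canary loop sketched earlier.

    import boto3

    lam = boto3.client("lambda")

    # Publish the newly deployed code as an immutable version.
    new_version = lam.publish_version(FunctionName="hello-function")["Version"]

    # Send 10% of the "live" alias's traffic to the new version, 90% to the old.
    lam.update_alias(
        FunctionName="hello-function",
        Name="live",
        RoutingConfig={"AdditionalVersionWeights": {new_version: 0.10}},
    )

    # If the canary looks healthy, promote it: make the new version primary
    # and clear the additional weights.
    lam.update_alias(
        FunctionName="hello-function",
        Name="live",
        FunctionVersion=new_version,
        RoutingConfig={"AdditionalVersionWeights": {}},
    )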

What are some of the other cultural shifts towards serverless environments?

  • 19:05 My definition of cultural is “the way we do things around here”.
  • 19:10 It’s causing those same organisations to change their culture.
  • 19:25 A year and a half ago, at the AWS summit in London in summer 2016, there were two case studies of companies that had effectively the same product.
  • 19:40 One was Monzo, that had a pre-paid debit card for spending in different currencies.
  • 19:50 They were talking about the Financial Conduct Authority and how they were concerned about a start-up tech firm using AWS.
  • 20:00 The architecture for what they were doing wasn’t really the topic.
  • 20:10 The other was Travelex, who had released almost exactly the same product, a debit card that was optimised for cheap FX transactions.
  • 20:25 They needed a bunch of glue to connect their internal systems with a set of external vendor services.
  • 20:30 The case study was that they’d been able to get a team of three people, thread together some functions, and deliver it really quickly and cheaply.
  • 20:50 Not only was it really fast to deliver and turn around, but the cost on AWS was moving the decimal point a few times.
  • 21:05 If you can move a decimal point, that’s probably disruptive.
  • 21:20 That changed the culture of a small part of that organisation - but you could see them salivating over the speed to market and cost efficiency and wondering what else they could do with this.
  • 21:45 It’s very early in the adoption curve, so there are things like that which are slam dunks.
  • 21:55 There’s also going to be a lot of stuff that isn’t going to fit.
  • 22:00 We have to remember that there are still mainframes around, AS/400s (aka iSeries) and VAXes, and all of that hasn’t gone away.
  • 22:05 We have all these onion architectures, and serverless is the thinnest of skins on the very outside edge of the onion.
  • 22:20 It’s going to consume all of the applications that it’s an easy fit for, very easily.
  • 22:25 The next phase of this is going to be accelerated by function libraries emerging.
  • 22:40 What we’ll then see happening from an organisational perspective is not necessarily adoption of serverless in the context of traditional software development.
  • 22:55 We’ll see problems already having been solved that they can plug into.

Things like the serverless framework?

  • 23:15 There’s a bunch of frameworks, and the serverless framework is one of them to make starting with serverless pain-free.
  • 23:30 A lot of what they’re doing is Ops grunt work - turning code into more efficient Ops outputs.
  • 23:45 I see the same set of behavioural practices with Kubernetes - it’s driven by a menagerie of YAML files.
  • 24:10 Pointing kubectl at a YAML file can be powerful - it can stand up pods with monitoring or whatever else (see the sketch after this list).
  • 24:20 We’ve got all these components now being shrink-wrapped, so you can start to assemble them.
  • 24:35 If we look back to the days of J2EE and WebLogic, we had the Pet Store, but also the samples and examples included with the server.
  • 24:45 I saw much more being built out of the WebLogic samples and examples.
  • 24:55 Today Weaveworks has created the Sock Shop as the modern-day Pet Store, and it’s sort of the best of both worlds since it’s no longer a monolith.
  • 25:10 If history repeats itself, I’d expect to see a lot of applications growing out of templates like that.
  • 25:25 With the frameworks, I really meant marketplaces for functional code that does something.
  • 25:40 If I think of something contemporary, there might be some functional code that can add something to the blockchain.
  • 25:45 Why would you write something like that yourself when you could get something from a marketplace instead?
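
As an aside on the kubectl point above: the same declarative, YAML-driven approach can also be driven programmatically. Here is a minimal sketch using the official Kubernetes Python client, where deployment.yaml is a hypothetical manifest.

    # Roughly equivalent to `kubectl create -f deployment.yaml`: hand the
    # declarative manifest to the cluster and let it stand up the pods.
    from kubernetes import client, config, utils

    config.load_kube_config()              # reads credentials from ~/.kube/config
    k8s_client = client.ApiClient()

    # deployment.yaml is a hypothetical manifest describing the pods, a service,
    # and whatever monitoring sidecars should be stood up alongside them.
    utils.create_from_yaml(k8s_client, "deployment.yaml")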

What does the serverless framework give you?

  • 26:30 A lot comes down to the question of boilerplate.
  • 26:40 The thirty lines of JavaScript or Python or Go are still going to have to be written.
  • 26:55 What the framework gives you is how to take care of all of the things that you need to do to get the function out there.
  • 27:00 A lot of it is boilerplate code.
  • 27:35 An evolution I’ve seen subsequently is starting to connect this into a development environment and a CI/CD pipeline.
  • 27:50 AWS CodeStar will take you all the way from the Cloud9 IDE through to deploying the function out into Lambda, running tests on the way.
  • 28:10 You can start out with “hello world”, and then edit it incrementally and deploy it as you go.

What has the opportunity of moving the decimal point in 2018?

  • 28:45 We’re going to see more machine learning and artificial intelligence.
  • 28:55 The quality of inferences that we’ve seen has gone from pretty bad to really sharp.
  • 29:10 We now have low false positive and false negative rates - they’re starting to become dependable.
  • 29:20 Drawing a line between ML and AI: ML is the ability to make the inferences, and AI is the ability to make decisions based on those inferences (a toy sketch follows this list).
  • 29:35 We’re getting to the stage where software doesn’t suck - previously, it asked questions you thought it should already know the answer to.
  • 29:45 Those answers can now be inferred with high enough accuracy that assuming them is often the right thing to do.
  • 30:05 As an industry, we’re starting to see the early fruits of this.
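
To make that ML/AI distinction concrete, here is a toy sketch with invented data and thresholds: the model produces the inference (a probability), and a separate decision rule acts on it.

    # ML = making the inference; AI = making a decision based on that inference.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Invented training data: two features per request, label 1 = fraud.
    X_train = np.array([[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.8, 0.9]])
    y_train = np.array([0, 1, 0, 1])
    model = LogisticRegression().fit(X_train, y_train)

    # The inference: a probability that a new request is fraudulent.
    p_fraud = model.predict_proba(np.array([[0.7, 0.85]]))[0, 1]

    # The decision: only act automatically when the inference is dependable.
    if p_fraud > 0.95:
        action = "block"              # confident enough to act without asking
    elif p_fraud > 0.5:
        action = "flag for review"    # useful inference, but not decisive
    else:
        action = "allow"
    print(f"p(fraud)={p_fraud:.2f} -> {action}")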

What do you think is driving the surge of ML and AI?

  • 30:35 One thing has to be the level of investment from the hyper-scale players.
  • 31:00 There was a paper from Google that said they can use machine learning to build application-specific indexing mechanisms.
  • 31:15 There’s a whole bunch of problem spaces where we’ve used blunt but general purpose tools.
  • 31:35 ML allows us to have a custom-fit tool for each of those application spaces.
  • 31:40 One of the observations about ML is that it allows people to experiment with mathematics that had previously been domain specific.
  • 32:00 For example, a Java developer can drop a LISP engine into an application and have it just work.
  • 32:15 There are all sorts of well-understood mathematical constructs but they haven’t been applied elsewhere.
  • 32:25 They are now being packaged up as ML approaches and shotgun tested against a problem space.
  • 32:40 In the past, the domain experts and mathematicians weren’t going to the same conferences, so the crossover never existed before.
  • 33:00 When I learnt about tensors and thought about what I’d done in avionics and classical control theory, I wondered why we don’t use tensors in avionics.
  • 33:20 There are marketplaces like Kaggle where lots of different algorithmic approaches are tried out to find what works for a given piece of data.
  • 33:30 There’s a joke - theorists know how things should work, and practitioners know how things do work, but software engineers don’t know how things work and don’t know why.
  • 33:50 A lot of these tools provide us with new ways of not knowing how things are working.
  • 33:55 That ties into ethical concerns.
  • 34:00 If we look at GDPR we may have difficulty in explaining how we arrived at an answer.

More about our podcasts

You can keep up-to-date with the podcasts via our RSS Feed, and they are available via SoundCloud, Apple Podcasts, Spotify, Overcast and Google Podcasts. From this page you also have access to our recorded show notes, which all have clickable links that will take you directly to that part of the audio.
