Bio: Jeff Sussna is an independent IT consultant focused on helping companies deliver services through software. His company, Ingineering.IT, focuses on DevOps and continuous delivery. You can read his blog at http://blog.ingineering.it/.
The AWS re:Invent conference is the largest gathering of the AWS community. Attendees have the opportunity to learn valuable best practices from sessions delivered by AWS architects and engineers, customers, and partners; to gain experience in the Hands-On Labs; and to make new connections.
Sure, I’m Jeff Sussna, an independent IT consultant. My focus is on helping companies deliver service through software, so I’m focused on DevOps, continuous delivery, and cloud, and really on taking a lean approach towards continuously improving their ability to deliver across the entire life cycle.
2. So to start off with, you keep a pretty active blog, and you recently wrote about “Production Freezes” and the dangers of them. Can you describe that point a bit for the audience: what’s the problem space, and what do you think is the path forward?
Sure. So, traditionally, at important times of the year, companies and IT organizations will freeze their production infrastructure and applications. The logic behind that is kind of intuitively obvious: it’s the most important time, you really don’t want to mess things up, change introduces uncertainty, and uncertainty by its nature introduces instability. That makes sense; I’ve done it, I’ve lived it, I’ve advocated it. But there’s a little problem with it, which is that you’re also saying that at the time when it’s most important to respond, you have the least ability to respond. It’s Christmas shopping season now, Amazon is a big retailer, and this is the most important time of the year for them. So if you say “we’re going to freeze until January fifth”, you’re basically saying it’s two-some-odd months before we can respond to any of our customers’ needs, and by definition, when we do, it will be too late.
So the problem with the idea of a Production Freeze is that it assumes your stuff really does work, when the truth is you think it works now; as far as you know it works, but tomorrow you can find out it doesn’t, and then the question is: what do you do? And even if your stuff does work, you may find out the customers want to do something different. Again, it’s the most important time for you to address those problems, and the Production Freeze says you can’t. Generally what people do is say “OK, we’ll let the most critical changes go through and we’ll give them this unusually rigorous review procedure”. So why do you do that? You do that because you don’t have confidence in your regular release procedure. If you didn’t have a Production Freeze and you said “we’re going to treat every single release that we do every day as if it was in the middle of Christmas”, then by the time you do get to Christmas you’ve practiced that a thousand times and you have confidence in it.
So it’s kind of this need to flip things on their head: instead of thinking about how we address our customers from the perspective of IT stability, how do we address IT stability from the perspective of our customers and being able to respond to them?
3. What technologies do you think need to be in place to help people get comfortable with that process? For somebody who is used to having a Production Freeze and a rigorous process, what do they need in place, and how does that transition start? Say you try it with some initial apps: what’s the culture change, or even the technology, for that?
Well, the first thing, I think, with all of this is you don’t do it in one big shot; you don’t go from releasing code once a quarter to releasing code ten times a day. What you have to do is look for the places where you have waste, the places where you have kind of unnecessary manual steps, and start engineering things out. So automation is absolutely critical, and you can automate whether you release once a quarter or once a day. So you can start looking at your infrastructure environments and you can ask questions like “do we treat Dev differently from Prod?” and if so, why? Because when developers sit down to test in the development environment, generally speaking, let’s say they’re trying to fix an important bug; having that environment be very different from production doesn’t help them. The developers like to say “Oh, I need the sandbox”. Why? Most of the time it’s just going to slow you down. So if you treat development and production the same, you immediately have more confidence that what you’re building and what you’re testing is going to work in production.
So you sort of slowly start to wring the variation and wring the waste and wring the uncertainty out of the confusion, and then you start to kind of ratchet things up and you look for excuses. Take Scrum, which is a classic approach in which you develop and maybe test during the iteration and then you release at the end of the iteration. Well, if you have something done in the middle of the iteration, you ask yourself the question “why wait?” So you continuously ask yourself those questions: “why are we doing this, why are we not doing that?” It’s a very lean approach, it’s a very continuously improving approach, and you kind of start to wring a lack of confidence and fear out of the system.
[Richard’s full question: So you mentioned the difference between production environments and Dev environments. Are there any differences that should be in place, whether it’s in procuring them or running them? Do you really think the processes for both should be almost the same? We’ve seen environments where it takes a lot of rigor to get Dev when maybe it doesn’t need the same rigor; maybe it doesn’t need to be in the same configuration database. Do you think those should be treated the same?]
Well, I think with tools like Puppet, Chef, CFEngine, and Vagrant there’s less and less excuse. I’m a very big fan of Vagrant. I can spin up a Vagrant box that contains an environment for an entire multi-tier app, some Chef or Puppet scripts are going to converge the configuration for me, and I can hand it out to all of my developers so that every developer can have a complex test environment on their desktop. They don’t have to do any configuration, they don’t have to worry about whether they have the right version of Tomcat running, and they can have four versions of it on their desktop. There’s actually less work rather than more work. I’m at the point where one thing that just really drives me crazy is when I see people creating multi-day user stories to do things like install Hadoop on their local machine, and I just want to go “why would you do that?” That is not a good use of your time as a developer. So I think we’re actually at the point where the tools are mature enough that it takes less effort to keep Dev and test and production in sync with the same technologies than it does not to.
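To make the idea of keeping Dev, test, and production in sync concrete, here is a minimal Python sketch of a drift check between two environment descriptions. It is an illustration only and not something described in the interview: the `DEV` and `PROD` dictionaries, the version numbers, and the `find_drift` helper are all hypothetical, and in practice the pinned versions would come from whatever Chef, Puppet, or Vagrant definitions a team keeps in version control.

```python
from typing import Dict, List

# Hypothetical environment descriptions: package name -> pinned version.
# These values are made up for illustration.
DEV: Dict[str, str] = {"tomcat": "7.0.32", "java": "1.7.0", "mysql": "5.5.28"}
PROD: Dict[str, str] = {"tomcat": "7.0.30", "java": "1.7.0", "mysql": "5.5.28"}


def find_drift(a: Dict[str, str], b: Dict[str, str]) -> List[str]:
    """Return a sorted list of readable differences between two environments."""
    drift = []
    # Look at every package named in either environment, so a package that
    # is missing on one side is reported as well (its version shows as None).
    for name in sorted(set(a) | set(b)):
        left, right = a.get(name), b.get(name)
        if left != right:
            drift.append(f"{name}: {left} != {right}")
    return drift


if __name__ == "__main__":
    for line in find_drift(DEV, PROD):
        print(line)
```

A check like this could run on every commit, so that any difference between environments is a deliberate, reviewed choice rather than an accident.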
5. That’s a good lead-in to talking about some of the infrastructure automation tools that developers should know. Clearly in DevOps you’ve got some operations people writing those scripts and giving the developers environments, but what are the things developers should be more familiar with from an infrastructure side, if any?
Well, the first thing I want to do is challenge the notion that Ops is writing those scripts. The whole issue of what’s Ops and what’s Dev in DevOps, we could do a half an hour on that alone, and I think a lot of people are missing a fundamental understanding there. But if you treat infrastructure as code, it’s code, and developers understand code; they understand how to write it and how to design it and how to make it reusable, so I don’t necessarily see any reason why developers shouldn’t be writing the Chef scripts. So in terms of tools, I think developers should know them all. Just like Ops needs to understand what the developer is doing and have some understanding of the software architecture, I think the developers need to understand what’s going to happen in the production environment and how the production environment is going to be managed. This ties into a little bit of a mantra for me at this point, which is that Dev and Ops are inseparable.
So as a developer, traditionally, I thought about creating functionality that would be delivered to a customer, which they would install and manage. But with software as a service, the functionality that I’m delivering is the service, and the operation of that is just as important as the actual feature side. So as a developer, if I put my good agile developer hat on and I start thinking about the customer and what it is that the customer needs, things like scalability and resiliency and not having stupid bugs because of configuration problems, those are things I need to think about. So the idea that Vagrant or Chef or Puppet or Logstash or whatever are foreign to me as a developer makes less and less sense.
[Richard’s full question: Let’s talk about DevOps a little bit and the stages. You’ve got a company where everything is manual and inefficient, and conversely you’ve got some sort of peak where you’ve really optimized. What are some of those stages? What’s that first step for that extremely manual organization that just relies on somebody’s tribal knowledge to build the boxes and things like that? What are the steps that you see customers follow from zero to whatever that pinnacle is?]
What I see is a really helpful progression; it’s kind of similar, I think, to what’s happening in Cloud. In a lot of cases you find developers adopting Cloud and starting to sort of push it on Ops: “Well, we spun up this app on Amazon and it runs and we can scale it and now it’s ready for Prod, so what are you going to do?” Similarly, I see things happening in so-called lower environments. Oftentimes development owns environments up to a certain point; there’s kind of a blood-brain barrier. So development, especially if the developers aren’t afraid of these automation tools (and certainly nobody can stop them from putting Vagrant on their machines and putting Chef or Puppet on their machines), can turn around and go to IT and say “We have this incredibly strong discipline now for managing our environments. We can spin up a performance test environment on Amazon and we can guarantee that it matches all of our other test environments and all of our Dev environments. Why can’t you do that? And we wrote the scripts for you.” They actually start to push agility with quality from Dev into IT. The other thing that can happen is they can start saying things like “Well, before our code ever gets to you, we’ve run these suites of security tests”. Twitter is an interesting example; they got slammed by the government because Barack Obama’s account got hacked. So the government was kind of all over them about security, so they instituted these really strong security practices, and they will do things like run a whole suite of static security tests when code is checked in, and even before code is checked in.
7. So from a DevOps perspective, without even going to the “security department”, you’ve taken something that’s traditionally outside of the development process and pulled it way up the food chain, which is really where you want to pull as many things as you can?
Absolutely. Yes, and it ties in with freeze-less production. So if you think about the things that slow down the process of releasing to Prod, it’s things like “well, we need to do security analysis, both human and automated, and we need to vet the schema update scripts, both human and automated”. So to the extent that you can make them part of the upstream process, once you’ve hit the Jenkins server, once you’ve come out the other side of the tests, once you’ve finished spinning up your security and performance test environment in the Cloud and running all of those tests, you have a much higher level of confidence that your code is good from a non-functional perspective, and therefore there’s much less friction getting it from that point quickly, or in some cases immediately, into production. So, yes, when you talk about continuous, whether it is continuous integration or continuous delivery or what have you, a lot of it has to do with pulling the things that gate production release forward in the life cycle.
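The idea of pulling release gates forward can be sketched in a few lines of Python. This is a hedged illustration rather than a description of any real pipeline mentioned in the interview: the `Change` dictionary, its keys, and the three gate functions are hypothetical stand-ins for checks a CI server such as Jenkins might run on every commit.

```python
from typing import Callable, Dict, List, Tuple

# A "change" is modelled as a plain dict of facts about a commit.
# All keys used below are hypothetical names chosen for this sketch.
Change = Dict[str, object]
Gate = Tuple[str, Callable[[Change], bool]]


def unit_tests_pass(change: Change) -> bool:
    return bool(change.get("unit_tests_ok", False))


def static_security_scan_clean(change: Change) -> bool:
    # A missing findings count is treated as "not scanned", which fails the gate.
    return change.get("security_findings", 1) == 0


def schema_scripts_reviewed(change: Change) -> bool:
    # A change with no schema update has nothing to vet.
    if not change.get("has_schema_update", False):
        return True
    return bool(change.get("schema_reviewed", False))


# Checks that traditionally happen just before release, run on every commit.
GATES: List[Gate] = [
    ("unit tests", unit_tests_pass),
    ("static security scan", static_security_scan_clean),
    ("schema script review", schema_scripts_reviewed),
]


def failed_gates(change: Change, gates: List[Gate] = GATES) -> List[str]:
    """Return the names of the gates this change does not pass, in order."""
    return [name for name, check in gates if not check(change)]


def ready_for_production(change: Change) -> bool:
    """A change is releasable only when every upstream gate has passed."""
    return not failed_gates(change)
```

The design choice is that a release decision becomes a by-product of checks that already ran upstream, instead of a separate review step performed at release time.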
8. That’s a great distinction. So as you talked about Cloud just a little bit, talk to me about the Cloud journey: who offers the full journey of Cloud, and what do you mean by that? How would you describe it?
What I mean by that is actually no one offers the Cloud journey. The journey is something that the customer takes; if you look at the world of service design, customers interact with your service through a kind of journey. So I want a dry cleaner; the first thing I have to do is find one, so I use the Yellow Pages or Google or Yelp or what have you. I find the dry cleaner, I find my way to the door, I walk in, I understand the prices, I give them my stuff, I come back a few days later, I get my stuff; there’s a journey that happens through time and through multiple touch points. The same thing happens with software as a service. So I want to use enStratus or RightScale or AWS or pick any of a thousand. The first thing I have to do is understand what it is, how it works, why I want to use it, and what happens when I use it.
The second thing I have to do is actually onboard, and it’s amazing how many software services don’t really think through onboarding. It’s amazing how many software services don’t think through offboarding. When you go to a department store, or when you go to a hotel in Las Vegas, they make it as hard as possible for you to get out the door, and a lot of software services take the same approach. But the truth is, if I want to leave, there’s a reason: either I’m unhappy, in which case if you make it hard for me to leave I’m going to be more unhappy, or maybe I just don’t need your stuff, in which case you have this last moment to make me happier or unhappier. So I think that Cloud service providers, like any other service provider in the physical world, need to think about things from the perspective of how the customer is interacting with them throughout time.
I mean, where we are now, at this conference, this is part of the customer journey for AWS. People talk about how Amazon has no support and they try to automate everything. Well, they don’t automate Werner Vogels and they don’t automate Jeff Barr and they don’t automate Andy Jassy and they don’t automate their conference. They very intentionally bring a bunch of people together in physical space. Well, that’s part of the service they provide. So I think it’s not that you provide a journey, but that you need to understand how the customer is interacting with what you do in all possible ways.
9. How does that apply then to internal IT, when you’ve got IT departments who are typically used to telling the business clients how they use services and how they engage IT, and you’re moving almost to a model of IT, maybe, selling better to the business? […]
[Richard’s full question: How does that apply then to internal IT, when you’ve got IT departments who are typically used to telling the business clients how they use services and how they engage IT, and you’re moving almost to a model of IT, maybe, selling better to the business, because now I have other choices? How does the Cloud journey or the IT journey take that shift, in this newer consumerization of IT or other things where IT doesn’t have the same stranglehold on technology it did before?]
Well, it’s a great question, I think you’re absolutely right, and it’s a really critical one. Consumerization of IT basically says that IT is no longer a scarce commodity, and it’s no longer hard, or as hard as it was, from a user’s perspective. Users also have more sophistication, so I think this whole idea of “users” needs to start going away. Employees are customers of IT, and IT is a service provider. It’s kind of weird when people are all of a sudden talking about IT as a service; IT always has been a service: “please run my punch cards, please run the reports overnight and tell us how much money we have in the bank”. But it’s just now that IT is starting to figure out that they need to act that way. There is a service provider and there are users or customers, and yes, there is a similar journey: I’m a new employee, and the first thing I need to do is figure out how to use the stuff.
IT is getting better and better at building service-oriented applications, and it’s interesting: if you go into an enterprise with a complex suite of internal applications, you encounter a lot of what I call “cognitive swivel”, which is: I use one app to do one subset of my job and then another app for another, and maybe the interfaces aren’t consistent, and they do or don’t integrate with each other. I’m trying to get my job done; I have a job to be done, and it isn’t any one of those apps. I think a place where IT can really add value going forward, as it has less control over what people are using, is to really think about it from the overall journey perspective and help people have a consistent and efficient and highly productive experience.
10. When you think of the customer value of Cloud computing, does the marketing team care that their servers are now sitting in Amazon versus their own data center and things like that? So how does IT need to get better, maybe by showing us service catalogs of offerings, without me caring how the service is delivered? Do you see IT selling Cloud to the business, or are they selling services to the business?
Well, in some cases I see the business selling Cloud to IT, or maybe saying “Guess what? We bought some”. At the Partner Summit yesterday, according to someone from Gartner, thirty-five percent of IT budgets is now spent outside of IT. If I were in IT, I would be terrified, because my budget just got cut thirty-five percent, and if the CFO ever figures that out, he’s going to come to me and say “I just cut your budget thirty-five percent, what are you going to do?” So I think that, to your question “Why does anybody care about Cloud?”, as I said, you’re in the service business and operation is part of what you deliver. If you look at the very early software-as-a-service marketing pitches, they talk about things like “We run it so you don’t have to, we make sure it’s backed up and secure, and we apply all the patches so you don’t have to”. That’s marketing talking about Ops, so Ops is part of the service, and being able to say “OK, we have high availability, high scalability, and low upfront cost, and we can deliver more and give you more agility for less money”, I think that’s something that everybody cares about.
11. As we think about developers who are figuring out what services they’re using even in this Cloud environment, […], do you see that as a logical transition even further up the stack? What does that look like?
[Richard’s full question: As we think about developers who are figuring out what services they’re using even in this Cloud environment, you mentioned in a recent blog post that even the concept of the CPU hour, one of the things we’re used to dealing with now in the Amazon Cloud, is potentially going to go away, and maybe I’ll just be buying the service I’m actually using: the queue, the row of data, things like that. Do you see that as a logical transition even further up the stack? What does that look like?]
I see that as Amazon’s strategy; I don’t necessarily see it as anybody else’s, and I think frankly that’s part of why AWS is accelerating away from everybody else. If I worked for Informatica, I wouldn’t be real happy this morning after the announcement of Redshift, because Amazon just said “You can consume the functionality of BI without paying any attention to, or actually managing, the underlying guts of it”. Reed Hastings from Netflix made a comment this morning during the keynote where he said “Well, we’re still at the assembly language phase of Cloud computing”, and I actually partly disagree with him, because if you look at the higher-order services that AWS exposes, they’re very much higher level than assembly language. If you compare it to programming tools, they’re very much in the library arena, the open source component arena of “Here’s the thing that you can pull off the shelf and use”; you don’t have to understand or manage the innards of it, and you don’t have to deal with the infrastructure for it. So there’s a whole bunch of stuff where Amazon is letting you consume queues and records and databases and dimensional tables and data warehouses and stuff like that.
So from my perspective it’s kind of a natural evolution of how developers build applications; it’s happened in the past in writing packaged software, and it makes perfect sense to me that it’s happening now in the Cloud. Most of what I see happening is at the platform-as-a-service level, which to me is very different. It’s really a container where you can put your app, and you don’t have to worry about things like scaling or failover and so on and so forth for your app, but it doesn’t relieve you of any development effort; it relieves you of a lot of Ops effort. The interesting thing about these building-block services is that they relieve you of operational effort, and they also relieve you of development effort, because you don’t have to write your own workflow manager.
12. […] These don’t have a direct portability mapping out to something else when I decide to maybe be a happy customer but move elsewhere, and obviously that’s a conscious choice. Does that negatively impact acceleration? It doesn’t seem to.
[Richard’s full question: That’s a good point. Workflow is a case in point. If we dig into Amazon for just a few more moments, even with Redshift, I don’t know if we’re hearing a portability story. You just made offboarding part of the Cloud journey, and except for EC2 or RDS, S3 down to Workflow, down to messaging with SQS and SNS, these don’t have a direct portability mapping out to something else when I decide to maybe be a happy customer but move elsewhere, and obviously that’s a conscious choice. Does that negatively impact acceleration? It doesn’t seem to.]
At this point I don’t think it’s negatively impacting acceleration. I think it is problematic, and I think it’s a problem because there’s a bunch of other stuff going on at the basic compute level, but there isn’t a bunch of other stuff going on at the higher levels, it seems. If I were at Informatica I would be saying “Why didn’t we do Redshift?” I actually posted a tweet where I said I thought that Oracle’s approach should be to abandon the infrastructure-as-a-service market and make their platform run on every Cloud in the world. Why is Amazon the one who’s in the development building-block world? There are plenty of companies out there, Oracle, Microsoft, Adobe, and so on and so forth, who have been in that business. Why aren’t they doing it? So from a certain perspective, it seems kind of strange that it’s left to the Amazons and the Googles and the Red Hats to do it, and I think the software companies are in danger of getting left behind.
13. Yes, that’s a fascinating angle. Why do you think that is? Is Amazon filling a gap, or are they just trying to put a stake in the ground so that they are the “run Oracle as RDS” option: don’t worry about Oracle running on twelve different clouds, we already have Oracle and we’ve done it in our container?
I think Amazon understands its customers really well. I have found cases where they seem to be able to read my mind. I was thinking I really need to figure out how to have a memcache failover for my Cloud app, and literally three weeks later they announced ElastiCache. I think they understand who their customers are: they’re application people, ultimately, not infrastructure people, and Amazon understands what those people want. It’s kind of the way Microsoft did in the 90’s. Microsoft crushed Apple in the 90’s, to a large degree, because they gave their developers great tools, and I think Amazon is giving application developers great tools.
Richard: Excellent! Good, so I really appreciate that you’ve taken some time to share some great perspectives. There are some great angles here on existing things people talk about, but with a different way to think of them, so thanks for taking the time.
Well, my pleasure and good questions. Nice talking to you!
Richard: Thank you!