
An interview with Sam Guckenheimer on Microsoft's Journey to Cloud Cadence

Sam Guckenheimer gave the opening keynote at the recent Agile 2014 conference, in which he described Microsoft Developer Division's ten-year journey from a waterfall-style boxed-product delivery cycle of four years to Agile practices enabling a hybrid SaaS and on-premises business, with a single code base, delivery of new features in the service every three weeks, and quarterly delivery for on-premises customers.

After the talk InfoQ spoke to him about what’s needed when organizations adopt a DevOps mindset.

InfoQ: Sam, thanks for taking the time to talk to InfoQ today. Your talk emphasized the importance of changing the mindset and adapting the organization when implementing new ways of working. Agile is a part of the way we work today, so what’s still missing?

Sam: They talked about a decade of Agile, and they rather missed the fundamental change that has happened, which is that the practices they talked about in 2001 are now mainstream. The big change, the big news, is the new realization that the old view that there were two life cycles, one for development and one for operations, has been replaced by the realization that there is one life cycle for both. That has been forced to some extent by the improvement in development, which leads to faster delivery of working software to deploy; in part by the public cloud, which removes the impediments to deployment; in part by the visibility of the consumer-facing web; in part by mobile; in part by the "build-measure-learn" family of practices, which connect business learning to tactical learning; and in part by the broader shift to the realization that we can now have hypothesis-driven development instead of a notion of requirements-driven or user-story-driven (or whatever you like) product-owner-driven development. The idea that you can actually use data to drive what to do next.

Let me reflect on both measuring and learning for a minute. Steve Blank taught for a long time that the big issue for start-ups is not "Can they scale their infrastructure?" but "Do they have a viable business proposition?" Then Eric Ries came along and married that with Agile practices and "build-measure-learn", and people like Alberto Savoia with his pretotyping enhanced that. So there is this notion of customer development instead of product development, and the practices around A/B testing and hypothesis-driven development and "don't scale too early" and what have you, which are, of course, all true for start-ups, combined with the realization for enterprises that they need to adapt to a mobile and cloud world. They can't abandon their customers, but they need to meet the customers where they are, respond to the disruptors like Amazon, and fit into this very rapid cycle.

InfoQ: Is it a huge mental shift for those large corporates?

Sam: Yes and no. I think that it is a mental shift, and I think we are just about at the early-majority tornado point. In other words, look for example at the Forrsights data from Forrester. They do a quantitative developer survey called Forrsights, based on a curated sample of 2,000 developers worldwide from 7 countries, which is in my experience the best quantitative survey. It indicates that practices like config-as-code, monitoring and A/B testing are catching on to the point where we are about to cross from early adopter to early majority. I anecdotally observe, when I brief customers (when they come to Microsoft they ask for these executive briefings), that where I used to have to put DevOps on the agenda, they now come asking about DevOps. So I think it is at that tipping point. I do not have hard evidence, and you never do except in retrospect, but it is the way maybe Agile was in 2005. And it is great, because it means that they are now connecting the dots on the business processes that deliver value to customers. They are differentiating core and context, thinking through what really does involve differentiation and the flow of value to customers, and they are saying: "Well, there is a bunch of stuff that does not. We do not need to worry about infrastructure; that is not something we have any differentiated capability in. We can think of that as a public utility. But we had better have something differentiated in the service experience we provide, or we do not have a really good reason to be." And that is new. In the past when we had these discussions, we had them about software, but we did not have them including the delivery to customers. Today the conversation is about how we get things to the customers quickly.

InfoQ: What is causing resistance? I look at many of the organizations that we deal with (medium to large corporates in the Southern hemisphere) and there is reluctance and resistance to adopting DevOps.

Sam: Well, there are three sources of resistance. There are, in some cases, regulatory compliance issues: some places have data-provenance kinds of laws, be they sensible or not. There is the fear of job loss. And in the Southern hemisphere in particular there is the fear of data latency. We do not have our data center open in Australia yet. It is going to come, and that will make a difference. The cable or fiber optic to and from Australia/New Zealand does add up to half a second of latency, so that is a legitimate complaint.

But, as the hyper-scale cloud vendors like us go literally worldwide on data-center location, we become worldwide utilities. I remember, for example, the tsunami that hit Japan in 2011; it hit on a Friday, I think. There were three fiber links to Honshu at that time, and because of the aftershocks they kept going up and down, and no one could get reliable information about Fukushima. The center of the tsunami was in Sendai, and our data center was between Tokyo and Sendai, as was Fukushima. So we made the decision on the weekend to evacuate our data center and run it "lights out" because Fukushima was too dangerous. By Monday we were holding hourly Scrums between Redmond and Hyderabad, the two NOCs (network operations centers), basically 12 hours apart in time zone. We were moving all of the network services off of the Japanese data center remotely, including Hotmail, which was considered quite critical by the Japanese government because they were telling everyone to stay indoors and use email, and we had 10 million Japanese subscribers. That was the first point at which I really, truly understood this need to run like a utility, or better than a utility, given the way Fukushima was unfolding.

So, I think that where we will be in a matter of years is that there will be a few hyper-scale cloud providers able to give global capacity, and global capacity will mean scale wherever it is needed and low latency wherever it is needed. Anyone still building private data centers will need very good reasons to do it differently. It will be like the time when J.P. Morgan needed to build his own power plant to evangelize his investment in Edison's light bulb, but then public power came along and superseded it.

InfoQ: So, in order to move to that sort of global scale and that infrastructure-as-a-utility model, what are some of the things that organizations are going to have to think about and do? Look at those places that are running all sorts of services and systems with a very traditional deployment process. We are talking about a massive disruption.

Sam: Yes, we are talking about disruption. A lot of the current dystopia is described well in "The Phoenix Project", Gene Kim's book. We need to move to standard stacks and practices such as config-as-code. For example, in our world, what is called DSC (Desired State Configuration) is supported by PowerShell. And let's say that you have an automated release pipeline: you need automated release management on top of it, and you need very good monitoring in production at both the infrastructure and the application level. Now, if you are using the public Cloud, everyone has good infrastructure monitoring, and then you need to put it into your application; that is what our Application Insights is about. When you use the public Cloud, a lot of the traditional concerns of IT go away because they effectively become concerns of your cloud vendor. We install a server every 5 seconds, so we have, for all intents and purposes, infinite capacity for an enterprise customer. Relative to any of our customers, we can put in so much more compute power on demand than they possibly can, and we can do it at a so much faster pace. What we cannot do is do it in custom setups or unheard-of configurations or unique topologies or what have you. We have some very specific ways of configuring Azure. There is a set of Azure IaaS options, there is a set of Azure configuration rules, and they work; they support all the major operating systems, all the major languages, what have you, but you do not get to tweak the switches the way you do when you own the data center yourself.
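The desired-state idea behind config-as-code tools like DSC can be sketched in a few lines: declare the configuration you want, and let a reconciler compute the difference from the actual state and apply only that. The Python sketch below is purely illustrative; real DSC resources are declared in PowerShell, and the resource names here are made up.

```python
# Illustrative sketch of the desired-state (config-as-code) idea.
# Not real DSC: the resources and the reconciler are hypothetical.

desired = {
    "iis": {"present": True},
    "app_pool": {"present": True, "runtime": "v4.0"},
    "website": {"present": True, "port": 80},
}

def reconcile(desired, actual):
    """Return the changes needed to drive `actual` toward `desired`."""
    changes = {}
    for resource, settings in desired.items():
        current = actual.get(resource, {})
        diff = {k: v for k, v in settings.items() if current.get(k) != v}
        if diff:
            changes[resource] = diff
    return changes

# A server that already has IIS, but no app pool or website configured:
actual = {"iis": {"present": True}}
print(reconcile(desired, actual))
```

The reconciler reports only what is missing; running it again once the changes have been applied reports nothing, which is what makes declarative configuration safe to apply repeatedly.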

InfoQ: You do not get to choose the voltage from your utility provider.

Sam: Exactly. So there are some hardware vendors who need that capability. I mean, literally, if you are making routers or transmission equipment or what have you, you may need to do something else, but for IT it is not a problem.

InfoQ: So what would a typical large corporate need to do? What do they need to have in place to really move into this space? What are the big changes they are going to go through?

Sam: Well, I think most big corporates have gone through virtualization already; I do not think there is any news there. So going from a virtualized data center to a hybrid private-and-public virtualized data center is not a big shift, and I think that they can do it workload by workload. There are obvious workloads that are very easy to move to the cloud. For example, everything around dev/test goes to the Cloud with zero effort because there is effectively zero dependency. For any new project, I think you have to ask the question "Why not start in the public Cloud?" Then, for existing ones, you should be asking "What is the advantage we are getting by keeping this in a private Cloud or on premises?" There may be specific advantages: it may have to do with regulation, with specific local conditions of latency or customer proximity, or you may have sunk costs, or whatever. But, in general, you have to look hard at whether your operating costs are competitive with public Cloud operating costs and whether your agility in the application is comparable to what you would get in a public Cloud.

InfoQ: For those developers, support people and technical teams, what are the new skills they may have to learn that they do not have at the moment?

Sam: We have worked hard to make .NET very much the same across private and public cloud, so there is not much to learn about administration. You need to learn about release pipelines and config-as-code. You need to learn about monitoring. Typically, teams become much more aware of the interplay of architectural decisions and cost, because cost becomes variable. In the past, capital costs were assumed as a given, static or immutable. The public Cloud exposes them as a variable. So, all of a sudden, you see that an architectural decision has a cost implication that you may not have thought about before. In other words, how you structure your data storage may make a big difference in what things cost. You may make decisions about how much latency you take on based on cost. On the flip side, you may realize you can do things like large-memory databases that in the past were not practical, or burst workloads that in the past were not realistic because they required specialized hardware.
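As an illustration of how an architectural decision turns into a variable cost, consider choosing between a read-optimized and a storage-optimized tier for the same data. All prices below are invented placeholders, not real cloud pricing; the point is only that the tier that is cheaper at rest can be more expensive overall once access patterns are counted.

```python
# Hypothetical storage-tier cost model; all prices are invented for illustration.

def monthly_cost(gb_stored, reads_per_month, price_per_gb, price_per_10k_reads):
    """Total monthly cost = storage at rest + per-access charges."""
    return gb_stored * price_per_gb + (reads_per_month / 10_000) * price_per_10k_reads

HOT = {"price_per_gb": 0.020, "price_per_10k_reads": 0.004}   # cheap to read
COOL = {"price_per_gb": 0.010, "price_per_10k_reads": 0.100}  # cheap to store

archive = monthly_cost(5_000, 20_000, **COOL)      # rarely-read archive data
serving = monthly_cost(5_000, 50_000_000, **COOL)  # same tier, read-heavy workload

# For archival access patterns the cool tier wins, but put a read-heavy
# workload on it and the per-access charges dominate the bill:
print(f"archive on cool: ${archive:,.2f}/month")
print(f"serving on cool: ${serving:,.2f}/month "
      f"vs hot: ${monthly_cost(5_000, 50_000_000, **HOT):,.2f}/month")
```

The same kind of back-of-the-envelope model applies to latency trade-offs: once cost is exposed as a variable, the architecture decision becomes an explicit calculation rather than a fixed capital assumption.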

InfoQ: So all of that becomes just a choice and the decision about “Are you prepared to pay extra for that capability?”

Sam: Yes. Sometimes it is paying extra, and sometimes it is having things available that you wouldn't otherwise have. A great example is load testing: in the "old days", load testing was rarely done, and usually at the end of a product cycle, because it typically required a lot of hardware and specialist skills. I believe and hope that load testing will become something people do at least every sprint, if not daily. In other words, it would be a wonderful thing if people ran an hour of peak load as part of their daily practice to check that there are no regressions in performance. If you can do that by simply getting burst capacity from the Cloud and running it in the middle of the night when machines are idle, it is not a big deal. That is not something you would have thought about in the old days.
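A minimal sketch of the kind of nightly load check described above, using only Python's standard library. The request handler and the 500 ms latency budget are stand-ins for a real service and a real regression threshold:

```python
# Sketch of a scheduled load check: fire concurrent requests at the system
# under test, collect latencies, and fail if the p95 regresses past a budget.
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request():
    """Stand-in for a call to the real service endpoint."""
    time.sleep(0.001)  # simulated service work

def timed_call():
    start = time.perf_counter()
    handle_request()
    return time.perf_counter() - start

def run_load(requests=200, workers=20):
    """Run `requests` calls across `workers` threads; return the p95 latency."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(timed_call) for _ in range(requests)]
        latencies = sorted(f.result() for f in futures)
    return latencies[int(0.95 * len(latencies))]

P95_BUDGET = 0.5  # hypothetical regression budget: 500 ms
p95 = run_load()
print(f"p95 latency: {p95 * 1000:.1f} ms (budget {P95_BUDGET * 1000:.0f} ms)")
assert p95 <= P95_BUDGET, "performance regression detected"
```

Scheduled against burst capacity in off-peak hours, a check like this turns load testing from an end-of-cycle event into a routine regression gate.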

InfoQ: Yes, because the cost was just prohibitive.

Sam:  Right.

InfoQ: What are some of the other examples like load testing?

Sam: Another example is monitoring. Consider global monitoring: it is now very easy to set up global points of presence for monitoring, in addition to real-user monitoring with beacons in your application, and to have very large amounts of data aggregated. Then there is the whole "big data" movement, a term I do not like; I mean, we want big information, not big data.

InfoQ: In fact we want summarized information.

Sam: Right. We want good insights, not big data. But the fact is that you can get large amounts of data cheaply, because it is possible and feasible to collect and store exabytes in a way you never could before.

InfoQ: You were saying we are starting to move into early majority. Looking forward - where are we going to be in five years’ time?

Sam: I think in five years' time the default question for any new project of scale will be "Why not public Cloud?" I think the "buy" versus "build" question will be different: you will buy anything that is context, as SaaS or rented, and you will be very conscious of where the differentiators are that you want to build, what Forrester calls "systems of engagement" or Gartner calls "systems of innovation". I think that there will be more reliance on those things being distinct.

InfoQ: What about things like the European Union’s data sovereignty rules? How is that going to play out? Is it just going to be a matter of providers like Microsoft and others locating the data centers there and looking after the data?

Sam: Right now we are building, or have opened, two data centers in Europe. There is a point-in-time anomaly in that the 20-some countries of the European Union have to harmonize their data privacy rules; the Germans are more stringent, but I think that is largely a point-in-time issue. I think there is another concern that has thrown a wrinkle into the whole mess, which is the Snowden revelations. But I think that is a total red herring. I mean, if the NSA is tapping the phone lines, they are tapping the phone lines regardless of whose data center the bits are flowing into. You know, if you are tapping the cable …

InfoQ: You are tapping the cable.

Sam:  Right.

InfoQ: It’s not a plug in the data center, it is a tap on the cable.

Sam: We are not giving the NSA a key.

InfoQ: Is there sufficient redundancy in the network? We talked about what happened with Japan.

Sam: I think there is now; I think we are getting there. We know now how to better survive earthquakes, and we do have data-center-to-data-center failover. If a country goes dark, we literally can fail over; in fact, that is what we did in Japan. The modern design is to use lots of redundant commodity hardware and lots of failover techniques, which is why Azure is built on triple redundancy. You can fail over a rack, you can fail over a scale unit, you can fail over a data center. I do not know of an enterprise customer that has the capability to fail over a data center. I also know that very few enterprise customers actually practice their disaster recovery techniques, important though they are.

InfoQ: So this is another overhead cost that gets moved to the Cloud.

Sam: Right. We provide a disaster recovery service, which is a great use of the Cloud. You can basically replicate your data center into the Cloud, and it is a lot cheaper than keeping a backup site.

InfoQ: Sam, thank you very much.

About the Interviewee

Sam Guckenheimer is the Product Owner of the Microsoft Visual Studio product line. He has 30 years' experience as an architect, developer, tester, product manager, project manager and general manager in the software industry worldwide. His first book, "Software Engineering with Visual Studio", was translated into seven languages and recognized as a de facto guide for teams adopting agile practices.


