Bio: Stuart Charlton is the Chief Software Architect for Elastra, a provider of Cloud Computing software infrastructure. Stuart specializes in systems architecture, RESTful web architecture, and data warehousing, and is an avid student of lean & agile approaches to business processes and product development.
QCon is a conference that is organized by the community, for the community. The result is a high quality conference experience where a tremendous amount of attention and investment has gone into having the best content on the most important topics presented by the leaders in our community. QCon is designed with the technical depth and enterprise focus of interest to technical team leads, architects, and project managers.
Traditional hosting providers usually give you... Either it's a multi-tenant machine where you can run software that they have set up that is preinstalled and allows you to do various web applications and databases and whatnot, or they'll allow you to have a dedicated machine in your hosting environment and you usually pay a monthly fee and get a certain service level agreement and whatnot.
The Cloud, on the other hand, has a big difference in terms of the amount of time it takes to actually forecast demand. First of all, the lead time to actually get a machine up and running shrinks to a matter of seconds or minutes because you are using a self-service interface, or an API to actually provision these machines. The second thing is that if you have a lot of variability in demand, you can use the Cloud to, say, depending on the quota you've set up with your Cloud provider, move from 30 servers to 3,000 servers in a matter of a week, if you'd like, whereas in the traditional hosting model that tends to not fly, because you have to give them quite a bit of notice to say "I might need 3,000 servers, hey" and they'll go "Give us 2 months".
The big difference is that the Cloud provider tends to do their own capacity analysis and demand trending, obviously within some bands of quotas. That's a big deal - the ability to spin things up and shut things down and get charged purely on a by-the-drink basis. I think there is a lot of value in that. Having said that, you'll actually find in practice that, if you look at it on a pure cost-to-cost basis, a lot of the clouds - at least today - tend to be a little bit more expensive, if you are going to run a machine dedicated over the course of a month versus running it on a hosting provider. Now this is the outsourcing cloud. I mean a cloud isn't just about outsourcing, it can also be something you own in your own data center and you do charge-backs. It's something that you can do that will really shrink your time to, say, I want to deploy something, whether it's on my data center or it's out on a third party like a shared host or something like Amazon EC2. I liken it to... it's that quickness - it's what differentiates the cloud from everything else.
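The cost trade-off described above can be sketched as a break-even calculation. This is only an illustration: the monthly fee and hourly rate below are hypothetical numbers, not prices quoted in the interview or by any provider, and the function names are made up for the example.

```python
HOURS_PER_MONTH = 730  # rough average number of hours in a month

# Hypothetical prices, chosen only to illustrate the comparison.
DEDICATED_MONTHLY_FEE = 50.0   # flat fee for a dedicated hosted machine
ON_DEMAND_HOURLY_RATE = 0.10   # pay-by-the-drink rate for a cloud machine


def on_demand_cost(hours_used: float, hourly_rate: float) -> float:
    """Cost of a cloud machine that is billed only for the hours it runs."""
    return hours_used * hourly_rate


def breakeven_hours(monthly_fee: float, hourly_rate: float) -> float:
    """Hours per month above which the flat monthly fee becomes cheaper."""
    if hourly_rate <= 0:
        raise ValueError("hourly_rate must be positive")
    return monthly_fee / hourly_rate


if __name__ == "__main__":
    always_on = on_demand_cost(HOURS_PER_MONTH, ON_DEMAND_HOURLY_RATE)
    threshold = breakeven_hours(DEDICATED_MONTHLY_FEE, ON_DEMAND_HOURLY_RATE)
    print(f"Always-on cloud machine: {always_on:.2f} per month")
    print(f"Dedicated machine:       {DEDICATED_MONTHLY_FEE:.2f} per month")
    print(f"Cloud is cheaper below about {threshold:.0f} hours per month")
```

With these assumed numbers, a machine that runs all month costs more on the cloud, while a machine that is only needed part of the month comes out cheaper when billed by the hour.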
There is a wide variety of definitions of what a cloud is and it's almost invariably... All these technical fads tend to have different... when marketing gets ahold of them you have a lot of different definitions floating out there. And I think that there tends to be 2 sort of extremes on the cloud. One is the platform cloud where someone is creating a complete, total system for you to use - the black box of intrigue, as I sometimes joke. It's sort of, you insert your code, it's very developer-focused or very end-user-focused, such as what you've seen with Salesforce or Google App Engine and whatnot.
They fit very well for point applications or siloed applications, where if you are doing a CRM or billing or trouble ticketing or something like that and you need to do some integration with your back end, but it's sort of a self-contained thing and that makes a lot of sense to have an exchange of those applications for download, or to provide a development platform to work on. But you're very much reliant on the capabilities that are exposed, and you are also quite locked in to the APIs and the approach that they expose to you.
A lot of them have their own proprietary version of SQL that isn't quite SQL actually, it only supports a small sliver of the features. It's the "all your base are belong to us" sort of version of the Cloud, but it's going to be very useful for a lot of these point applications, but I'm not sure how great it will be for the Enterprise when they have a lot of things that exist today that they are not going to rewrite.
That brings us to the other extreme of the Cloud, and that is the infrastructure-oriented cloud or the infrastructure-as-a-service. I'm not sure how you pronounce that, IAAS or something, anyways... There you have... the poster child is Amazon with the Elastic Compute Cloud and the idea there is, this is very close to the system administrator or the operator of a data center, in that what you are doing is automating the provisioning of storage network capacity and compute capacity, usually with some predefined bundles that exist. You can do that and what's great about it is that it effectively allows you to use what you use today, so you can take a lot of the work you've done to date with existing architectures, software and whatnot.
It's very conducive to a supplier ecosystem - there still is a software industry, not everything is... Software isn't just as-a-service, there actually still is software that you can distribute it and put there and install on a cloud. There still is this ecosystem of architects trying to find the best way to do things. Having said that, when you are working with an infrastructure cloud, you don't quite get the same level of scalability out of the box as you would with a platform cloud because you have to design your system to be able to handle the scalability that you need.
It's a bit of a fallacy that, "well the cloud means infinite scalability", it does if you've designed your system to enable that. And for this reason you actually see a lot of analysts claim that the platform cloud is going to win because the black box allows me to not care about this and magically it shall scale. To that I say "well sure, except I have tremendous investment in legacy, designed a certain way", so this comes back to the real benefit for enterprises is this really reduced lead time and really reduced charges because even compared to a lot of on-premise data centers, the economies of scale of something like Amazon is quite compelling, they are masters of the low margin business and they are going after the space in a big way.
Well the biggest thing about cloud computing is... There's an old saying, "after you fix problem #1, problem #2 gets a promotion" and now that it's a lot easier to actually provision these back-end data center resources, we're going to have a lot more of them. What used to be 100 servers might now be 1,000 servers. There is a talk for example, that we're going to reduce utilization rates and this will all be consolidation, but... And I agree, virtualization and some of the technologies that help enable the cloud will get there, but if you give someone the power to do something, they are going to take advantage of it or abuse it, and so we are going to have a whole lot more stuff to manage.
A lot of the classic challenges with "IT service management", as the term is, or configuration management, and just dealing with demand management, which is a whole different world that came out of the operations industry - the folks that built ITIL, which is the IT Infrastructure Library, out of the Office of Government Commerce in the United Kingdom. It's sort of this bureaucratic but well thought-through approach to managing IT.
This stuff gets promoted into great importance when you move to a cloud type approach in that, because we have so much more to manage, what used to be just kind of done in either a haphazard way, or it was done in an ad-hoc manner or it was very bureaucratic depending on your organization. But I think it becomes everybody's concern because of this challenge and thus a lot of what you've done with software in the past, we've had very... point solutions for managing complexity. So if you are a developer, you often will, for example your build system will have a combination of dependency managers like Maven or Capistrano in the Ruby community or you have Ant or whatnot - these are fine in their communities, but they don't deal with the totality of "Well, I have packaged software and that has settings and some of them might be in a config file, some of them might be in SNMP, some might be in JMX". There is a whole bunch of standards for setting this up and there are all these little point solutions. One of the things that we're working on in my organization is trying to come up with a uniform way of describing that with hypermedia so that we have this ability to allow your software to be described in a way that handles all this mess of stuff and is independent of the data center or cloud you want to provision, again whether it's your own or it's something that's off-premise, a public cloud.
I think that there is a secret sauce theory - the Google secret sauce theory - that one of the big things the cloud does is it eliminates IT and that it reduces the need because some magical architecture is going to make it more manageable. I actually think the exact opposite is going to occur - I think that we are just going to have to wrap our heads around all the things that we've been thinking about in IT and just been getting away with, it's going to become a huge deal and software is going to change for it.
So this is a great question, and right now the answer is hard to come up with because right now you effectively can't, to some degree. Every cloud has its own proprietary elements. As I discussed before, the platform cloud tends to be proprietary to varying degrees, though some of them do have some level of standards-based infrastructure as a part of it. If you look at Google, they are providing you a Python engine, you can use some standard Python frameworks like Django and whatnot to do your web applications and that's pretty cool, but their underlying database layer, which is sort of an exposure of their BigTable technology, is done with their own SQL language that's very limited in what you can do with it, so you are not going to be able to really port your application to an on-premise environment if you ever wanted to, for example. There are some limitations to that model, though I am sure they are going to improve that as time goes on. With the infrastructure cloud, the actual infrastructure is just as portable as a Linux machine or an OpenSolaris or Windows or whatever machine you launched.
There is a lot less lock-in in that sense, but the APIs to actually launch the machines tend to be proprietary today because there are no standards in this space, and for good reason - I mean it is a fairly new subindustry, if you will, and there needs to be time for small companies to innovate and come up with different ways of how they manage the cloud. I think that, right now, the approach really is to ensure you separate the way you build your software from the way you provision things so you can at least keep some modularity in terms of how you're locked in: is it the provisioning piece, is it the management piece, is it the actual software itself? Another thing I get asked sometimes by some of our customers or prospects is, "Even on these infrastructure clouds, how do I have to change my application?" To that I say, the real thing is not so much that they make you change your application, but they often have limitations because they are dealing with a different set of assumptions from a traditional data center.
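One way to picture the separation between how software is built and how machines are provisioned is a small interface that deployment code depends on, with each cloud's proprietary API hidden behind its own implementation. The sketch below is an assumption-laden illustration: `Provisioner`, `Machine`, `InMemoryProvisioner`, and `deploy` are hypothetical names invented here, not Elastra's design or any provider's API, and the in-memory class stands in for a real cloud.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class Machine:
    machine_id: str
    address: str


class Provisioner(Protocol):
    """The only provisioning surface the deployment code is allowed to see."""

    def launch(self, image: str) -> Machine: ...

    def terminate(self, machine_id: str) -> None: ...


class InMemoryProvisioner:
    """Stand-in for a real cloud; a real adapter would wrap a provider's API."""

    def __init__(self) -> None:
        self._machines: Dict[str, Machine] = {}
        self._counter = 0

    def launch(self, image: str) -> Machine:
        self._counter += 1
        machine = Machine(f"m-{self._counter}", f"10.0.0.{self._counter}")
        self._machines[machine.machine_id] = machine
        return machine

    def terminate(self, machine_id: str) -> None:
        self._machines.pop(machine_id, None)

    def running(self) -> List[Machine]:
        return list(self._machines.values())


def deploy(provisioner: Provisioner, image: str, count: int) -> List[Machine]:
    """Deployment logic that never mentions a specific cloud."""
    return [provisioner.launch(image) for _ in range(count)]


if __name__ == "__main__":
    cloud = InMemoryProvisioner()
    for machine in deploy(cloud, image="app-image", count=3):
        print(machine.machine_id, machine.address)
```

Swapping providers then means writing another class with the same two methods, while `deploy` and the application itself stay unchanged, so any lock-in is confined to the provisioning piece.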
There is a great example of recovery - if I was going to... if I have a machine that's fault-tolerant or highly available or clustered, normally failover happens at an application-level layer, it happens with a load balancer or it happens with a smart proxy or something like that. That will handle the "if this machine goes down I'm going to go to this machine", but then you get into the question of recovery, of "OK, this machine I have to actually now bring up again and have it rejoin the cluster". Now in a traditional data center, what usually happens is that the physical machine is replaced. So you go to the place in the data center that has that machine, you yank it out, you put something new in or a new hard disk or whatever and it has the same IP address and the same route it always used to have. On the cloud, because we have software doing this now, the new machine may actually be on the other end of the data center.
If I am allocating an IP to this - like in the case of Amazon they have this Elastic IP - this may actually take a little bit longer, because they have to reroute the network to be able to deal with, "here is my IP" and I have to actually go to this area of the data center because really the internal IP is different, and it's because it's in a different section, different switch, the whole deal.
And how would you do it any other way? There's a lot of nuances about where we're dealing with a multi-tenant environment where you can't really predict things as well as you could... The benefit again is, I'm getting a much quicker lead time to recovery. So my mean time to recovery, in some senses, is much quicker than a manual recovery where I am replacing the machine, not as quick as if I have a hot standby with a proprietary blade array, so it's kind of somewhere in between. But that's one area where the cloud locks you in a little bit, because it's a different mind set.
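The recovery flow described above can be modeled in a few lines: the replacement machine comes up with a different internal address, and a stable public address is remapped to it. This is a toy in-memory model under stated assumptions; `ToyCloud`, `recover`, and the addresses are hypothetical and do not represent Amazon's Elastic IP API.

```python
import itertools
from dataclasses import dataclass
from typing import Dict


@dataclass
class Instance:
    instance_id: str
    internal_ip: str


class ToyCloud:
    """Toy model: every launch lands in a different part of the data center."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.instances: Dict[str, Instance] = {}
        self.public_ip_map: Dict[str, str] = {}  # public IP -> instance id

    def launch(self) -> Instance:
        n = next(self._ids)
        # Assumption for illustration: a new subnet per launch, so the
        # replacement never reuses the failed machine's internal address.
        instance = Instance(f"i-{n}", f"10.{n}.0.5")
        self.instances[instance.instance_id] = instance
        return instance

    def associate(self, public_ip: str, instance_id: str) -> None:
        self.public_ip_map[public_ip] = instance_id

    def resolve(self, public_ip: str) -> Instance:
        return self.instances[self.public_ip_map[public_ip]]


def recover(cloud: ToyCloud, public_ip: str) -> Instance:
    """Replace the instance behind public_ip and remap the address to it."""
    failed = cloud.resolve(public_ip)
    replacement = cloud.launch()
    cloud.associate(public_ip, replacement.instance_id)
    del cloud.instances[failed.instance_id]
    return replacement


if __name__ == "__main__":
    cloud = ToyCloud()
    first = cloud.launch()
    cloud.associate("203.0.113.10", first.instance_id)
    new = recover(cloud, "203.0.113.10")
    print("old internal IP:", first.internal_ip)
    print("new internal IP:", new.internal_ip)
```

Clients keep using the same public address throughout, but anything that cached the old internal address has to be told about the new one, which is the different set of assumptions the answer refers to.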
Another one is when they prevent you from using certain features you might be used to using in the network - a classic case is, currently on Amazon, you can't use UDP broadcast or IP multicast. And this causes issues if you are using, for example, an application server that takes advantage of session replication. You can use session replication with an application server if you turn it into a point-to-point setup - TCP, that's fine, but if you rely on IP multicast for that you are going to run into problems because it's not supported. I think a lot of it isn't so much re-architecture as awareness of the limitations and having work-arounds for it, so it's workable, it's just... it's not a rewrite per se.
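The point-to-point work-around can be sketched as a fan-out over an explicit peer list: instead of one multicast send, the node opens a TCP connection to each known peer. This is a minimal illustration, not how any particular application server implements session replication; the `replicate` and `tcp_send` names, the peer addresses, and the payload format are all assumptions.

```python
import socket
from typing import Callable, Iterable, List, Tuple

Peer = Tuple[str, int]  # (host, port) of another node in the cluster


def tcp_send(peer: Peer, payload: bytes, timeout: float = 2.0) -> None:
    """Send one payload to one peer over a short-lived TCP connection."""
    with socket.create_connection(peer, timeout=timeout) as conn:
        conn.sendall(payload)


def replicate(
    peers: Iterable[Peer],
    payload: bytes,
    send: Callable[[Peer, bytes], None] = tcp_send,
) -> List[Peer]:
    """Unicast the payload to every peer; return the peers that failed."""
    failed: List[Peer] = []
    for peer in peers:
        try:
            send(peer, payload)
        except OSError:
            # Unreachable peers are reported rather than aborting the loop.
            failed.append(peer)
    return failed


if __name__ == "__main__":
    # Hypothetical static peer list; without multicast discovery the
    # membership has to come from configuration or a registry.
    cluster = [("10.0.0.11", 7800), ("10.0.0.12", 7800)]
    unreachable = replicate(cluster, b"session=abc;user=42")
    print("unreachable peers:", unreachable)
```

The cost of this approach is that each update is sent once per peer and the peer list must be maintained explicitly, which is the kind of awareness of limitations the answer describes rather than a rewrite.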
I actually think the opposite will occur more often, and I'll explain what I mean by that. First of all, there are several use cases for the cloud I think that will happen in practice. There is this industry hype of "everything will move to this outsourced cloud" and my CEO and I often joke to our customers that there is always going to be some software that will be on a DVD under the CIO's pillow at night because it makes him feel good. You are never going to necessarily have everything sitting on a cloud somewhere that is outsourced, but again I strongly believe that cloud isn't just necessarily about outsourcing.
Let's talk about that outsourcing idea of "I have something on premise" versus "out on an Internet cloud". If you think about how enterprises tend to operate, the sort of cases where they are very conservative. The cases they'll go is... Test development I think makes a lot of sense in the cloud because you are in an iterative process where you don't want things to be in your way, you don't want to have infrastructure considerations slow you down and there is not a lot of quality or SLA considerations there that matter to you so I think the cloud is a fantastic case for that, particularly in cases like when you are doing stress testing or load testing.
It's fantastic to be able to truly do a representative stress test with 100 servers for a few hundred dollars as opposed to renting for hundreds of thousands of dollars or whatnot. So that, I think, is a very common case that will occur. Pre-sales in particular, I think a lot of what we've done with proof-of-concepts with enterprise software have taken a long time - multi weeks - and there are a lot of issues with... not with actually conducting the POC, it's just the logistics of getting all the environment up. Well I think the cloud actually helps pre-sales of software quite a lot in getting proof-of-concepts up and running.
There is a term out there called cloudbursting, which is the idea of "I want to span my capacity for my data center out into a public cloud to deal with peak seasons or flash crowds", such as Christmas if you are a retailer or whatnot. And I actually agree, I think that will be a case where the elasticity of the cloud really plays well, the elasticity of the demand, we'll grow or shrink that - I think that will play very well. But if you go back to my initial examples of test development, stress testing, etc, what will wind up happening is you start on the cloud because it's easy and because it's flexible, but then when you go into production I actually think you'll take your application and want to bring it in-house for a lot of cases, onto either your own data center or your own private cloud running a virtualization infrastructure like VMware or Xen or Hyper-V.
This is where the lock-in question becomes really important is that "OK, I have to make sure that I can run something that sits out on a cloud somewhere, but I can also run it in-house", because of privacy regulations, Sarbanes-Oxley regulations, or I just feel better because of that and my cost of ownership and my operating costs aren't that out of line with what I would run out on the cloud. Now over time that might change and it just may become way too compelling to not run things on an outsourced cloud, but at least for the next 5 years, I actually think that cloud is very practical for enterprises, but it's going to be this very odd, "we start in the cloud and move it back in" as opposed to moving it out, except in some cases where you are doing cloudbursting.
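The cloudbursting idea mentioned above comes down to a simple capacity split: serve what you can in-house and rent only the overflow. The sketch below assumes a made-up sizing model in which each server handles a fixed number of requests per second; `burst_plan` and all of the numbers are hypothetical.

```python
import math
from typing import Dict


def burst_plan(
    demand_rps: float, in_house_servers: int, rps_per_server: float
) -> Dict[str, int]:
    """Split the servers needed for a given demand between in-house and cloud."""
    if rps_per_server <= 0:
        raise ValueError("rps_per_server must be positive")
    needed = math.ceil(demand_rps / rps_per_server)
    return {
        "needed": needed,
        "in_house": min(needed, in_house_servers),
        # Only the overflow beyond owned capacity goes to the public cloud.
        "cloud": max(0, needed - in_house_servers),
    }


if __name__ == "__main__":
    # Hypothetical retailer: 10 owned servers, 50 requests/second each.
    print("normal week:", burst_plan(300, in_house_servers=10, rps_per_server=50))
    print("peak season:", burst_plan(1000, in_house_servers=10, rps_per_server=50))
```

In the normal week nothing is rented, and in the peak season only the overflow servers are; once demand drops, the cloud share returns to zero, which is where the pay-per-use pricing matters.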
That's a good question. I think that one of the big changes that happens with the cloud is that it is an operating expense. What used to be a capital expense is now operating and thus requires a lot less upfront allocation to it. There is a downside to that in that a lot of large organizations actually prefer capital investment because they can amortize the costs over time, whereas an operating expense is an expense.
Small companies on the other hand, particularly ones that don't have a lot of capital, will definitely take advantage of this quick startup time and of operating expenses on a pay-by-the-drink model, if you will, because it's conducive to the style of their operations. I think what's going to be interesting is to watch how the economic conditions will affect the prices of the various clouds. I also think it will be interesting to watch how the cloud industry consolidates, because my sense of what's going to happen is there are a few large players - Amazon, Google, Microsoft, a few others probably coming out there - IBM's got Blue Cloud, they're doing quite a lot there. I think those will thrive.
There's a lot of mid-tier players that I have a lot of respect for, but I don't necessarily think that they are going to survive the conditions, because this is the sort of low-margin business where... it's like retail, I mean there's only so many large retail stores that succeed and they all have their niche. So I think there is going to be a reckoning in the cloud industry and what will wind up happening is we'll have, just like Internet service providers actually, if you think in the 90s there were lots of mom-and-pop Internet service providers -- and there still are to some degree, the ones that are successful -- but by and large they all got bought up and now we all get our Internet through large telecommunication or cable media companies.
I think the same thing is going to happen with the cloud. I don't think there will be 1 or 2, I think there will be more like 5, maybe less. And then there will be a few niches, I think the way for a smaller cloud provider to thrive is to find that niche or focus on a particular vertical or Platform-as-a-Service style cloud as opposed to an infrastructure cloud where it's just purely low margin and then they might have some good success. So this is not just going to impact customers, it's going to impact the providers of this fairly nascent industry.
One of the beneficiaries is the software vendors that are starting to create the infrastructure, the middleware of the cloud, to be able to make this a lot quicker to provision and to manage. I think that's the stuff that will really start to thrive because it's going to help to burst the lock-in, it's going to help to create some uniformity across them and of course it's just going to take some time for that to coagulate into a standard as the market tries to figure out what solutions it prefers.
I think the tools question is an evolving one, but there is a lot to be said about the various strategies out there. On one hand you have Google taking advantage of the Python utilities and the frameworks and that community and that's great. There are some startups, like EngineYard for example, which is an Amazon-invested company, as Elastra is. EngineYard is doing some really interesting work with Rails and they're building a portable cloud environment, if I recall, with a combination of Ruby and Erlang to be able to create this sort of experience that allows Ruby developers to really create this sort of scalable recoverable cloud. Again, targeted at a platform level, very developer-focused, and for good reason.
I think that a lot of the early adopters will be developers here. One of the more interesting approaches I've seen lately is what Microsoft is doing with Azure. Microsoft, of course, they have a lot of resources and so when they look at Google and Salesforce.com and they look at all these other cloud vendors out there, they are going "Well we have this platform that is very developer-focused, and Microsoft has always been a very developer-focused company".
And then you have Amazon with a very popular infrastructure service and Microsoft goes "OK, let's do both", and so they are effectively trying to cover the entire market and compete with everybody, but the angle they are taking with the developer tools is .NET, in the sense that, for a .NET developer, what the cloud really does is it makes it a lot easier to deploy and provision .NET applications.
One of the things that... A longstanding criticism of Microsoft infrastructure such as BizTalk or SQL Server and all the variety of servers that they have, is it can be somewhat difficult to install and to operate because of the various combinations of installations that you have to do in the right order, the patches and whatnot. When you look at what they are starting to try to do with Azure and their .NET services, it is to provide a much richer developer experience on the cloud, so that it's not only something to preserve their .NET developer base, but it's also to make the cloud sort of the obvious choice of where you would start your development regardless of where you would deploy it in the end.
So I think they are taking a very interesting angle that fits with their world view. And of course, Amazon allows you to run pretty much any piece of infrastructure whether it's Java, Ruby, Python or even .NET now, because you can run Windows on EC2, so they are very focused purely on the administrative side of these machines and the storage and they are not really focused on the development tools, but this on the other hand creates an ecosystem where you have the classic software industry - all of the things that are available there now are available on their environment, so it's much more of an open world approach. I think the jury's still out as to which one will play best with developers, and not only developers but a lot of the decision-makers and influencers in the enterprise: enterprise architects that might prefer one or the other, and administrators and operations folks, who may prefer a platform model where it lightens their load and shy away from the infrastructure model because "wow, I have a lot more to deal with", or it might be the opposite: they may go "I want the control, I want to be able to deal with, here are my virtual instances and my storage" and not really trust the black box of intrigue. I think the jury's still out and it will be a very interesting ride.