
Ines Sombra on Data Services at Engine Yard


1. Hi, this is Chris Swan, one of the cloud editors at InfoQ. I'm here with Ines Sombra, who was a speaker on my cloud track yesterday at QCon New York; the video of that presentation is going to be available separately. So Ines, tell us about your background.

Alright. So, I'm a data engineer at a company called EngineYard. My background: I started in computer science in Argentina a long, long time ago, and I've been doing computer science ever since. I went to school, I got a couple of master's degrees, and then after school I decided that I wanted to look at what my next problem set was going to be and what type of things I wanted to sink my teeth into. Cloud environments, and even data stores in the cloud, became really, really attractive to me. I used to like databases before; I had a really amazing professor when I was in college in Argentina who pretty much made me interested in how we deal with these types of repositories and how we store data.

And from that moment on I have just been going back and forth between development and databases. Now I get to do that for a living. I get to figure out what type of data stores our customers want to put in their applications, how we support them, and how well they are integrated into our product. I get to shape that experience and I get to worry about how we're making this available for our customers, and I really like that, so it's an integration of both things. I get to develop the things we will make available, and think about how customers use them and how easy they are to use, and I think it is a very interesting space right now. So I pretty much enjoy it.


2. So the folk watching will not necessarily know EngineYard so tell us a little about EngineYard and then also tell us what you do there.

Alright. So EngineYard is a platform as a service (PaaS). We sit on top of cloud providers and we offer a standardization tier with an OS that we have, AMIs that we curate, and stacks that deal with languages. We used to be a Ruby shop and now we have support for PHP, support for Node.js and, very recently, support for Java. So if your application has been written in any of these languages you come to us for a curated experience. In a way we make it very easy for you to get started in the cloud and we make it very easy for you to grow. If you have something that needs some customization, we don't get in the way of you being able to do that. So you have your resources, we manage them for you, we provide you support, we make sure that you're up to date with patches, we manage releases, and we do the orchestration and the provisioning for you. These things are yours; we just help you with the management and support of them.

So companies use us either as a standardization tier, whenever they're too big and they don't want to deal with the cloud deployments, so they just bring us in the middle; or they use us whenever they want to leverage our DevOps expertise. The DevOps term is very trendy right now, and some people have very strong opinions for and against it; I'm just going to use it because it's something well understood, even if it is controversial. It's as controversial as NoSQL to me. But we help people just get started with their applications. We make it so developers only have to worry about their code and not necessarily how it is provisioned in the cloud. And within EngineYard, we have stacks and languages that we support and we have databases that we support, and my responsibility is the databases we support.


3. So tell us more about that. Which databases do you support and kind of what drives people's choice over which database they use.

When I started we only had MySQL. Since then we have introduced new repositories. We have introduced Postgres, and made it our default. We also have support for MongoDB, and we have made Riak the first NoSQL data store that is a first-class citizen in our stack, at the same level as Postgres and MySQL. So we get to make these decisions about which data stores we think are reasonable for our customers to use, and we get to educate them on their choices of data stores. The choice between MySQL and Postgres is easier to make, in a way.

So either you want one or the other. I think Postgres is a little bit more versatile, and I think in the case of MySQL you have a more developed clustering story. There are innovations on both data stores and I get to make those innovations available on both data stores quickly. I don't think it's my place to make decisions for my customers, and I think we're about choice, so we don't limit you to our data stores. You may not even use our data stores with us; it also became possible after I got there to have no data stores whatsoever with us, so you can leverage somebody else's. If you have a service that's already storing your data and you're happy with that, you don't have to have a database on us. But we have really nice database support, and our DBAs are amazing. So the choices I get to make, or the opinions that I have, I express in what is currently a supported database on our stack, and then I keep pushing for the things that I think are very interesting to bring into our stack in the future.

Chris: So it sounds like you're implementing and managing your own databases, rather than sitting on top of something like RDS, which is also a service available in Amazon.



4. So is one of the options for your customers to switch to RDS?

Yes. So you can have no database with us and you can use RDS, and that works well, too.


5. So in your presentation yesterday you talked about this model of pets versus cattle, that people use. It’s become almost a cliché when talking about the cloud, and you expressed a point of view that you need to look after the cattle too, so tell us more about that.

Yes. Alright. I think there is some value to this analogy, because in the cloud you have these things that can come and go, and they are less endearing to you than your pet, which you name something special and which is like the place in the world where you come and bring your things. With cattle the notion is that if something misbehaves you can shoot it in the head and replace it; it's gone. But my dissatisfaction with this analogy is that even when you have cattle, the function that the server has is important to you. If you have a VM that is running your database and it goes away, that is still not going to be pleasant for you. So while this instance, which can come and go and can be shot in the head at any given time, is there, you care about it.

So you have to maintain it, you have to monitor it, you have to have alerts enabled for it. To me it is not that transient; it is not that you don't care and you can replace it easily. I think the scale at which that is true is very different depending on the organization that you have. Some things are very easily replaceable while others are not. And what I would argue is that thinking about this particular server as something disposable, or something like cattle, or something generic, doesn't really hold true to me, because everything I see is that you care about them, you want them to be as healthy as they can be while they're there, and when you have to replace them you want that process to be efficient. But it is not like “la la la, replace it and everything is going to be fine”. I mean, there's some fine tuning to do. So to me the distinction just doesn't hold true, that you have your pet that is dear to you and cattle that you don't have to think about. I think that both of them have to be maintained and both of them have to be looked after.


6. So there's still some sort of animal husbandry needed in the cloud.

Yes, I think so. You have companies that talk about this, like Netflix and Google, and I think what they fail to tell you is that the things that can be easily replaced are things that are likely transient. I mean, if you have a load balancer node, that is the kind of thing where it's just “yes, sure, replace it immediately”, but you still need it. So yes, animal husbandry, or housekeeping, whatever you want to call it, is definitely needed. I think the analogy is useful in the sense that it allows people to start thinking about their application in terms of things that can come and go, but at the same time you still have to worry about them.


7. So we had a sort of open session in the cloud track yesterday and you raised some questions around state management in the cloud. How do you see that evolving?

Yes. So what we have with traditional RDBMSs is the old notion that you are going to have a server, that you have physical hardware. Granted, things could fail when you have physical hardware, but I think that the rate of failure, at least at the scale of normal corporations, wasn't as quick or as tangible as it is with the cloud. With the cloud you get this notion of things coming and going, and your application has to account for that. And this is, I guess, another reason why the cattle versus pet thing doesn't really work for me: the things that can come and go are things that are not storing important information. When you have to deal with coordination and where writes go, and you have a database master, and the traffic of your application is going to one place to write, that is not something transient. So to me, if you have a data store that has been built with this clustering notion, where you can write to any node or you have some degree of replication through the entire system, and it has been baked into the system, then yes, it becomes much, much easier, and state in the cloud becomes a non-issue for you.

But whenever you're using something like a traditional RDBMS that has not been built with this cloud paradigm in mind, it becomes problematic. There was a little bit of discussion about that, because one of the other speakers was mentioning that this is not necessarily an inherent problem of the cloud, and I do agree. This is a problem that you have no matter what, but I think the cloud makes it more evident. It makes you deal with it. Fast. If you have a master and a replica somewhere and the replica is not automatically promoted, you have to deal with it. You have to rehearse it, you have to know how this promotion happens, you have to have some degree of automation, so this makes state something that you have to think about. You have to architect your application in a way that it will be resilient to resources coming and going, and you also have to architect your state in a way that takes this into consideration. So either you use something that has been clustered, or you put processes and automation behind things that give you different guarantees, like the ACID properties of a relational database. If you need those in your application and you want to be in the cloud, then you have to think about it; you have to architect your application with that in mind.
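Editor's note: the kind of promotion automation described above could be sketched roughly as follows. This is an illustrative Python sketch only, not EngineYard's implementation; the names `Node`, `Cluster`, `replicated_position` and `failure_threshold` are all hypothetical, and a real setup would call the database's own promotion mechanism rather than swap objects in memory.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    healthy: bool = True
    # Hypothetical stand-in for how far this node has applied the master's writes.
    replicated_position: int = 0


@dataclass
class Cluster:
    master: Node
    replicas: List[Node] = field(default_factory=list)
    # Assumed policy: require several failed checks in a row before failing over.
    failure_threshold: int = 3
    _consecutive_failures: int = 0

    def check(self) -> Optional[Node]:
        """Run one health check; return the newly promoted node, if any."""
        if self.master.healthy:
            self._consecutive_failures = 0
            return None
        self._consecutive_failures += 1
        if self._consecutive_failures < self.failure_threshold:
            return None
        return self.promote()

    def promote(self) -> Node:
        """Promote the healthy replica that is furthest along in replication."""
        candidates = [r for r in self.replicas if r.healthy]
        if not candidates:
            raise RuntimeError("no healthy replica available to promote")
        new_master = max(candidates, key=lambda r: r.replicated_position)
        self.replicas.remove(new_master)
        self.master = new_master
        self._consecutive_failures = 0
        return new_master
```

Calling `check()` on a schedule, and also on demand in a test environment, is one way to rehearse the promotion before it is needed for real.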


8. So another question that got raised was “why not cloud?” You're in the business of providing people with the cloud substrate, but what do you see your customers not wanting to bring to the cloud, and how do they deal with having some of their applications in the cloud and maybe some of them not?

Okay. There are several areas where I think the cloud has proven to be a more difficult fit, or even just not a good fit at all. If you have regulations, like if you have to provide data ingress encryption, or if you have regulations about which country your data lives in, I don't necessarily think that the cloud is making it easier for those people to adopt it, or it may not even make sense. When you really need hardware performance, the cloud gets expensive. What we see with our customers is that they may start small or medium size and then their application explodes, and this elasticity allows them to grow very quickly, but then at some point the cloud resources they are using become too expensive.

And actual hardware is cheaper, so they may choose to actually move off the cloud to real hardware and maybe keep smaller services in the cloud. So whenever the cost of IOPS matters, or whenever you are I/O bound, the cloud doesn't really do that much for you and it gets really, really expensive. The advantage is that while you're growing you don't need to invest a lot of cash to buy this equipment, but I think that once you grow to a certain degree, people start looking at options for going back in-house. Another case is when you have something proprietary, where you're actually competing with one of the cloud providers, and you're not going to put your business on it. So the reasons I see where the cloud is not a good fit tend to be regulations on your business, like how your business stores its data, and limitations on hardware. I think those are the most common places where I don't think the cloud has worked very well.


9. Okay. So trust has been another kind of common theme of discussions over the last few days, and I think it had two aspects to it. One has been high-trust environments for people, and the other has been how do we establish trust in technology. How do you see that impacting your work on a day-to-day basis?

I used to work in a more traditional environment, where this notion of trusting people was not necessarily super... it was a traditional, old-school type of corporation, so this idea of trusting somebody, of developers being able to push code to production or something like that, was very revolutionary. Ever since I switched to working at a startup it has become commonplace. You're trusted to make your own decisions, and there's a lot of autonomy and also a lot of ownership of what you do, and I really, really enjoy that. I think that being the commander of your own ship and being able to make those decisions fits my personality type. It may not fit other personality types, but I like the idea of blameless cultures, and I like the idea of empowering the people that are there. I think you do more creative work when you're in this type of situation. So that is the cultural aspect of trust, in terms of people, to me. In terms of trust in infrastructure, I think it's a matter of just being open with what you have. I mean, I talked about post-mortems as being something important.

Even at the organization level, across the barriers between departments, you can then start instituting trust. And when you live in the open, you can actually learn from people's mistakes and start improving the state of the industry, just by being very straightforward about the things you did wrong. When it comes to the infrastructure provider, I think the key is instrumentation. We got better at it, and everything that we send to our cloud provider gets instrumented on our side. So we're able to actually have more metrics and more information than the cloud providers sometimes have to troubleshoot their problems, so at some points we're even the people that say, “okay, you have a problem over here”. I think it is easier to be trusting whenever there's shared information, when there's something that you're willing to discuss.

I went all over the logs to say that, bla bla bla, this doesn't work, so what you're telling me is not correct. But when we have this data, whenever you have something that is irrefutable, like we made a request at this point, and another request at this point, and when we told you to nuke this instance you didn't do it, and we told you again and you didn't do it, whenever you have these things in hand, the conversation becomes different.
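Editor's note: client-side instrumentation of provider requests, as described above, might look something like the following. This is a minimal Python sketch under stated assumptions, not EngineYard's tooling; `InstrumentedClient`, `RequestRecord`, the `send` callable and the example instance id are all hypothetical, and `send` stands in for whatever provider API call is actually made.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RequestRecord:
    action: str
    resource_id: str
    sent_at: float
    succeeded: bool
    detail: str


class InstrumentedClient:
    """Wraps a provider call and keeps our own record of every request sent."""

    def __init__(self, send: Callable[[str, str], bool], clock=time.time):
        self._send = send    # hypothetical provider call: (action, resource_id) -> bool
        self._clock = clock  # injectable so the timestamps can be tested
        self.records: List[RequestRecord] = []

    def request(self, action: str, resource_id: str) -> bool:
        sent_at = self._clock()
        try:
            ok = bool(self._send(action, resource_id))
            detail = "ok" if ok else "provider reported failure"
        except Exception as exc:  # record the error instead of losing it
            ok = False
            detail = f"error: {exc}"
        self.records.append(RequestRecord(action, resource_id, sent_at, ok, detail))
        return ok

    def failures_for(self, resource_id: str) -> List[RequestRecord]:
        """All recorded requests for a resource that did not succeed."""
        return [r for r in self.records
                if r.resource_id == resource_id and not r.succeeded]
```

With records like these, `failures_for("i-123")` would list each terminate request that was sent for that (made-up) instance and not honored, with timestamps, which is the sort of evidence that changes the conversation with a provider.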

So, I don't know if this is going to be idealistic, but I think this openness and this willingness to discuss things and then address problems with actual information is key to building this type of trust. You can trust your infrastructure provider whenever you see that everything you sent them is up and running. You have your information, they have their information, everybody can discuss everything in the open, and I like that. I would say that in cultures it is the same: if you start sharing things, or if you start making the boundaries or the discussions of your problems explicit, then you can go and read about which department in your company is having problem X. There's maybe a problem with overload as well, and I don't really know where the line is. But I think that more openness is better; I would like to know how the leaders in my organization are making decisions, and knowing things like that just makes me trust a place more. So I guess this is my very idealistic answer: you talk about things, you share them and you make them known.

Chris: Well Ines, thanks for coming along and sharing and I'll encourage people once again to watch the video of your presentation when that becomes available and thanks for being here at QCon and stopping by.

Thanks for having me, thanks.

Sep 08, 2014