Anurag Goel on Cloud Native Platforms, Developer Experience, and Scaling Kubernetes

Feb 07, 2020

Podcast with

Anurag Goel

Daniel Bryant

In this podcast, Daniel Bryant sat down with Anurag Goel, Founder and CEO of Render. Topics covered included: the evolution of cloud platforms; simplifying developer experience; running large-scale workloads on Kubernetes; and the future of tooling and platforms within the cloud native computing space.

Key Takeaways

Render is aiming to be the next generation of cloud provider. Developers deploy and manage applications via a Platform-as-a-Service (PaaS)-like experience using custom, simplified YAML configuration.
Render is built on top of Kubernetes, but the internals and configuration of this orchestration framework are not exposed to end-user developers.
Many large scale usages of traditional cloud vendor platforms require the formation of specialised in-house “DevOps” teams. The provision of virtualisation and API-driven operation via the cloud providers was revolutionary, but it didn’t fundamentally change the existing platform paradigm.
Arguably, platform usability may have taken a step back with the arrival of public cloud vendor platforms. For example, developers may just want to write code, and not have to write complicated deployment descriptors. Operations teams may want to focus on supporting engineers and advising on performance and scale, rather than maintaining cloud provisioning scripts.
The Render team are planning to run all future workloads on self-managed Kubernetes, rather than use a hosted offering, because they experienced implementation bugs when running their clusters at medium-to-large scale.
The Cloud Native Computing Foundation (CNCF) is encouraging large amounts of innovation within the cloud platform space. However, due to the Cambrian explosion of the Cloud Native Landscape over the past several years, there must surely be consolidation of tools, platforms, and vendors in the near future.

Show Notes

Can you introduce yourself?

01:20 My name is Anurag Goel, and I'm the founder and CEO of a start-up called Render.
01:35 We are a new cloud provider, and make it really easy for developers to deploy, install and run their applications on-line.
01:40 That includes everything from a static site to a complex set of micro-services, and we have a lot of features that a lot of cloud providers do not yet have.
01:55 Our goal is to make it easier and easier to deploy applications and run them and scale them on-line.

What is the most typical use case you see on Render?

02:10 It is a regular website or a set of websites that consist of a single web property.
02:20 A single website might just be a front-end; there is typically a back-end API that it's talking to.
02:30 We also have background workers and cron jobs which are running on Render.
02:35 You can deploy applications such as WordPress, Discus or Mattermost on Render with a single click.

How does this differ from classic IaaS or PaaS, with the spectrum of deployment approach spanning from "everything defined as code" to the Heroku-inspired "git push master"?

03:00 We're definitely on the git-push side of the spectrum from the ease-of-use.
03:10 Having said that, we're much closer to the functionality of an IaaS on the feature spectrum.
03:25 On IaaS, you cannot easily run a private service that is only accessible to other services on your account.
03:30 For example, you cannot easily run an ElasticSearch cluster that is not visible to anyone outside your cluster of services.
03:40 Heroku, for example, does not even allow you to run a private service.
03:45 For AWS, you have to set up a VPC, security rules, IAM policies to make that possible.
03:55 On Render, all you do is tell us that it's a private service; we spin it up and give you an internal URL that is only accessible to other services in your account.
04:10 We take care of service discovery so that you don't have to do anything other than tell us it's a private service and do a git push.

What do you mean when you say that "legacy cloud" is focused on configuration?

04:30 It's not just about the configuration; it's also the amount of work you have to do in terms of engineering cost.
04:40 Any company that's running today on one of the big cloud providers (AWS, Azure, Google) is almost inevitably building out an extremely large DevOps team.
04:50 Most of the people on the team are simply keeping the service up; they're not really doing core technical interesting work.
05:00 They're maintaining AWS scripts, and dealing with CloudFormation scripts that are low-level.
05:10 What I mean by legacy cloud: when the tech industry started moving on-premises servers to the cloud and VMs, we didn't change the paradigms that we were operating on.
05:20 We just took the servers in the data centre, and Amazon made them VMs in the cloud.
05:30 Virtualisation was the biggest advantage - and you can use virtualisation in the data centre, too.
05:40 We are not really talking about a major shift in how applications are deployed and run in the cloud, and what that means for the end user.
05:50 If you have a team of DevOps people running applications in the data centre, they just switched to running them on VMs in the cloud.
05:55 We see things like Kubernetes, where people need to know how to manage a Kubernetes cluster, how to install all kinds of Helm charts on it, and so on.
06:00 The CNCF landscape has a thousand things on its diagram, and it keeps growing - it's too complex; we've taken a step back on cloud usability.
06:15 I think we're seeing the effect of that, where people don't want to use Kubernetes, but want something really simple.
06:25 Even within organisations, we're seeing whole teams dedicated to making Kubernetes easier to use for developers in those companies.
06:30 Our goal is to prevent everyone from having to repeat that work in every organisation.
06:40 What we're doing is offering the functionality you would get from Kubernetes without you having to configure it yourself or maintain it.
06:55 If you had 20 engineers on your DevOps team with AWS and you moved to Render, you would probably only need five or fewer.

That's quite a claim in the reduction of team size! Have you got any public use cases you can share?

07:30 We had a big presidential campaign running on Render that was previously running on GKE.
07:40 They were tired of the complexity and maintenance costs of running on GKE.
07:45 They didn't want to deal with it - they just wanted to run the website, keep the website up, allow their users to read the policies and donate.
07:55 We have since had them survive and thrive on Render during multiple debates.
08:10 At this point, their DevOps team doesn't exist; they were able to reduce all the engineering work that they would have to do on GKE to nothing.
08:25 Right now their engineers and DevOps people are able to focus on things like what the application needs to perform better and optimally scale.
08:35 That's what I think DevOps people want to do, but are not able to, because they end up doing thankless undifferentiated heavy lifting to keep things up.
08:45 They don't get rewarded when things are working; DevOps and IT organisations are only spoken to when things are down - it's a thankless task.
08:55 We're also trying to take that out of the picture, and make it easier and more interesting for DevOps people.

I expect that Render is aiming towards automated DevOps and reducing toil?

09:20 Yes, it's about automating as much as possible and giving you the tools to run really complicated services on top that you wouldn't otherwise be able to easily.
09:30 We've had customers tell us that if Render didn't have easy private services, they wouldn't even have their app, because AWS would be too hard.

Why is now the right time to introduce another PaaS into the market?

09:50 We've matured as an industry, and the deployment and layers of abstraction have changed.
10:00 You are thinking about servers versus VMs versus containers now.
10:10 The biggest change that allows Render to exist now in a way that is much more user-friendly is the rise of containerisation and Kubernetes.
10:25 We're able to utilise those ideas and the work that a lot of amazing people have done, at these companies like Google, Amazon and Microsoft.
10:35 We're able to use those technologies to give our customers a good experience and not just build technology for the sake of using it.

Do you run Render on a hosted Kubernetes service, or do you host it yourself?

10:50 Part of it is running on managed Kubernetes, but we realised that managed Kubernetes does not do what we need in order to run Render on it.
11:05 We've experienced bugs when upgrading Kubernetes, and it started failing or wouldn't start new pods.
11:10 This happened during our TechCrunch presentation, and all these people were signing up for Render trying to spin up new services, and things just broke.
11:30 We spoke to our provider, and first were told that we had too many etcd objects in our store - and we're clearly not running the biggest Kubernetes cluster there is.
11:40 Then they spent a week and admitted that it was a bug, not in the new version of Kubernetes that was rolled out, but in their implementation of it.
11:50 We have decided that we need much more control over Kubernetes masters, and over our own clusters, so all of our clusters from this point forwards are self-managed Kubernetes.
12:00 We might not always be using Kubernetes in the future; for us, it's not something we want to expose to our users.
12:10 If we can think of a better way to provide these services to our users then we might use something else.

How are you using Kubernetes?

12:25 Right now, it's mostly our own tools, so we're not using tools like "kops".
12:30 We're not doing a lot of rollout work on our clusters; once the cluster is there, we use the Kubernetes API.
12:35 We end up using YAML and Terraform now and then.
12:45 Going forward, we're going to use kubeadm more and more, if and when we're setting up new clusters.
12:55 None of this matters to our customers; it matters much more to them that they have the functionality, security and scalability for what they need.
13:05 Kubernetes is a means to achieve those objectives.

Can I lift and shift my existing Kubernetes applications directly on to Render?

13:15 You can run all the apps that you are running on Kubernetes on Render, but if you want to deploy a Kubernetes YAML with an API version, that's not what we're doing.
13:30 We aren't building a managed Kubernetes hosting service.
13:35 Kubernetes is just a tool that helps you achieve business objectives.
13:40 Our goal is to have infrastructure as code that is much higher level.
13:45 If you go to our website and search for YAML, we have our own YAML but you don't specify ingress, load balancers, cert managers - you don't have to tell us any of that.
14:05 All you have to do is tell us that you want this service from this git repo, and these are the environment variables you need.
14:15 It's at the same level of abstraction as a Docker Compose file, so that people can set it up really quickly.
14:20 People can set it up really easily, but can do really complex things.
14:25 A good example is our TechCrunch demo: we were the winners of TechCrunch Disrupt for Startups in SF in 2019.
14:45 The demo that we had was a single-click deploy of Mattermost, which includes four or five different backend services: ElasticSearch, Redis, and two frontend/backend servers.
14:55 We were able to deploy an HA mode in a single click just using the Render YAML, which was less than a hundred lines.
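
To make the level of abstraction described above more concrete, here is a minimal Python sketch of a service definition that carries only a name, a Git repository, environment variables and a public/private flag. The class, field names and repository URLs are hypothetical illustrations and are not Render's actual YAML schema.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ServiceSpec:
    """Hypothetical high-level service definition (not Render's real schema)."""
    name: str
    repo: str                                   # Git repository to build and deploy from
    env: Dict[str, str] = field(default_factory=dict)
    private: bool = False                       # True: reachable only by sibling services


def describe(spec: ServiceSpec) -> str:
    """Summarise a spec; ingress, TLS and load balancing are left to the platform."""
    visibility = "private" if spec.private else "public"
    return f"{spec.name} ({visibility}) from {spec.repo} with {len(spec.env)} env var(s)"


# Illustrative stack, loosely modelled on the multi-service demo described above.
stack = [
    ServiceSpec("web", "https://example.com/acme/web.git", {"SEARCH_HOST": "search"}),
    ServiceSpec("search", "https://example.com/acme/search.git", private=True),
    ServiceSpec("cache", "https://example.com/acme/cache.git", private=True),
]

for spec in stack:
    print(describe(spec))
```

Note what is absent from the sketch: there are no ingress rules, load balancers or certificate settings, which is the point being made in the interview.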

Helm tries to lift the deployment abstraction above Kubernetes, but there are many other aspects of creating an application, such as building the code. Does Render have the concept of buildpacks?

15:30 We don't have buildpacks, because we think they're not the right abstraction level.
15:35 Buildpacks, to me, are fairly bloated, because they try to be everything to everyone.
15:40 When you look at an actual buildpack, you'll see all sorts of conditionals in the code, all the variables which you have to configure.
15:45 When something goes wrong in a buildpack, you don't know how to fix it, and you have to open a pull request for them to update it, or fork it and fix it yourself.
15:55 So we asked ourselves: what is it that's needed when you build an app?
16:05 All we ask for in your YAML or the dashboard is a build command, which runs in a context of an environment with all the primitives that you need to build your service.
16:15 You know best how to build your app - so we don't assume you're going to build your app in a certain way; it could be a script or a single line.
16:25 We clone your repository, make sure your dependencies are available, and run that command to build your code.
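
The "single build command" idea can be sketched in a few lines of Python. This is only an assumption about how such a step could look, not Render's implementation: the `run_build` helper is hypothetical, and a temporary directory stands in for the cloned repository.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_build(workdir: Path, build_command: str) -> bool:
    """Run a user-supplied build command inside the checked-out source tree."""
    result = subprocess.run(
        build_command,
        shell=True,            # the command is an opaque string chosen by the user
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        print(result.stderr, file=sys.stderr)
    return result.returncode == 0


# Stand-in for a cloned repository; a real pipeline would clone the repo and
# make dependencies available before running the command.
with tempfile.TemporaryDirectory() as tmp:
    checkout = Path(tmp)
    ok = run_build(checkout, "echo built > built.txt")
    artifact_exists = (checkout / "built.txt").exists()

print("build succeeded:", ok, "artifact exists:", artifact_exists)
```

The platform makes no assumption about the build tool here; the command could equally be a script in the repository or a one-liner.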

How do I connect services in the Render YAML?

16:50 Render YAML also lets you connect services, just like you can in Docker Compose.
17:00 You can say that you want your service to get an environment variable from your ElasticSearch config, so you don't even have to worry about linking them yourself.
17:10 We automatically populate the right environment variables wherever you need them.
17:20 If you wanted to generate a secret, for example, you can say that you want a secret to be generated - that way, it's not in your GitHub repository.
17:30 We automatically populate that as an environment variable that you can use elsewhere.
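
The linking and secret-generation behaviour described above could be sketched as follows. The declaration keys (`value`, `from_service`, `generate`) and the `resolve_env` helper are hypothetical names chosen for illustration, not Render's configuration format.

```python
import secrets

# Hypothetical declarations: each value is a literal, a reference to another
# service's setting, or a request to generate a secret at deploy time.
services = {
    "search": {"env": {"SEARCH_PORT": {"value": "9200"}}},
    "web": {
        "env": {
            "SEARCH_PORT": {"from_service": "search", "key": "SEARCH_PORT"},
            "SESSION_SECRET": {"generate": True},
        }
    },
}


def resolve_env(service_name: str, declared: dict) -> dict:
    """Turn declarations into concrete environment variables for one service."""
    resolved = {}
    for key, decl in declared[service_name]["env"].items():
        if "value" in decl:
            resolved[key] = decl["value"]
        elif "from_service" in decl:
            source = declared[decl["from_service"]]["env"][decl["key"]]
            resolved[key] = source["value"]          # sketch: one level of indirection only
        elif decl.get("generate"):
            resolved[key] = secrets.token_hex(32)    # created at deploy time, never committed
    return resolved


web_env = resolve_env("web", services)
print(sorted(web_env))
```

Because the secret is generated when the environment is resolved, it never has to appear in the source repository.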

How does service discovery work in Render?

17:35 Behind the scenes, service discovery is based on DNS, and we're using the same primitives as Kubernetes.
17:50 There's a DNS service running in our clusters, and it is responsible for responding to requests for particular URLs.
18:00 Let's say you are running a private Redis server in Render; the URL you are going to get is something like redis:6379.
18:10 Your other applications can use that URL to talk to the service directly, and we make sure the network configuration is set up so that only your services can connect to it.
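
From the application's point of view, DNS-based discovery means an internal address such as `redis:6379` is used like any other host and port. The Python sketch below assumes only that the name resolves inside the private network; the helper names are illustrative.

```python
import socket
from typing import Tuple


def parse_address(address: str) -> Tuple[str, int]:
    """Split an internal 'host:port' address such as 'redis:6379'."""
    host, _, port = address.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {address!r}")
    return host, int(port)


def connect(address: str, timeout: float = 2.0) -> socket.socket:
    """Resolve the host name through DNS and open a TCP connection to it."""
    host, port = parse_address(address)
    return socket.create_connection((host, port), timeout=timeout)


print(parse_address("redis:6379"))
```

Calling `connect("redis:6379")` would only succeed from a service that can resolve and reach that name, which is the access restriction described above.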

Is there any concept of retries or circuit breaking at the network level?

18:40 Not right now; it is something that we've been keeping an eye on over the past few months.
18:50 The concept of service meshes is still evolving, and it makes much more sense for cases where you have a lot of different kinds of services that need to talk in a uniform way.
19:00 The thing with Render is that most of the services running on Render are new services that people are building.
19:10 It's harder to migrate an existing service from AWS if you have all of the CRD groups and IAM set up.
19:20 Render is much better suited for building new services, with the Render service discovery in mind.
19:40 When it comes to circuit breaking and retries, it's on our roadmap, but it's not something that our customers have asked for.
19:45 They have asked us to build object storage, and a managed Redis service - even though you can use existing object storage like S3 or Redis on any cloud provider.
20:00 We are seeing that customers want to keep all of their systems in one place, and once they are using Render for one service, they want to bring all their services into Render.
20:10 They don't want to log in to the AWS console to manage S3 separately.

Can you do canary releases in Render, and if not, how do you do releases?

20:30 Our users right now get zero-downtime deploys out of the box.
20:40 They can actually define a health-check on their application, and when they deploy a new version, Render calls it to check that the new version is healthy.
20:50 Even if they don't define a health-check, we know which port the application is listening on, so we run a quick TCP check to confirm that it is working.
21:10 If our system determines that the health-check is failing, then we mark that deploy failed and nothing changes as far as your users are concerned.
21:20 If the health-check succeeds, then we start moving traffic over to the new app and make sure we spin down the previous instances gracefully.
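
The deploy gate described above can be sketched in Python: use the user-defined HTTP health-check when there is one, fall back to a TCP check on the listening port, and only shift traffic when the check passes. The function names and return values are hypothetical; this is a sketch of the idea, not Render's deploy logic.

```python
import http.client
import socket
import urllib.request
from typing import Optional


def tcp_check(host: str, port: int, timeout: float = 1.0) -> bool:
    """Fallback check: is anything accepting connections on the port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_check(url: str, timeout: float = 1.0) -> bool:
    """User-defined check: does the health endpoint answer with a 2xx status?"""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        return False


def decide_rollout(host: str, port: int, health_path: Optional[str] = None) -> str:
    """Return which version should receive traffic after a deploy attempt."""
    if health_path is not None:
        healthy = http_check(f"http://{host}:{port}{health_path}")
    else:
        healthy = tcp_check(host, port)
    # A failed check leaves the previous version serving traffic untouched.
    return "new" if healthy else "previous"
```

A fuller version would shift traffic gradually and drain the old instances before stopping them, as the answer above describes.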

How tricky is it for your team to offer a data store or a managed Redis in Kubernetes behind the scenes?

21:50 Not as difficult as people would have you believe.
21:55 We've had our managed Postgres service out in the open for nearly a year.
22:00 We only launched in April, but we had a private beta a while before that.
22:10 I think it might have been harder two years ago, when Kubernetes didn't have the primitives to do that.
22:20 It's also changed because people want to run things like Postgres in Kubernetes - so the next release would make it easier to do that.
22:35 It's not just Postgres; it's also MySQL, and people are creating Kubernetes-focussed deployment docs and tools that allow their users to manage their databases in Kubernetes.
22:45 CockroachDB has first-class support for Kubernetes, and they help you install it.
23:00 When you hear all the bad things about running stateful workloads in Kubernetes, this is mostly just FUD.

What do you think about observability and understandability?

23:15 Things like Prometheus make it quite easy to do, and we're able to display usage of CPU and memory metrics to our users through the dashboard - it comes out of the box.
23:30 All of that is doable, because we're able to use primitives like Prometheus in the back-end.
23:35 I think this is where the timing aspect of Render makes much more sense - none of this was available five years ago.
23:40 The first time I started thinking of Render years ago, I waited for a year to start the company, because I didn't think it was possible for a new startup to build a new cloud provider given AWS.
24:05 Our users and our numbers suggest otherwise.

As Render is building on top of CNCF components, what do you think of the CNCF landscape?

24:25 I think there's going to be consolidation in that space.
24:30 I don't think that all the different startups and large companies that are attacking specific niches in the Kubernetes ecosystem are going to survive for the next ten years.
24:40 There's going to be consolidation in this space, even in the next few years you'll see a lot of these startups being acquired.
24:55 Maybe one or two of them are going to break out and become the leader in the specific vertical that they're targeting.
25:00 It's not sustainable for the ecosystem to have all of this, because ultimately people who are making decisions about which vendor to pick don't want to pay 500 different companies.
25:15 We will see consolidation, and that is where Render comes in, so you have these features out of the box without worrying about which vendor to get it from.

How can technical leaders evaluate the choices between the cloud platforms and calculate the TCO?

25:40 People often forget the cost of engineering and the time that goes into maintaining infrastructure on AWS, GCP and Azure.
25:50 It's easy to look at the cost of a VM, and see that it costs less than what it costs on Render.
26:00 For most companies, engineering costs are way higher than their infrastructure costs.
26:10 So I want to reinforce the advice of making sure that you have considered the amount of time it takes to spin up new developers and DevOps on Kubernetes.
26:25 That is what we spend a lot of time on, and I want people to consider not just the infrastructure cost but also the engineering cost of picking a cloud provider.
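
The comparison argued for above comes down to simple arithmetic: total monthly cost is infrastructure spend plus the cost of the engineering time needed to run it. The Python sketch below uses made-up placeholder figures that do not come from the podcast; substitute your own estimates.

```python
from dataclasses import dataclass


@dataclass
class PlatformOption:
    name: str
    monthly_infrastructure: float      # compute, storage, bandwidth
    engineers: float                   # full-time staff maintaining the platform
    monthly_cost_per_engineer: float

    def monthly_tco(self) -> float:
        """Infrastructure spend plus the engineering time needed to run it."""
        return self.monthly_infrastructure + self.engineers * self.monthly_cost_per_engineer


# Placeholder figures for illustration only; they are not real prices or salaries.
options = [
    PlatformOption("self-managed", monthly_infrastructure=10_000,
                   engineers=4, monthly_cost_per_engineer=12_000),
    PlatformOption("managed platform", monthly_infrastructure=15_000,
                   engineers=1, monthly_cost_per_engineer=12_000),
]

for option in options:
    print(f"{option.name}: {option.monthly_tco():,.0f} per month")
```

With these placeholder inputs the option with the cheaper infrastructure has the higher total, because staffing dominates the sum; whether that holds in practice depends entirely on the real numbers.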
