The Challenges of DevOps and the Importance of Developer Experience with Jyoti Bansal

In this podcast Shane Hastie, Lead Editor for Culture & Methods, spoke to Jyoti Bansal about the challenges of DevOps today and the importance of developer experience for effective software development.

Key Takeaways

  • There are too many moving parts and separate pieces in the typical DevOps pipeline today
  • We’ve made relatively good progress on the culture side of DevOps, but the toolsets are not keeping up
  • Developer experience is the number one thing that every organization and every engineering team should focus on today
  • 20% to 30% of developer time is wasted on unnecessary activities, most of which can be automated and achieved using AI tooling
  • When working on the developer experience you need to constantly balance multiple factors including velocity, quality, resiliency, security, and cost


Shane Hastie: Good day, folks. This is Shane Hastie for the InfoQ engineering culture podcast. Today, I have the privilege of sitting down with Jyoti Bansal. Jyoti, welcome. Thanks for taking the time to talk to us today.

Jyoti Bansal: Good day, Shane. Great to be here.

Shane Hastie: So you are the founder and CEO of Harness, who are a CD company. Tell us a little bit about your background. What brought you to where you are today and a little bit about Harness, please.

Introductions [01:03]

Jyoti Bansal: My background, I'm a software engineer by training, by profession. I build software systems, complex software systems. At some point, I was really frustrated that you can't monitor and troubleshoot what's happening in software. You build these distributed systems and things go wrong and you can't figure out what's happening. Back in 2008, I started AppDynamics, which eventually became one of the leading players in the application performance monitoring, APM, space. And AppDynamics, we sold to Cisco, kind of famously, a day before our IPO. We were just about to IPO and a day before, they came in and bought AppDynamics. And once AppDynamics was acquired by Cisco, I got really frustrated by another problem, which is that everyone is struggling with CI/CD, DevOps. We are all trying to ship software fast. We were trying to do that at AppDynamics ourselves, and we were really not doing a good job at it. And I would talk to most of AppDynamics' customers, and I'd hear the same story, that people were struggling with shipping software.

And that became the genesis of Harness. I was looking for the next major problem for software engineering organizations that we need to tackle. And that became Harness: how do we go and simplify software delivery processes and software delivery systems? So that's what Harness does. Harness is trying to build an end-to-end software delivery platform for what happens after your developers write code: from building and testing the code to deploying the code, deploying the infrastructure, verifying the code works and, if something doesn't work, rolling it back. All aspects of software delivery.

What we are trying to do is, instead of you trying to cobble things together by scripting and wiring a lot of different bespoke systems together, can you have a consistent platform to build a software delivery pipeline? And at Harness, we started with CD as the first part of that puzzle. There are so many aspects of how you build and ship software. And you hear CI/CD, and it always kind of bothered me that people would use CI/CD as one single word when they're two separate things. In CI, you are building the code and making sure it's integrated with your primary code base. And in CD, you are taking whatever you changed all the way to your consumers, and you're doing that actively multiple times a day.

And so the first thing we did was to decouple CI and CD and focus on the CD problem and really bring to market something that solves the CD problem of shipping code to production multiple times a day in a reliable, repeatable manner. So that's where Harness started, and Harness is a very strong CD product in the market. That's our core strength, but we have also been expanding beyond CD to the next set of challenges: CI and feature flags. And we also have a fourth module around cloud cost management and bringing cloud cost management into the DevOps CI/CD process.

Shane Hastie: What are the biggest challenges with DevOps today?

The biggest challenges with DevOps today [03:54]

Jyoti Bansal: Well, the biggest problem with DevOps today is that the velocity expected of software engineering organizations is extremely high. And the DevOps model, where there is a team of DevOps engineers trying to bring together your delivery systems, CI/CD processes, and pipelines, becomes a big bottleneck everywhere because people are trying to home-grow DevOps systems. DevOps was always about, you have culture and you have tools. I think on the culture front, we have made a lot of progress: people have embraced and adopted the idea that you need to put effort towards shipping software fast. You want to be nimble, you want to be agile, you have to have the right kind of architecture.

On the tools front, we have not made good progress. So because of that, for most people, the only path to success is to build things yourself. I look at companies: say you hired a VP of sales in a business, and the VP of sales says, I want to bring the best tools for my salespeople. They can go and buy the best tools, and in a month they have the tools and can start differentiating on how to do the selling and the sales process, et cetera. The tools are there; you can buy them in a month and get going. That's not the case for DevOps. Say you want your software engineers to have the best tools available to do their job and the best experience, so let's get them there. We are culturally ready, we are architecturally ready with microservices and cloud native, so let's bring the best tools. It's just not possible.

You have to build your own stuff and it takes two years. And then things change, you move from one architecture to another, and the things you built break. And that to me is a fundamental problem. To get to the next level, we need to reach the point where, the way a VP of sales can buy the best tools for their sellers and be at the same place as anyone else when it comes to tools, or a VP of HR can buy the tools for HR and get going fast, engineering teams can do the same. We need to get to that place for the industry to really move to the next level of DevOps, I feel.

Shane Hastie: For the engineers sitting in that space where they are pulling things together themselves and so forth, what should they be focusing on when it comes to, maybe, the developer experience of creating that space?

The importance of developer experience [06:02]

Jyoti Bansal: I think developer experience is the number one thing that every organization and every engineering team should focus on. And developer experience starts with, how do people write code? How do they commit code? Then how long they have to wait for their CI and build to run. That's a big part of developer experience. Many of the times you would see engineers frustrated because builds are taking too long or builds are broken or builds don't work. Then once you start getting into like, once something is built, you want to deploy and the deployments are failing or deployments need to be troubleshooted, there is a big impact on developer experience. I think everything when people think of DevOps should be thought in terms of developer experience, is what I fundamentally believe in.

So a lot of the times when you look at the tools and the tool chain that you need to build out, that has to be designed with that in mind. How do you optimize for developer experience, that they can focus on innovating on the code they're innovating. And the tool chain after that is flexible, it's simple, it can work for the process that you want to do to ship that code. But at the same time, you also want to remove a lot of toil that the developers have to go through, maybe they have to generate a compliance report or someone is asking for an audit trail. We work with companies where people say like, we have to provide an audit trail of all the changes and deployments we did once a month. And that's a four day process for our engineering team to go and compile that.

So that kind of unnecessary toil creates a bad developer experience, as in developers are wasting their time. The data point that we see across the customer base at Harness is that about 20% to 30% of developer time is wasted on unnecessary things. Think of builds, deployments, troubleshooting, compliance reporting, all of that kind of toil. And if you think of 20%, 30% waste, that's a massive waste for everyone. And no developer likes to do it. How do we optimize that is how I think we should be thinking of it.

Shane Hastie: I'm going to challenge that and say from the organization's perspective, that compliance report is crucial, because without it, we could be shut down.

Find ways to automate and remove unnecessary activities [08:01]

Jyoti Bansal: Yeah. So that compliance report is crucial, but do you need to spend four days generating the compliance report, or could that be done in four minutes in some kind of automated way? It's really how you improve the developer experience. Right now there is so much pain and toil involved in these kinds of things. The compliance report is essential; you even want to put in more security gates and more quality gates and more software supply chain protection and all sorts of things. And the higher the velocity is, the more protection you'll need. But the problem is, if you put all of that burden on the engineers to do through manual toil, or by writing a lot of scripts, the engineering experience becomes very, very complex.

Incorporate AI and ML into the DevOps tooling [08:41]

Jyoti Bansal: There is so much frustration in engineers now. It's like, we spend so much of our time not just writing code and innovating, but on all sorts of other things, which are important. And that is how DevOps will work, you can't just offload it to someone else. But we need to give them tools to automate most of it, and tools that not just automate the process but are also more intelligent. So that's why, when I look at the next generation of DevOps tooling, it's all about intelligence. If you look at just CI, for example, CI has been about the same for 15 years. Since the first time Jenkins was written, it's been more or less the same, which is, you compile your code and you run a whole bunch of tests. And over time people have created a lot of testing and test frameworks and test suites.

Imagine you have a code base of a million lines, which is not that big of a code base. Someone makes a 50-line code change, and you have a test suite of 4,000 tests, which would be very common. Most of the time in your CI, you say, let's test everything: you run the suite of 4,000 tests and see whether anything breaks for those 50 lines of code changes. That to me is not an intelligent way of doing it, because it's causing the developer to wait for those 4,000 tests to run to check everything. Can you use some ML and intelligence to figure out which of the 4,000 tests are the important ones for the 50-line code change that just happened?

That kind of intelligence allows for higher developer productivity. And so, at Harness those are some of the algorithms that we're bringing to the market. We have one thing which we just brought to market, called test intelligence, which is about that. Can we learn from the dependency models between your code and your tests? Which part of the code should invoke which tests, which are important, and can we run them in the right order? Maybe out of the 4,000 tests, you only need to run 200 for the code changes you just made, and that can bring down the time to test by 60%, 70%, 80%.

So now the developer has to wait for a shorter period. So I would say, you have to achieve that. The answer is not to test less. You don't want developers to wait, so let's cut down the testing: that's not the answer. Can we be smarter about it? Can we be more intelligent about how we do things? Same thing applies to compliance. Same thing applies to security. We have to get smarter, because otherwise developers are wasting too much of their time doing these things.
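
To illustrate the test-selection idea described above, here is a minimal Python sketch. It is not Harness's test intelligence algorithm, which the conversation does not detail. The dependency map, file names, and the `select_tests` function are all hypothetical, and a real system would learn the map from coverage or call-graph data rather than hard-coding it.

```python
from typing import Dict, List, Set

# Hypothetical map from each test to the source files it exercises.
# Assumption: in practice this would be learned, not written by hand.
TEST_DEPENDENCIES: Dict[str, Set[str]] = {
    "test_checkout": {"cart.py", "payment.py"},
    "test_login": {"auth.py"},
    "test_search": {"search.py", "index.py"},
    "test_refund": {"payment.py"},
}


def select_tests(changed_files: Set[str],
                 dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return the tests worth running for a change, sorted by name."""
    known_files = set().union(*dependencies.values())
    # A changed file the map has never seen cannot be reasoned about,
    # so fall back to the full suite instead of silently skipping tests.
    if not changed_files <= known_files:
        return sorted(dependencies)
    # Otherwise keep only tests that exercise at least one changed file.
    return sorted(
        test for test, files in dependencies.items()
        if files & changed_files
    )


if __name__ == "__main__":
    print(select_tests({"payment.py"}, TEST_DEPENDENCIES))
```

The fallback to the full suite reflects the point made in the conversation: the goal is to be smarter about what runs, not to test less.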

Shane Hastie: What does that smartness look like in compliance and security?

Incorporating smartness in compliance and security [10:58]

Jyoti Bansal: What the smartness looks like in compliance and security is around governance policies. You have things like OPA, Open Policy Agent, these days, or frameworks where you can enforce policies. What policies do is allow you to create the guardrails. So you want developers to move fast, do whatever they need to do, but there will be guardrails that they cannot break. By definition, you're compliant because the guardrails define the compliance. So instead of trying to figure out whether you were compliant after the fact, like a week later, or bringing in a lot of change management processes, you can automate the compliance guardrails.

Let's say the rule is that you must run security scans on the code change that you just made, and as part of your security scanning, no new vulnerability of a certain kind may be discovered. Unless you meet that, the pipeline will stop and you cannot deploy to production. Can you automate that process? And if you automate that in an intelligent, smart way, you can remove a lot of the burden later on for people trying to manually figure it out. Did we create more vulnerabilities? Is someone generating a compliance report to give to the compliance team on what happened after these deployments? All of this stuff could be automated through a policy framework that you make part of your CI/CD process and the CI/CD pipeline, so the pipeline is compliant just by having that framework in place.
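
The guardrail described above can be sketched as a small pipeline gate. This is an illustrative Python sketch only: it does not use Open Policy Agent or any real scanner's API, and the `Finding`, `ChangeSet`, and `evaluate_policy` names, the severity labels, and the blocked-severity set are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Finding:
    vuln_id: str
    severity: str  # assumed labels: "low", "medium", "high", "critical"


@dataclass
class ChangeSet:
    # None means no security scan was run for this change.
    scan_findings: Optional[List[Finding]]
    # Vulnerabilities already known before this change (hypothetical baseline).
    baseline_ids: Set[str] = field(default_factory=set)


# Hypothetical policy: new findings at these severities stop the pipeline.
BLOCKED_SEVERITIES = {"high", "critical"}


def evaluate_policy(change: ChangeSet) -> Tuple[bool, List[str]]:
    """Return (allowed, reasons); the reasons double as an audit record."""
    reasons: List[str] = []
    if change.scan_findings is None:
        reasons.append("security scan was not run")
    else:
        for finding in change.scan_findings:
            is_new = finding.vuln_id not in change.baseline_ids
            if is_new and finding.severity in BLOCKED_SEVERITIES:
                reasons.append(
                    f"new {finding.severity} vulnerability: {finding.vuln_id}"
                )
    return (not reasons, reasons)


if __name__ == "__main__":
    change = ChangeSet(scan_findings=[Finding("VULN-2", "critical")])
    allowed, reasons = evaluate_policy(change)
    print("deploy allowed:", allowed, reasons)
```

Because every decision comes back with its reasons, storing the return value for each deployment would give the kind of compliance record the conversation mentions, without anyone compiling it by hand.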

Shane Hastie: So if I am the DevOps engineer, thinking about the developer experience and wanting to improve the metrics, the outcomes for the engineers that I support. What are the things I should be focusing on? How do I start?

Balancing velocity, quality, resiliency, security, and cost [12:34]

Jyoti Bansal: It would be a simple answer if there was only one variable. Why is the job of DevOps hard, or of any software engineering team that's responsible for an app? Because you have five or six different dimensions. You have the dimension of velocity: you have to ship fast, you don't want to slow down anything. You have the quality dimension: everything has to be high quality. There's a security dimension, security and governance: you can't create a security gap, and you have to have the compliance and governance; all of that has to come in there. And then there is the dimension of resiliency: everything has to be resilient. You have to be able to roll back fast if something breaks.

And then finally there is cost, and cost is also a very important factor. People, like the DevOps teams, used to not worry about it. But now everything is running in a public cloud, and in the public cloud, you do a few things and your cloud bill can go up by millions of dollars quickly. So now you have to balance those five: your velocity, quality, resiliency, security, and cost. So you have to build things where your pipeline automates most of these things. You have to bring more and more inside the pipelines and just automate that. Take cost: what is your cost model? Let's say you're deploying new code or new microservices or some new changes. Does your cloud cost go up? Or before you even deploy, can something predict and tell you, if you make this change, this is what the impact on your cloud cost will be?

And then you stop the change or you go through an approval flow, and you just automate that as part of some kind of governance system. Instead, how it happens today is, you ship something, and a week later or a month later, someone figures out your cloud bill went up, and then you spend two weeks troubleshooting why your cloud bill went up. And then someone comes in and puts the hammer down on the DevOps teams: let's go and fix this thing. But you could go and fix it right there as part of the automated pipeline. Cost is like that; for security governance, I already talked about an example like that. Now you could be much more intelligent about many of these factors and convert them into metrics: what cost is allowed for which microservices and which team, and then you give them the framework to operate in. What are the quality metrics? What are the security policies, governance policies?
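
The cost guardrail described above (predict the impact, then stop the change or route it through approval) might look like the following Python sketch. The `cost_gate` function, the `Decision` values, and the 10% approval threshold are hypothetical; how the predicted cost is obtained is out of scope and simply assumed as an input.

```python
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    NEEDS_APPROVAL = "needs_approval"
    BLOCK = "block"


def cost_gate(current_monthly_cost: float,
              predicted_monthly_cost: float,
              monthly_budget: float,
              approval_threshold_pct: float = 10.0) -> Decision:
    """Decide what the pipeline does with a change, given its predicted cost.

    Assumptions: costs are in one currency, and the prediction comes from
    some external cost model that this sketch does not implement.
    """
    # Hard guardrail: never deploy past the team's budget.
    if predicted_monthly_cost > monthly_budget:
        return Decision.BLOCK
    # Relative increase; treat any spend from a zero baseline as 100%.
    if current_monthly_cost > 0:
        delta = predicted_monthly_cost - current_monthly_cost
        increase_pct = delta / current_monthly_cost * 100
    else:
        increase_pct = 100.0 if predicted_monthly_cost > 0 else 0.0
    # Soft guardrail: large jumps within budget go to a human approver.
    if increase_pct > approval_threshold_pct:
        return Decision.NEEDS_APPROVAL
    return Decision.APPROVE


if __name__ == "__main__":
    print(cost_gate(1000.0, 1300.0, 2000.0))
```

Separating a hard limit from an approval threshold mirrors the two outcomes named in the conversation: stopping the change outright, or sending it through an approval flow.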

Error budgets [14:42]

Jyoti Bansal: The more you can define things for developers in well-defined metrics and guardrails, the better the developers can operate within them. I'll give you the example of the error budget concept that Google uses, where there's an error budget that every team is given. The error budget says, you can have 10 minutes of, let's say, availability issues in a certain period, or this percentage of your user interactions can have an error. And so now the team has a clearly defined error budget, and as long as they operate within the error budget, they can go and make a lot of changes and do a lot of things. If you deplete your error budget, you have to wait. So I would say, the more you can define these kinds of clear concepts, the better.

So now I would think of a DevOps team: it's great if you give them the error budget and say, if you are meeting your error budget, your pipelines can go and ship things and deploy things. But if you're not meeting your error budget, you have to slow down your pipeline and wait until you meet your SLAs and error budget requirements. Same thing applies with security and governance, same thing applies with cost. You just have to create a framework that's a little bit less ad hoc and more automated and more metrics and goals oriented.
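
The error budget rule described above can be made concrete with a short calculation. In this Python sketch the availability target, the 30-day window, and the function names are assumptions chosen for illustration; the arithmetic is simply the allowed failure fraction multiplied by the length of the window.

```python
def error_budget_minutes(slo_target: float, window_days: int) -> float:
    """Downtime allowed in the window for an availability target.

    slo_target is a fraction, for example 0.999 for "three nines".
    """
    minutes_in_window = window_days * 24 * 60
    return (1.0 - slo_target) * minutes_in_window


def can_deploy(slo_target: float, window_days: int,
               downtime_minutes: float) -> bool:
    """Allow deployments only while some error budget remains."""
    budget = error_budget_minutes(slo_target, window_days)
    return downtime_minutes < budget


if __name__ == "__main__":
    # A 99.9% target over 30 days allows roughly 43.2 minutes of downtime.
    print(round(error_budget_minutes(0.999, 30), 1))
    print(can_deploy(0.999, 30, downtime_minutes=10.0))
    print(can_deploy(0.999, 30, downtime_minutes=50.0))
```

A pipeline could call something like `can_deploy` before each release, which is the "slow down and wait" behavior described in the conversation.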

Shane Hastie: Where do the Accelerate metrics come into this?

Using the Accelerate metrics [15:49]

Jyoti Bansal: Accelerate metrics are a great part of this: you have the time to ship, the deployment frequency, the time to roll back or resolve if something goes wrong. All of those are extremely important and I think that's how we should be looking at it. Accelerate is a great framework, and the metrics are a great way of looking at how a software engineering organization is operating. I do think there are a few more metrics that are needed to complete the picture. Cost is a clear dimension, and security and governance is a clear dimension, that are not fully covered by those. And so it could be expanded into some management metrics.
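
As a rough illustration of the metrics mentioned above, the following Python sketch derives the four measures commonly associated with Accelerate (deployment frequency, lead time for changes, change failure rate, and time to restore) from a list of deployment records. The `Deployment` record shape, the field names, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # set once a failure is resolved


def hours_between(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600


def summarize(deployments: List[Deployment],
              window_days: int) -> Dict[str, float]:
    """Compute four delivery metrics over a reporting window."""
    count = len(deployments)
    failures = [d for d in deployments if d.failed]
    lead_times = [hours_between(d.committed_at, d.deployed_at)
                  for d in deployments]
    # Only failures that have been resolved contribute a restore time.
    restore_times = [hours_between(d.deployed_at, d.restored_at)
                     for d in failures if d.restored_at is not None]
    return {
        "deployments_per_day": count / window_days,
        "mean_lead_time_hours": sum(lead_times) / count if count else 0.0,
        "change_failure_rate": len(failures) / count if count else 0.0,
        "mean_time_to_restore_hours": (
            sum(restore_times) / len(restore_times) if restore_times else 0.0
        ),
    }


# Hypothetical sample data: two deployments over two days, one of which failed.
sample = [
    Deployment(datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 11)),
    Deployment(datetime(2024, 1, 2, 9), datetime(2024, 1, 2, 13),
               failed=True, restored_at=datetime(2024, 1, 2, 14)),
]
summary = summarize(sample, window_days=2)

if __name__ == "__main__":
    print(summary)
```

The additional dimensions raised in the conversation, cost and security governance, could be added as further keys in the same summary.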

Shane Hastie: Changing direction a little bit, talking about your own experience. You were saying to me earlier on that Harness has experienced phenomenal growth over the last year, you've gone from 100 to 500 people. Now this has been over the period of COVID with lockdowns and so forth.

Jyoti Bansal: Not just last year, the last couple of years.

Shane Hastie: Last couple of years. So how has that growth been managed and how has that worked for you and for the organization as a whole? This is the culture podcast, particularly around retaining the organization culture. 

Growing an engineering organization through COVID-19 [18:48]

Jyoti Bansal: If you grow that much, the majority of the organization is new and just came in. And during the time of COVID and remote work and the pandemic, culture is definitely a challenge. The primary thing is you have to be more deliberate about it. What culture do you want? That's the most important thing. And then how do you make that culture happen, and measure whether it is happening or not? I do think the important part when you go through that kind of growth, if you look at it just from an engineering culture perspective, is modularization and clear responsibility and accountability. That's where you can break things into smaller areas, components, microservices, with teams that are responsible for a certain service, with clear APIs. So APIs become your primary endpoints for how you operate and engage and how you define service levels.

What does that create? It creates independence. It creates accountability. It creates a way to measure things and for people to operate independently as well. The second part is a lot of, I would say, measurability. Measurability in the process is very key. People should know what the metrics are, starting from Accelerate kind of metrics, to your service level objectives, to your security governance, all the kinds of metrics that are needed, and people have to operate in a framework of that. So then people can communicate around that framework. That is important. I look at the culture beyond engineering as well. When you look at the broader company culture, what kind of culture do you want? I do think for very high growth companies, the best culture is everything wide open and transparent. So you treat everyone as sort of an equal partner in what you're doing and, good, bad, ugly, you share everything. The advantage of that is, it just creates much more alignment. There are no layers of information, hierarchy, or anything like that. Whatever is good is shared, whatever is bad is shared, whatever is ugly is shared.

At Harness, we make all the dashboards of everything that's happening in the company available to all employees. We share them, we show them, everything is open. It just makes it easier for new people to come in and get on the same page, because they all have access to the same information. And I'm generally a big believer that the more information is shared between teams, the more people know, especially in a high growth kind of environment, the more it really helps the culture.

Shane Hastie: How are you tackling, or are you tackling, the remote versus in-person, the "hybrid workspaces"?

Tackling hybrid work [19:10]

Jyoti Bansal: We are tackling it like anyone else. Before COVID, we were not primarily a remote-work, work-from-home kind of company. We had a very strong, vibrant office culture. We had offices in San Francisco, Mountain View, in Dallas, in India, in Bengaluru. So we were mostly office oriented, but with COVID obviously, we became remote work. But we really embraced remote work, actually. We won an award for one of the best remote work companies, number two in the tech startup space, a few months ago. And what we embraced is, it's great, we have access to talent everywhere, people have flexibility to work from anywhere. So we have embraced work from anywhere for our culture. So no one is required to come to the office. You can work from home, work from anywhere. But at the same time, we do provide offices as well for people who do want to come to the office to have that social interaction.

So our rule is, in any particular city or a geography close by, if we have about 10 to 15 people who want to come to an office sometimes, we'll get them an office. So we still have an office in Mountain View, office in San Francisco and a few more offices we are opening there. So people, they want to come to office once a week or twice a week or interact. Or some people do prefer to come to office, so they have that option to come to office. So we don't require anyone to come to office, but people have that option. And people come in for social interaction, people come for happy hours, people come for team meetings, people come for brainstorming, whiteboarding, as they wish.

So it becomes a hybrid thing. And I do think that's how it would be forever. And there are a few elements that you have to become much more careful about when you are doing fully remote work, which is you have to make sure that the bonding between people and the camaraderie, there are forums that you create, even virtually for people to interact. When you're in the office, that just happens naturally, that you just have the non-work interaction that just happens naturally.

If you're on Zoom all the time, there is no time for non-work interactions at all. So as a company, you almost have to create those situations where something is happening that is a non-work interaction. The other thing that we had to be very careful about, and we had some challenges around it, is geographical distribution. People working from anywhere means that people are in different time zones, people are in different geographies. So how do you operate within that without creating a burden for people to communicate on a call all the time, or on a video call across different time zones? So, more communication that's asynchronous. We adopted this concept called squads, and every squad is in one time zone, really. So most of the work can happen in one time zone.

So we put, say a squad of three to five engineers, they normally would be in one time zone. So most of the day to day interaction they have, they don't have to do with engineers in another time zone. So it creates a rhythm. They don't have to do calls at early mornings or late nights. So those are the things you have to be careful about and we all have to adapt. And the good thing is I feel like organizations have done well adapting to this new world of how work would be.

Shane Hastie: We're coming to the end. Looking back over your career and you've been really successful. What advice would you give the new engineer and what should they be looking at?

Advice for new engineers [22:16]

Jyoti Bansal: My advice is, I take a lot of pride in the craft of software engineering. That's what my passion is. Building good software, building products that solve big problems and I've been an entrepreneur and a business person. But at the core of it, I really look at like, if I'm solving good, interesting problems and building good software to do that, everything else will happen. So that's the one advice I give to people. Don't worry too much about things. If you can build really good software, software that's solving some good problems that people care about, and you take pride in it, then everything can fall in place around it.

Many times I see software engineers chasing, I want to work on this stack, or I want to work on that tech, or I want to do certain things. I normally advise people as like, focus on a problem that you're passionate about and building good technology and good solutions and good software to solve those problems. And if you do that, and you take pride in that, monetization around that will happen and business success will happen. And your career success will happen because you delivered something of value. And that's normally my advice to a young software engineer.

Shane Hastie: Jyoti, thank you very much, indeed. If people want to continue the conversation, where do they find you?

Jyoti Bansal: LinkedIn, Twitter. My Twitter profile is @jyotibansalsf. And anyone can Google me and find me on LinkedIn.

Shane Hastie: Wonderful. Thanks so much.

