
Create Autonomous, Highly Productive Teams By Lowering the Stakes


Summary

Jason Lengstorf looks at architectural and organizational strategies to help teams move with less technical debt or maintenance burdens.

Bio

Jason Lengstorf works as a principal developer experience engineer at Netlify and hosts Learn With Jason, a live-streamed video show where he pair programs to learn something new in 90 minutes. He spends a lot of time telling people that the formula for success and happiness is to lift each other up and share what we learn. He is trying his very best to follow his own advice.

About the conference

QCon Plus is a virtual conference for senior software engineers and architects that covers the trends, best practices, and solutions leveraged by the world's most innovative software organizations.

Transcript

Lengstorf: I want to talk about how to build a unicorn. More specifically, how we can create really highly productive teams so that we can make our companies into what a VC would call a unicorn. I want to talk about how we do that by lowering stakes. The first thing that I want to talk about is, what is a unicorn? What do we mean when we call something a unicorn? Specifically, when you hear a venture capitalist call a company a unicorn, they're usually talking about a company that is valued at over a billion dollars. I think there's another way of describing a unicorn, which is, when you hear about these companies that achieve an almost legendary status, everything they ship is amazing. Their teams are amazing. People would do anything to work there. They just get this ethos around them where everything they touch turns to gold. We've seen companies like this. Steve Jobs-era Apple. The early days of Google. To a certain extent, Apple now and Google now. These moments when these teams are just unstoppable. That's another way of looking at a company being a unicorn. That's something that I think we really want.

If we think about what makes a company a unicorn, it boils down to people. Because you can have bottomless pockets, you can have access to all the talent in the world, but if you're not able to get the right people and the right ideas all working together and doing things well, then it doesn't matter. We see companies with deep pockets blow it all the time. They miss the market. They just can't ship. They don't get something out in time. They get beaten to the punch by a team that's delivering more effectively, and they just fade into irrelevance. I think instead of saying a unicorn, maybe we should talk about what makes a team work, because these are the things that are actually going to make the difference.

What Makes a Team Work?

Let's talk a little bit about this. The best teams are going to ship fast. They get things out the door quickly. We're moving, and everything that happens is happening at pace. You don't see these big slowdowns. You don't see people get stuck for months between deliveries. They're getting things out the door quickly. The best teams are also working together. You're not seeing a siloed-off or very isolated, unfriendly working environment. Instead, you're seeing these highly collaborative environments, where you can sense that this is a team that has gelled. They've found a way to really work together and to be a unit as opposed to individuals working towards the same goal. That really shows in these unicorn teams.

Ok, But How?

What I want to talk about is not just the abstract of this, what could be or what should be a great company. Instead, I want to get into a little more brass tacks, so how? How does this happen? What makes a team? What can we do? What do we actually control that allows us to get these wonderful teams put together and functioning? To talk about what we can do, I think it's also important to talk about what goes wrong. We should talk a little bit about what causes teams to fall apart. How do we kill unicorns? How do we get ourselves into trouble and prevent ourselves from being able to do the things that we want to do. I think that a few of these end up being architectural, and a few of these end up being strategic.

Software Architecture

Let's talk about software architecture a little bit to start. The first thing that I'll say is, we've all had a moment where we said, let's just put all the code in a big pile. Unless you've got some really rigorous tooling around that, and some really rigorous documentation and safeguards, then you end up with something that feels a little bit like this. You walk into a room, or into a code base, in this metaphor, and you're looking for something that you need to fix. You find yourself over in one section of the code base, and there's tens of thousands, or maybe millions of lines of code. The thing that you're looking for is actually way over here, and you have no way of knowing that. You have to go ask questions in a public Slack channel and hope that somebody who has enough context notices and can point you to the right person. It demotivates people. This gets messy. It gets hard to track. It gets hard to maintain. It really tends to just like drag a team down.

The next evolution of this was we said, let's ship microservices. We'll split the code up by its product domain, and that is going to give us the ability to do less. We don't have to keep as much in our heads. We don't have to manage millions of lines of code. We're just going to do a little bit at a time. The problem is that this only tends to divide the app vertically. What that leaves us with is that if we've got frontend engineers and backend engineers, and we have a team that is all frontend, and a team that is all backend, and each one of them has a microservice they have to maintain, we end up with this problem, where, to misquote Carl Sagan a little bit, "If you wish to make an edit to the CSS, you must first install Docker."

It was a really rude awakening for me when I went to my first microservices-based company because I joined a frontend team. I was specifically hired as a frontend engineer. When I went onto this team, the first thing I had to do was install Docker, install NGINX, go into a bunch of config files, start running some backend scripts, and then configure some database. That was all just so that I could get a development environment up so that I could make my first commit to the code base, which was to tweak some of the styles. That type of complexity just to get up and running, that's brutal. I came in having done backend code before, and so I maybe ramped up quickly. It still took me the better part of a day or two to get my development environment up and running properly, after I'd chased down all the right people to get the environment variables and all the things that I needed to get my microservices running on my development machine.

Complex Code Bases

Then, that type of complexity, that everything is in a big pile, or we've got microservices with this setup complexity, it leads to fragility. That fragility leads us to say things like, only Steve can merge pull requests. Steve's got all the context, and so we're going to make sure that he's looked it over and that it's all ok. Then to make sure that we don't keep our DevOps team up all weekend, we only deploy on Thursdays. We only do one deploy a week. It happens on Thursdays. If you don't get your code in by Thursday, it's going to wait until the next week. Both of these things are done for safety purposes. We want to avoid risk. We don't want to put ourselves in a position where somebody who maybe doesn't have the experience with our app will push something that accidentally breaks production. We can't have that. That's not a good thing. Then only deploying on Thursdays, like that makes sense if your code base is really difficult to deploy, and really difficult to recover if something goes wrong.

All of that puts us in this weird situation where now we've got this gatekeeper. It's not done because we don't want people to be able to work, it's done because we don't want to break the build. It still puts us in this situation where it feels like, I have to ask permission to do my job. If I have to go ask somebody, can I merge? Then after they merge, I have to wait until Thursday to see the thing that I built go live. Then I find a bug on Friday. Then I have to wait until next Thursday to get my merge in to fix that bug that I created. That's such a frustrating way to work. You have no way of getting things done quickly, because at best, you're going to be able to ship once a week. That doesn't feel good. It doesn't feel fast.

Then the other problem that you end up with when you've got these complex code bases, and you've got a lot of contextual knowledge, is you end up hearing things like this, "We don't touch that code, because we don't know how it works after Laura left." It can feel a little bit like you go in and you look at all these patch cables, and you don't know what things do or why they're there, or what they're meant to be. An actual quote that I heard at one of my previous jobs was, "As far as we can tell, this code is never actually used, but everything breaks if we delete it." I'm sure that code was used somewhere, we just couldn't figure out how or where or what was referencing it. When you've got a million-line code base, you don't have time to do that detective work to trace something all the way through all million lines. Even if you do it for one thing, there's the other 999,000 lines that you've got to dig through. There's no way that you can build that level of context. It leads you to this giant pile of legacy code and you build around it.

Instead of working on the code, you figure out what the edges of the code base are, and you add more to it. It becomes this mud ball, where the center is the old stuff. Nobody knows how that works anymore, because that was built three generations of teammates ago. Then there's the extra layer and the extra layer, and you need an archaeologist to get through what happened. What we're doing now to what we were doing six months ago, to what we were doing, whoever was here before and since left. It doesn't feel good. It doesn't make you feel confident in your ability to deliver or ship, and you end up in a situation where you're just really dragging. It's a sloggy way to work. We don't really want to do that.

Strategies to Unlock Your Team - Design for Deletion

Instead, let's talk about ways that we can unlock our team. Let's take all those bad things, all those things that are dragging us down, and let's look at ways that we can turn that around, how we can avoid those problems in our code bases, both architecturally and organizationally. The first one is, I want to talk about designing for deletion. This, I think, is one of the most beneficial things that we can do. We think about our code as a problem that is being solved, and we think clearly about how those problems are solved. There's a lot of metaphors in here. You could think about each piece of code as a pure function, where the same thing comes in, the same thing goes out. Or you could think about it as a contract. You've got a clear API contract: if you hit this endpoint, we will always give you this shape of data. Any of that clarity that you can create means that you now have a clean separation between that piece of code and anything that uses it. Say you have a single API defined, and that API has promised every app that uses it that if they hit these endpoints, they will get this shape of data. That means that later, I can go in and I can delete that entire API and replace it with something new. Maybe we decide that we're going to take this old PHP API that was built by a previous employee, and we want to move it over to Go, but the contract stays the same. It still exposes an endpoint. If you hit that endpoint, you get the same shape of data, and that contract is still fulfilled. All the services that use that piece of code never need to know, and they never need to be updated to make that switch, which means you now have the opportunity to maintain each piece of code as an independent entity.
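One way to make such a contract concrete is to write it down as a type plus a runtime check that any implementation must pass. The following is only a minimal sketch: the `UserResponse` shape, the endpoint stand-ins, and every name in it are assumptions made for illustration, not something shown in the talk.

```typescript
// The contract: the shape every consumer is promised. (Hypothetical fields.)
interface UserResponse {
  id: string;
  name: string;
  email: string;
}

// Runtime check that a payload honors the contract.
function isUserResponse(value: unknown): value is UserResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as { [key: string]: unknown };
  return (
    typeof v["id"] === "string" &&
    typeof v["name"] === "string" &&
    typeof v["email"] === "string"
  );
}

// Stand-ins for two backends behind the same endpoint. In practice these
// would be HTTP calls to the old service and to its replacement.
type UserEndpoint = (id: string) => unknown;

const legacyEndpoint: UserEndpoint = (id) => ({
  id: id,
  name: "Ada",
  email: "ada@example.com",
  legacyFlags: 3, // extra fields do not break consumers of the contract
});

const rewrittenEndpoint: UserEndpoint = (id) => ({
  id: id,
  name: "Ada",
  email: "ada@example.com",
});

// A rewrite that drops a promised field fails the check.
const brokenEndpoint: UserEndpoint = (id) => ({ id: id, name: "Ada" });

function honorsContract(endpoint: UserEndpoint): boolean {
  return isUserResponse(endpoint("42"));
}
```

Running `honorsContract` against both the old and the new implementation before switching traffic is one way to gain confidence that consumers will not notice the swap.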

This is a really reduced example. This is some real code that I pulled out of a Netlify project. This is the way that we are loading comments for something. As far as the UI is concerned, this is a serverless function here, it's going to call this serverless function, and it's going to get back comments in a certain shape. The serverless function is using a getComments handler, and that getComments handler under the hood can be using anything. It doesn't matter. This app has no idea how that data is being loaded. It doesn't care. It just needs to know that if I get a URL, and I pass that URL to getComments, I'm going to get back the comments in a way that works for this UI. Right now, this is using Hasura under the hood. If we switch this over, and we want to just go straight to the Postgres database and use Knex or something, we can do that. This app never needs to change. The code doesn't change. That getComments utility is the only thing that gets edited to swap out all of the logic in this app, and the UI that uses this API never knows at all. It's still hitting this endpoint. It's still hitting this handler. It's still getting back an array of comments. This is a reduced example, but it shows the potential of this approach where we are able to design things in such a way that we can hot swap code bases in and out as long as the boundaries are clear, without having to rewrite the entire system.
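The code on the slide is not reproduced in this transcript. A rough sketch of the shape being described might look like the following. Only the name `getComments` comes from the talk; the `PageComment` fields, the two in-memory sources, and `commentsHandler` are assumptions, and the sources are kept synchronous for brevity where real ones would be asynchronous calls to a GraphQL API or a database.

```typescript
// The shape the UI relies on. (Field names are assumptions.)
interface PageComment {
  author: string;
  body: string;
}

// Anything that can produce comments for a page URL.
type CommentSource = (url: string) => PageComment[];

// Stand-in for a GraphQL-backed source.
const graphqlSource: CommentSource = (url) => {
  const rows = [
    { page: "/blog/hello", user: { name: "sam" }, text: "Nice post" },
    { page: "/blog/other", user: { name: "kim" }, text: "Hi" },
  ];
  return rows
    .filter((row) => row.page === url)
    .map((row) => ({ author: row.user.name, body: row.text }));
};

// Stand-in for a direct SQL source: different storage layout, same output.
const sqlSource: CommentSource = (url) => {
  const table: Array<[string, string, string]> = [
    ["/blog/hello", "sam", "Nice post"],
    ["/blog/other", "kim", "Hi"],
  ];
  return table
    .filter((row) => row[0] === url)
    .map((row) => ({ author: row[1], body: row[2] }));
};

// The only thing that changes when the backend is swapped.
let activeSource: CommentSource = graphqlSource;

function getComments(url: string): PageComment[] {
  return activeSource(url);
}

// The function the UI calls; it never sees which source is in use.
function commentsHandler(request: { url: string }): { statusCode: number; body: string } {
  return { statusCode: 200, body: JSON.stringify(getComments(request.url)) };
}
```

Pointing `activeSource` at `sqlSource` changes where the data comes from, while `commentsHandler` and anything that calls it stay untouched.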

Let People Be Experts

The other thing that this allows us to do is it allows us to let people be experts. I think one of the things that's hardest when you are working on a piece of software is that thing I said about when I joined IBM and I had to set up a microservice, I had to do my Docker containers, and my NGINX config, and my databases, so that I could run a frontend. That doesn't feel like me working in my expertise. That felt like me working way outside of my expertise. That left me in a position of feeling less confident about my ability to contribute to the code base. If we design for deletion, what we're able to do is we're able to let people stay in the thing that they feel most confident about. That lets people really ship and push things quickly. It also lets us get away from this idea that everybody needs to be able to do everything. It makes our code feel a little bit less like a Jenga tower. Code feels like a Jenga tower when we're not confident that our developers are doing a good job. We don't trust them. A lot of the time that lack of trust isn't because we don't think they're good developers, but because we know they're outside their expertise.

If a frontend developer is writing backend code, yes, you probably should check that. If a backend developer is writing UI code, you should probably have somebody check that too. A frontend developer writing frontend code, or a backend developer writing backend code, that's where they live. That's their sweet spot, their expertise. We should and can trust them to do that job and to ship that code without a lot of oversight. If we give people the opportunity, architecturally, to work on the things that they're experts in and not have to deal with the things that they're not experts in, we get a lot of trust by default there. Because we hired you to be good at this, so we can trust you to be good at this. If we hired you to be good at one thing and we're asking you to do another, yes, we probably need checks and balances there. That's an organizational problem. That's something that we can solve, and in doing so give our people more trust.

Minimize Your Boilerplate

That comes down to this idea of minimizing your boilerplate. I worked on a few projects where every single time that we would stand up a new microservice, we also had to stand up a brand new Express API. That Express API needed to have certain bits of proprietary middleware in there for authentication, and session management, and checking for all of these specific headers and cookies. A lot of times, all of that boilerplate was identical. It was the exact same boilerplate. We were basically copy pasting another microservice. Then we would go in, and we would edit a few routes. Most of these had one, maybe two API routes that were being exposed by this Express boilerplate that we were writing. It was like 90% boilerplate, and just a few lines of business logic that was required for this thing to run.

That made it really challenging because then we'd have some security overhaul, and now we had that Express boilerplate times 35 microservices that we had to go and edit, and check, and do compliance on. That is not what I want to be doing in my job. That's not what anybody wants us to be doing. I'm assuming a lot of people watching here are in engineering management or engineering leadership, and you know what it's like trying to get cycles for maintenance. It's impossible. You have to sneak it in. You've got to basically Trojan horse maintenance into feature work. If the code base is built with mandatory maintenance to do new features, that is not a fun way to write code.

A way that I like to think about that is, if I want to make pasta, the last thing that I want you to do is hand me flour and eggs and say, "Here you go, there's your pasta. It's going to take a little bit of elbow grease." That's not what I asked for. It's not an efficient way to do it, and it's definitely not a good way to get consistent results. If every single person on the team has to make their own pasta, you are looking at a situation where you're going to get wildly different quality, even if everybody is following the same recipe. If you've ever been to a cooking class, you know that doing the same thing doesn't necessarily result in the same outcome. With that boilerplate, even though it's the same thing, there's still the potential for things to get weird and go sideways, or for a small misunderstanding to lead to a lot of production problems. We want to get that boilerplate out of there.

Give People Autonomy

We also just really want to give people autonomy. The worst feeling in the world is being stuck in a position where I feel like I can't do the thing that I need to do to move the needle. It makes me not want to do work. If you put me in a position where I can't control what's going on in my app, then as a protection mechanism, I start rationalizing, I'm just going to focus on the things that I have to focus on. I'm going to let this go. That's no good, because we're basically training people that something needs to be this on fire, before we're going to do the work to fix it. Because it's so hard to get things fixed. If we flip that and we give people lots of autonomy, then every time that they see something that could be fixed, they know that they have the authority, and they know that they have the power to fix it quickly. We want to make sure that that's there for them. We want to make sure that they feel the permission, the autonomy, the capability to go out and make those changes in our code base. That nobody is going to second guess them or undercut them, or that they're not going to hit gatekeepers that would prevent them from getting that work done. Ultimately, we want to establish trust on our team. The number one currency on a team is trust. If you don't trust your team and your team doesn't trust you, that's it. If you don't fix that, it doesn't matter how good all of your other processes are, because you can't work without trust.

Tactics - Make Your Frontend Static

Let's talk tactics. Those are philosophical things that are a little squishy. Let's talk about some concrete ways that I've seen this done. This is what I implemented when I was a frontend architect at IBM. This is what I've put in place in every situation I've worked in since. The first one is to make your frontend static. What we want to do is we want to make sure that the frontend is a static set of files. There are a lot of reasons for that. The first one is, a static frontend is atomic. When you build a static site using a set of files, you have the code base, you run a build command, and you get a folder. That folder is what you deploy. Every time you build, you get a new folder, which now means that you have the ability to time travel, because each build is a point in time. If you don't like something about that build, you can just roll it back. You can say, we didn't like that build, let's switch to the previous build. We just swap which folder is being served. That's that.
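The "swap which folder is being served" idea can be modeled in a few lines. This is a sketch under assumptions: the `Build` fields and the `DeployHistory` class are invented for illustration, and a real host would move a CDN or router pointer where this code moves an array index.

```typescript
// Each build is an immutable folder; deploying only moves a pointer.
interface Build {
  id: string;     // for example a commit SHA (hypothetical)
  folder: string; // where that build's output lives
}

class DeployHistory {
  private builds: Build[] = [];
  private liveIndex = -1;

  // Record a new build and make it the one being served.
  publish(build: Build): void {
    this.builds.push(build);
    this.liveIndex = this.builds.length - 1;
  }

  // Point traffic back at the previous build; nothing is rebuilt.
  rollback(): void {
    if (this.liveIndex <= 0) {
      throw new Error("No earlier build to roll back to");
    }
    this.liveIndex -= 1;
  }

  // The build currently being served, if any.
  live(): Build | undefined {
    return this.builds[this.liveIndex];
  }
}

const deploys = new DeployHistory();
deploys.publish({ id: "a1", folder: "builds/a1" });
deploys.publish({ id: "b2", folder: "builds/b2" });
deploys.rollback(); // "b2" stays on disk, "a1" is served again
```

Because the older folder is never modified, rolling back is a constant-time pointer change and does not depend on being able to rebuild old code.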

A static frontend is also scalable, because it's static, you're serving it entirely on CDNs. The hardest thing to take down is a CDN. They're specifically built for high traffic. This is the way to guarantee that your holiday traffic spike when you do a big Black Friday launch, that it doesn't take your site down. This is a great way to make sure that your SREs aren't being paged at all hours because of overload outages. It's also really secure because it's static files. If you generate static files, and you don't have client-side calls to the database, there is no server required for this site to work, because you're generating static files. For a lot of stuff, marketing sites, blogs, a huge amount of content driven things, you don't need any server activity to make that work. Then for more server driven things like eCommerce experiences or apps, you can serve all of the basics, like the product catalog, or the structure of the app dashboard, statically. Then you use client-side JavaScript to load in the inventory or the in-stock, out of stock notifier, or the users' dashboard data.
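To illustrate the split between statically served content and data loaded by client-side JavaScript, here is a small sketch. The `CatalogItem` and `StockLevel` shapes and the `applyStock` function are assumptions for illustration; in a real page the stock data would arrive from an API call made in the browser after the static HTML has loaded.

```typescript
// Baked into the static build: changes only when the site is rebuilt.
interface CatalogItem {
  sku: string;
  title: string;
}

// Fetched by client-side JavaScript after the static page has loaded.
interface StockLevel {
  sku: string;
  available: number;
}

interface RenderedItem {
  sku: string;
  title: string;
  label: string;
}

// Merge live stock data into the statically rendered catalog.
function applyStock(catalog: CatalogItem[], stock: StockLevel[]): RenderedItem[] {
  const bySku: { [sku: string]: number } = {};
  stock.forEach((s) => {
    bySku[s.sku] = s.available;
  });
  return catalog.map((item) => {
    const available = bySku[item.sku];
    let label = "Checking stock"; // the API had no answer for this item
    if (typeof available === "number") {
      label = available > 0 ? "In stock" : "Out of stock";
    }
    return { sku: item.sku, title: item.title, label: label };
  });
}
```

If the stock API is slow or down, the catalog still renders from the static files and only the labels degrade.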

All of those things are still completely possible. You can see a few great examples of static architecture at work. The Peloton app is completely built as a static app or a Jamstack app. Netlify is built on Netlify. The whole app is deployed as a static app. It's a static frontend using APIs. You can do some really complex heavy lifting with all static frontends, and because that static frontend is just static files, if somebody gets access to your CDN bucket, and they go in and look, there are no secret keys in there. There is no database access because the database was only accessed during the build. You can get defaced but you can't get hacked. You're not going to get a user breach through the static files that you're serving. You've still got APIs. You've still got databases. If you're hitting those things, people are still going to attack those, but it removes the actual serving of your site's HTML as an attack vector. It's good if you want to let frontend developers be frontend developers to not require them to think about server security when they're serving those static files.

Tactics - Deploy On Every Single Merge

The other thing that we really want to do is we want to deploy on every single merge. No more deploying on Thursdays. If you're looking at your sites as being atomic, easy to roll back, secure, and isolated, because it's just the frontend and all the API stuff is happening through this clear API boundary, then the flow is simple. Somebody else on the team reviews and approves. It merges. You can auto build that and just deploy it immediately. Because if something is wrong, you just click one button to take it back and it's instantaneous. That is a really confidence-inspiring way to work. Because of that easy deploy process, it also opens up some really nice workflows for experimentation. Because you're able to build on every merge, you can also do things like set up a branch for experimentation, and just deploy that, and take a look. You can deploy every single pull request to take a look at it before you merge and make sure that it's something that you want to merge. It's a great way to unlock teams, to experiment, to try things, to share things around so that you can really rapidly iterate and get feedback on things.
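The routing rule behind this workflow can be sketched as a single function: merges to the production branch go live, and everything else gets its own preview URL. The event shape, the `main` branch name, the `example.com` domain, and the URL patterns below are all assumptions; they are not the format any particular hosting provider uses.

```typescript
// A simplified view of what the CI system reports. (Hypothetical shape.)
type GitChange =
  | { kind: "push"; branch: string; commit: string }
  | { kind: "pull_request"; number: number; commit: string };

interface DeployPlan {
  target: "production" | "preview";
  url: string;
}

const PRODUCTION_BRANCH = "main"; // assumption
const SITE_DOMAIN = "example.com"; // placeholder domain

// Decide where a change should be deployed. Every change deploys somewhere.
function planDeploy(change: GitChange): DeployPlan {
  if (change.kind === "pull_request") {
    // Each pull request gets a URL reviewers can open before merging.
    return { target: "preview", url: "https://pr-" + change.number + "." + SITE_DOMAIN };
  }
  if (change.branch === PRODUCTION_BRANCH) {
    // A merge lands on the production branch and goes live immediately.
    return { target: "production", url: "https://" + SITE_DOMAIN };
  }
  // Any other branch is treated as an experiment with its own URL.
  const slug = change.branch.replace(/[^a-z0-9]+/gi, "-").toLowerCase();
  return { target: "preview", url: "https://" + slug + "." + SITE_DOMAIN };
}
```

Note that there is no branch of this function that waits for a particular day of the week; the safety comes from previews before the merge and rollbacks after it.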

Tactics - Reducing Boilerplate With Serverless

We also want to reduce our boilerplate with serverless. I said on that previous project, we were writing all this boilerplate. If we switched over to serverless, we could write and deploy only the code that we actually needed, which was just a couple of lines of business logic. All of that Express logic goes away. If I need to, I can write a utility to check the auth, check the cookies, and share that between my different serverless functions. I can do that in an npm package, so that I just get to say, grab compliance policies from our private npm repo. Then, in the serverless function, check the request for compliance, and then return whatever business logic. We can really compress a lot of that and remove a huge amount of complexity.
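A sketch of what that could look like: the shared check lives in one wrapper, and each function contains only its business logic. Everything here is an assumption made for illustration, including the request and result shapes, the `withCompliance` name, the bearer-token rule, and the pricing logic. The handler is kept synchronous for brevity, whereas real serverless handlers are typically asynchronous.

```typescript
// --- Would live in a shared private package (hypothetical) ---
interface FunctionRequest {
  headers: { [name: string]: string | undefined };
  body: string;
}

interface FunctionResult {
  statusCode: number;
  body: string;
}

type FunctionHandler = (request: FunctionRequest) => FunctionResult;

// One place for the auth and header checks. The rule itself is a placeholder.
function withCompliance(inner: FunctionHandler): FunctionHandler {
  return (request) => {
    const token = request.headers["authorization"];
    if (!token || token.indexOf("Bearer ") !== 0) {
      return { statusCode: 401, body: JSON.stringify({ error: "Unauthorized" }) };
    }
    return inner(request);
  };
}

// --- The whole serverless function: only business logic remains ---
const priceHandler = withCompliance((request) => {
  const input = JSON.parse(request.body) as { quantity: number; unitPrice: number };
  return {
    statusCode: 200,
    body: JSON.stringify({ total: input.quantity * input.unitPrice }),
  };
});
```

With this structure, a security overhaul means publishing a new version of the shared package, not editing the same middleware in dozens of services.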

I like Netlify, not just because I work here, but because you deploy your serverless functions by putting them into a functions folder, and that's it. That's really nice and really empowering because you can build lightweight APIs and servers through serverless functions without ever having to touch config for it. That's a really empowering thing for teams, because they're able to not just build frontends, but now they can build frontends and lightweight backends. If they would otherwise need to ask for a bespoke endpoint, they can just do that in a serverless function instead, without having to involve a backend team or pull somebody off of another project to help unblock them real quick. It's a really powerful model.

 


 

Recorded at:

Jul 11, 2021
