Understanding Platforms: What They Are, Why They Work, When to Use Them, How to Build Them



Hazel Weakly discusses platforms and platform engineering, and what it means to learn, and how collective thought scales across a team, an organization and an industry.


Hazel Weakly spends her days working on building out teams of humans as well as the infrastructure, systems, automation, and tooling to make life better for others. She’s worked at a variety of companies, across a wide range of tech. Hazel currently serves as a Director on the board of the Haskell Foundation and is fondly known as the Infrastructure Witch of Hachyderm.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.


Weakly: My name is Hazel. We're going to be talking about platform engineering, what it is, why it works, how to do it, and when to do it. It's going to be from the perspective of figuring out these three categories of like engineering ICs, managers, and executives, and how to blend all these categories together. Given that I've been all three simultaneously, which I don't recommend, but simultaneously, it is something that I have a perspective on. We're going to go through all these things.

What is Doing?

First off, what you have to do is figure out what it means to do things. The reason you have to go here before even talking about platform engineering is that platform engineering scales doing, it scales other things, and you just can't get there until you know what you're even doing, as a person, as a manager, as whatever. Systems engineering is what you're doing. It's through technical work. It's through people work. It's through systems work. You have to understand how that works. Because as a company, one of your main challenges is always going to be to figure out how to scale decision making. When you have that hyperscale, it changes the structure of the entire company. When you go above a certain threshold, it changes everything. You're constantly going through that figuring-out phase of how do you get things done. When you've figured that out, you have this magical period of about 25 minutes when everything works and then it's broken again. It's about collective doing and it's about this learning. You have this cycle, this flow of the experimental hypothesis and the validation. You have these market spaces and this scientific experiment, this philosophy, blending everything together. As you do that, you take all the information you learn, and you dump it in the middle in this context spot. This context bucket becomes very large, very quickly.

Pretty much the entire science behind everything is figuring out how to split all that up. If we take these actions, if we take this doing and we split it up into three things, the first one is motivation: what motivates you to actually do things, to get things done? For engineers, it turns out it's mostly this thriving aspect of, do you feel psychologically safe? Do you feel empowered? Do you feel like you can actually learn things? Can you ask questions? Can you do all of these things? That gives you that motivation to actually do things and to learn. For managers, it turns out that it's different. Managers and ICs tend to disagree most when it comes to the definition of productivity. They tend to agree more when it comes to the definition of quality. If you take the word productivity, and you throw it out the window, you're a lot more productive. When you think about the outcomes, and you've planned the outcomes together, you're going to lose the ICs a bit, because they don't really care about the outcomes, they care about the thriving. You're going to lose people, again, when you get to the executive level, because it's all about alignment. Yes, all these things exist at every level. The way people talk about them, the way they shape them, and the way they work through them means that it's so hard to actually sound coherent to these other people. Like when I started becoming an executive, I could see in real-time just the light draining from my friend group. They were like, you just sound like business now. You lose this context, you just lose the ability to interact with people.

It's the relationships that actually make the platform engineering, make the doing, work. You can use something like Dunbar's number and just think about these social groups. When I'm doing things and I have to figure all this out, if it's just me, it's great. I'm my own genius. I figure things out. I always agree with myself most of the time. It's great. It's awesome. It's excellent. Three people is about how much you can get done on one team call without ever having to write notes, which is great. The definition of the basement-dwelling startup is just, I order one pizza, like forget the two-pizza team, one pizza and a beer, you're good. For every multiple of three, you're going to have a different structure, a different dynamic, a different challenge when it comes to scaling learning, and scaling doing, and collaboration. The effectiveness of this collaboration model has to take all these things into account. The effectiveness of the platform has to take all these into account. Your answer will be different every time you change the size of the company. Ideally, collaboration is going to happen when all these factors line up. They never will. It's great to pretend that they will, they never will. Ideally, you have this. You're always going to be in a structure of a slightly unideal environment. How do you compensate for that? How do you work with that? It turns out in this doing, there's another set of factors, which would be like time and stability, and they're less relevant than you think. Time, we think of teams as having like a life cycle. People want to have stable teams, the team that stays together the whole time through. It turns out that teams aren't actually in these static life cycles, they just pass back and forth between norming and storming and forming. All these different aspects of a team are just continually there. This continual evolution happens inside the teams. They're never stable, they're always managed stable.

Think of stability. We like stability because it lets us be lazy as executives, it lets us be lazy as managers, it lets us compensate for a whole bunch of socio-technical factors. It's cranking and pulling up this lever of, I don't need to actually worry about making a safe environment if I just don't change a bunch of stuff. Then it's fine. You can't do that, because nothing is actually staying the same. You need to figure out this managed stability, which is a whole lot harder. We're going to go back and take that giant context file and start slicing it up, and that is going to tie into the platforms. You can think about this vertical context and this horizontal context, and how they blend together, with the vertical context being the overhead required to get something done and to do it. The horizontal context is the overhead required for collaboration. They're both required, they're both synergetic with each other. You can't really have one without the other, but you also can't have both. To put together this decision making, to scale it, and to actually make it blend together, you need to shape the motivation. You need to understand relationships and how they work. You need to scope the context down. Overall, building that stable base of knowledge is how you scale collective learning as a society. At a smaller scale, it's how you do it at a company. Another thing that I'll say is that limited context is a feature, not a bug. I used to say that ignorance is actually a feature of humans. Then people said, that's mean. Now, I say limited context is a feature, not a bug. It's very true. When you have this limited context, you're never going to actually have all the context, and forcing yourself to split it up, because human brains are finite, actually turns out to make it work a whole lot better in general. You should split up that context more often in the first place. Bringing it all together, systems engineering through technical work is for ICs, and that's how that doing works. For managers, it's going to be systems engineering through people work. For executives, it's systems engineering through systems work.

What Is a Platform?

Now that we've gone into what is doing, we're going to figure out, what is a platform? Platforms are a way to scale doing. They are a way to actually go through and say, here's this clustered knowledge, here's this giant bucket of everything, here's what we're doing, how can we keep scaling that up? They're one method out of many, and they're a particularly effective one in the right situational context. Another way to define that is that for ICs, you can think of platforms as the dual of libraries. They're synergetic, you're going to pile them on top of each other, ad infinitum. They are a dual to libraries in how they work, and in how you combine knowledge and abstract and encapsulate it. For managers, you can think of platforms as turning a service team into a product team. You have this service, you have this kind of thing that scales in a certain way, you invert how all that works, and you build a platform. For executives, you use platforms to derisk diversification by exploiting the economics of agglomeration. We will get into that.

Platforms, you might think, so what are some examples of platforms? We have all these things. There are also platforms like, one of my favorite examples is regex. It is a platform. I will die on that hill. It is great. It is awesome. You should maybe not learn it. It turns out a lot of things that are platforms are also not platforms, like platforms. Platforms are not platforms, because being a platform is not an intrinsic property of anything. You can't be a platform. It's more than intangibles. Artistic, it's creative. It's hard to define, but it's not a platform. Another thing that I'll say is that platforms are not platform engineering. There's something magical with platform engineering, because it's an approach. It's a mindset. It's a philosophy around that clustering, and noticing the duplicates of work, like how things are structured, and bringing it together in a way that it can be reused and reutilized. You don't have to build a platform to do platform engineering. You can build a platform without understanding platform engineering. If you do that, you're going to fuck it up. If you do platform engineering, you may not end up with a platform.

Speaking of libraries and platforms. Libraries are one way to represent and abstract and think about knowledge and share it and make it easier to use. Platforms are another, and they have many topologies, they have many combinations and similarities. You will often find that you are building one, and then building the other, and then building one and then building the other, and smashing them together into an incompatible soup. A great example would be, you have Kubernetes. It's a platform, but you also have a library for implementing operators in Kubernetes, which you use to build some other random stuff, and then you build more things on top of it, then you put more things on top of it. Then you have a vendor, then you start sobbing at 3:00 in the morning, because something's wrong and you have no idea why, and you're just 17 layers away from systemd namespaces. Platforms can also turn services into products. This isn't always good. It's not always what you want, you should be intentional about the choice. You're always going to need both. When you need to do that, it is an invaluable and excellent tool for doing so.

Just as libraries are a tool, so are platforms, and vice versa. Libraries are also the building blocks of platforms. Platforms synergize with executive alignment and team dynamics. If you have these sorts of aspects of how people work, and how everything is going, you can utilize platforms to help make that work. They also operationalize libraries in the same way that programs operationalize policy. If you build out policy as an executive, you're going to operationalize it by implementing a program that builds it out. If you're not careful, you will turn that program into a policy and build another program on top of it, then another program, then another program, and never get anywhere. If you do it effectively, you're going to actually get something done. Which brings to mind a spicy topic of, what is platform engineering versus SRE versus all these other things? DevOps is a culture. It's whole-system ownership motivated by broadening the horizontal context. The whole idea of shift left is just you widen your horizontal context and you consider personally more things to be under your scope. SRE is, when you've broadened everything, and you continue to widen that context to the point that it's not humanly possible for any one team to do it, you invent a third team, you stick it in the middle, and you're like, I solved the problem. Then platform engineering, on the other hand, would be an empathy-driven approach towards socio-technical organizational design. They are all complementary. You may need one, you may need the other, you may need both. They're all synergetic. They're all part of the same thing. Platforms, they can be the dual of libraries. They turn service teams into product teams, and they derisk diversification.

What Makes Platforms Work?

How do they work? The things that make platforms work are when you build an abstraction that's a force multiplier, when you build an interface between teams that reduces cognitive overhead, and when you build a consistent language and mental model for that aligned learning, and that aligned knowledge growth. The last one is really tricky. I'll get into that one. It's what happens when you need to actually give people the tools not just to learn the next stage, but to make sure that as they continue learning, you can leave them alone, and let them do what they do best, and then come back and find that they haven't drifted beyond recognition. When you think about the typical organizational structures at a company, and figuring out how to map platforms into that, and mapping all this other stuff into that, it's all about that context. If you take the context, and you go, what's your functional organization? Way back in the day, we didn't have platforms, then really, we didn't have this problem. Everything was simpler back when you had just like four languages, and all of them sucked, and so you just used Java. You have these vertical slices. You have these horizontal slices. The vertical ones are there, the horizontal ones don't exist yet. You can see that the value stream is not related at all to what you're doing here. You have the developers in one corner, you have the IT in another corner, you have the legal department. Nobody talks to each other. It's great. The vertical thing is right there and you can get everything efficient. Maybe you never talk to each other but it's fine. Later, some people are like, this idea of aligning the value stream to the actual people and making sure all that's one loop, will be great. I love this. This is awesome. I'm going to call it product-led growth. I'm going to call it product-driven development, things like that. Every time you have a new product, you take a whole new team, you stick it in there, let them fight to the death, and then you call it Amazon.

Eventually, at some point, you're like, maybe that isn't complex enough. Maybe I can make this more complicated. You're like, I'm going to invent a matrix organization, because I want all the benefits of having these silos with all the downsides of having silos. It does actually work, but it's also really complicated. Because, like I said, you have this vertical and horizontal context, and everything like that. When you have this context, and you take these different designs, you're going to notice that the reason people keep making things more complex in that organizational structure is they're all fighting for that one thing, which is the market agility. The market agility is used to make these people go, this org chart isn't complicated enough. That one is the one where people go like, I need my business continuity to be incomprehensible so the investors will buy it. That one is the one that motivates this progression of the dynamic interfaces of teams, dynamic everything else. When you get to a certain level of complexity, platforms become more of a requirement, rather than a further way to scale things. It becomes the table stakes of just figuring out how to understand what you even are doing.

Speaking of, what are you doing, people keep trying to cheat the matrix. Sometimes they'll go, I'm not a matrix organization, I'm not whatever. If you feel like you're not a matrix organization, and you find yourself in this picture, you may be entitled to financial compensation. You need to build a stable base of knowledge. You need to actually scale this collective learning as a society, but that requires managing that context, and like I've said, at some point, it's just going to become incomprehensible. Every time we try and slice that context, we make it smaller and more manageable, but we also make the pieces harder to find and harder to glue together. Then, that's another set of context, and then that one's even bigger. When you think about all these things, and you start trying to go, ok, this context thing, you're like, that seems to be the hardest one to solve. Flow state, that's the hardest one to solve, in that the problem is usually obvious, but no one wants to solve it. That's the one where you go, is that psychological safety, where's that? Do we actually let developers do things that they want to do? People can find the problem. They can't necessarily solve it. The cognitive load, this is the one where it's very difficult to diagnose the problem, because it's the hidden work. It's the invisibility. It's the really complicated structures that prevent you from doing things.

When you think about shrinking that context, and making that cognitive overhead more manageable, the ways you tend to typically approach the problem are, as an IC, you have this idea of a well-defined abstraction. As a manager, it's the thoughtful interfaces between your teams. As an executive, it's that consistent language and mental model. If you remember, this was the definition of a platform, where the similarities start to come together. Well-designed abstractions are both transparent and opaque, which is one of my favorite sayings because no one can argue with it. You get there and you're like, what does that mean? If you have a transparent abstraction, you have one that can be reasoned about and seen through. You can always reason about the platform and the underlying abstract thing as if the abstraction was not there. If it's opaque, you also don't need to understand the whole thing. You don't need to actually understand the whole platform, you can still reason at the level that the abstraction provides you. That is, if you have both those properties, you can avoid the 200% knowledge problem of needing to know the abstraction above and below and the whole thing and combine it together. You can avoid the problem of people only knowing how to do a certain thing in one way, and the people that need to be onboarded can't be migrated. Having a well-defined abstraction is great.

At the team level, that's not necessarily something that works. What you really need to think about at the team level for the context is the thoughtful interfaces between teams that result in those collaboration models. Because every manager has, at some point, encountered a situation in which your team is doing a bunch of useful work and getting nothing done, and you're like, where does the time go? They go, I was working on all my stuff, and then also dealing with everyone who's messaging me. If you have this inefficient or poorly designed collaboration model, or these interactive methods or these interfaces in the team structures, and the inter-team collaboration, things aren't going to work. For executive leaders, one of the most important things they can do is provide this consistent language, this consistent mental model, of taking people and giving them tools not to dictate how they think but to give them a stable path to grow chains of thought, to grow abstractions, to grow knowledge, in a way that you can keep them aligned, in a way that's understandable to people going forward. That means that you don't have to micromanage it, you don't have to constrain it. You can still allow the emergent innovation to happen, without people starting to speak vocabularies and languages and use approaches that don't mesh with the rest of the organization and prevent it from actually happening. When you think about all these properties, you get platforms. These are some of them, there are more, maybe not all of these are there, maybe all these are there and more. What makes platforms work? You have that abstraction, you have the interfaces, and you have the consistent language that enables aligned teams. You don't necessarily need all of these properties, but you do need some of them.

When to Platform

When you have the platform, when do you do it? When do platforms make sense? One of the difficulties of integrating technology and enabling a technologically-driven organization is that you have two different societies inside each other, and they do not mesh well. Technology is a post-scarcity society. Companies live in capitalism. Until you understand how damaging and debilitating it is to try and cross that chasm, you're just not going to understand why your technological teams hate you, or why things aren't clicking. This focus on this overhead, the root starts there. Think about the economic model, what made sense, what works here. If you take a functional organization, an economic model for optimizing or reasoning about it would be the economics of scale. For a product organization, it will be the economics of scope. The idea of the value generation there comes from the reuse of the overhead when diversifying. You keep adding more verticals, but at some point, it gets cheaper to add them, you hope. Then you have the matrix one, this one's interesting, it's the economics of density. The optimization and the efficiency for a matrix organization doesn't come from any inherent property of the matrix, it comes from the fact that if you stick a bunch of stuff close together, it may be more convenient. Which is why people don't like matrix organizations.

What does this give you? You have to go with the model that's awesome. You have the market agility. People love market agility. It's the whole reason you make your org chart complicated. You also have the business risk, and you have the continuity, you have the growth strategy. You're going to end up with multiple of these scattered. Maybe this one's a vertical. Maybe this one's a horizontal. Maybe this one is a combination of a bunch of things. Where does platform engineering fit into there? It's about figuring out those clusters and blending them together, and really getting that and identifying where everything is and understanding all of that. If you take this platform economics, you can think of like libraries as maturing that market development, and derisking the development of sending products into new markets. Platforms can derisk the development of new products into the same markets. The interesting thing is that platform engineering can help you derisk diversification. It's not going to be in a straight line. It's going to be kind of like a staggered line, sort of application with a math thing. It's going to derisk that effective diversification and do so by combining the ideas of libraries and platforms together.

If you look at this, you might think, this agglomeration thing, this is where we start to talk about it. The agglomeration is like the economy, and the markets of like this clustering. If you look at your organization, you look at the structure of it and the clustering, this is what you're looking for. You're looking for these buckets of teams or buckets of services, buckets of all these different things, and you go, ok, like these need to be moved over here because they work on the same thing. Then if I have a sufficiently large box of all these different teams, or all these different services, or all these different things, you can figure it out. You can go, this is an opportunity for the platform. Branches can be seen as like a code base. Whichever model makes sense for you, try and use it. Maybe you use four, maybe you use one, maybe you use the other. You're looking for those clusters, and those are the opportunities of where you may be able to platform, that's where a platform makes sense.

When to do a platform? I like math, so I built this formula. I've never seen anyone able to use it, because it requires people to know things that are impossible to know, however, the formula is correct. Platforms make sense when the benefits of reducing the horizontal context more than offset the overhead of adding a vertical to the organization. Because the platform team will be a vertical, you just turn the vertical sideways, and you go, the product is the product, and the customers are ourselves. If you just smash the vertical this way, just ram it in there really hard, it makes sense. You get to reuse all of your sales team and your knowledge and your go-to-market and all these visualization properties. All you have to do is figure out how beneficial emergent innovation and context reduction are, and how costly overhead and collaboration are. Which is really easy to just plug into Google Sheets and make a spreadsheet, and magic. You have to have platforms be more than the sum of their parts, because you can't actually stick human dynamics in a spreadsheet, as much as we like to think you can. If they're not more than the sum of their parts, you'll never be able to grow them because they're always going to end up being more costly than they seem.
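As a rough sketch of that condition, written out as an inequality: the symbols here are illustrative names chosen for this write-up, not the notation from the talk's slide, and each term stands for one of the four quantities named above (emergent innovation, context reduction, overhead, collaboration).

```latex
\[
\underbrace{B_{\text{innovation}} + B_{\text{context reduction}}}_{\text{benefit of less horizontal context}}
\;>\;
\underbrace{C_{\text{overhead}} + C_{\text{collaboration}}}_{\text{cost of adding a vertical}}
\]
```

None of the four terms is directly measurable, which is the point made above: the inequality tells you what to weigh, not what number to type into a spreadsheet.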

How to Build High-Impact Platforms

When you think about now that I have the platform, now that I have everything, how do I actually build it? Here's the blueprint for building a platform. It's beautiful. It's awesome. Here's the blueprint for not building a platform. It turns out, they're actually pretty much the same formula. When you build a platform, step one is you listen. You listen to understand, not to be understood. You observe, which is another aspect of listening. You're actively listening. You're investigating. You're really listening. Then you validate and you repeat back to them like, ok, this is what I heard, is this correct? You're really trying to understand them. Then you have the empathy. It's learning, it's listening, and it's understanding. The whole thing is empathy. The evil strategy starts off innocently enough. You're like, I'm going to examine. I'm going to come in, and I see a problem, I'm going to investigate. The problem is that you're going to come in with a preconceived set of notions and biases, and you are examining to validate and vindicate your preconceived idea. Then when you go into there, you're like, awesome, the data matches the numbers that I carefully selected in order to answer the question that I carefully posed. Brilliant, I'm going to build it. Then I'm going to just launch it into the sun. Why is nobody using it? It turns out that at no point in this process, did you actually try and understand people. That's the secret, you need to empathize with them. You need empathy. If you don't have it, you're never going to actually build a platform correctly.

When you think about empathy, there's four different characteristics of empathy that are all important and crucial, you cannot miss any of them. Step one, you need to recognize that people are telling the truth when they talk about their perspective. It may not be your truth, that doesn't matter. That's how the reality is forming. You need to actually understand that. You need to suspend judgment. You need to stay curious, no preconceived notions. You need to recognize and acknowledge other people's emotions. They have them, they're valid, deal with it. Communicating your understanding of their emotions is the last step. Actually, that's where that connection happens. If you notice back here, I didn't talk about doing anything. That's intentional. Empathy itself is the action. You may think, there's no point in which I just go do stuff. That's fine. The platform will be built, don't worry. If you take this step-by-step recipe, and you go, feelings are hard, how do I just go talk to people? It's easy, but also not easy. If you look for these clusters, and you start going, where do I start to find people and ask them where things are going? The first step is going to be, ok, where are your pain points? Then you go and you try and find the pain points. Because people say, the pain point's over here. Then you go and try and find them. Then you compare the results. Then you keep going until you are capable of understanding and identifying in your system the points at which people say that they hurt. If you can't build that system visibility, the system is systemically broken and it's not going to be fixed by papering a service or a product on top of it.

For ICs, what you might want to do is ask individuals, maybe on your team, what their pain points are. You're going to be looking for the pain points, typically, in the duplication of effort, or in the code base, or in the process, or in the communication. One very fascinating way to do this is to actually run a copy and paste scanner on all of your code bases, and figure out where people are literally duplicating the code. That is where libraries come from. Then if you take those and you see, they keep modifying something, and I can't build a library, that's where the platforms come from. As an engineering manager, you might go, ok, I have teams that interface with me, what are their pain points? The duplications you're looking for are the workflows. The interface is the process and the planning. The interfaces, the workflows, those are the things that you can't necessarily solve with code. They'll go, this is just how things are, this is the way things are. The magic of platforms is that you can start to address this with that. You can start to address it with that platform engineering, but you need to know where it is, and you need to know where to look. For executives, it's tricky, but you have to ask the organization leaders what their pain points are. What you're looking for is the duplication of the language, the drifting of the vision, the alignment effort. If people keep building programs to do things, if you keep noticing that engineering leaders spend all their time writing one-pagers on how we do testing, and everyone has written it five times, there's a problem there. That's what you're looking for. That's what you're looking for as a leader to go, we need some type of thing that lets us actually coherently solve this together, and share that, and communicate that in a way that actually doesn't duplicate everything.
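To make the copy and paste scanner idea concrete, here is a minimal sketch in Python. It is a toy under stated assumptions, not a description of any particular tool: the function name `find_duplicate_blocks` and the `window` parameter are made up for illustration, and it only catches exact duplicates after stripping whitespace, so renamed variables or reordered lines will slip past it.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Block = Tuple[str, ...]
Location = Tuple[str, int]


def find_duplicate_blocks(sources: Dict[str, str], window: int = 4) -> Dict[Block, List[Location]]:
    """Find runs of `window` consecutive non-blank lines that occur in more than one place.

    `sources` maps a file name to its text (hypothetical input shape).
    The result maps each duplicated block to (file name, 1-based start line) pairs.
    """
    seen: Dict[Block, List[Location]] = defaultdict(list)
    for name, text in sources.items():
        # Keep the original line number next to the whitespace-stripped content,
        # and drop blank lines so spacing differences do not hide a duplicate.
        lines = [
            (number, line.strip())
            for number, line in enumerate(text.splitlines(), start=1)
            if line.strip()
        ]
        # Slide a fixed-size window over the file and record where each block starts.
        for start in range(len(lines) - window + 1):
            chunk = lines[start:start + window]
            block = tuple(content for _, content in chunk)
            seen[block].append((name, chunk[0][0]))
    # Only blocks recorded at two or more locations count as duplication.
    return {block: places for block, places in seen.items() if len(places) > 1}


if __name__ == "__main__":
    demo = {
        "billing.py": "def retry(fn):\n    for _ in range(3):\n        try:\n            return fn()\n",
        "shipping.py": "import os\n\ndef retry(fn):\n    for _ in range(3):\n        try:\n            return fn()\n",
    }
    for block, places in find_duplicate_blocks(demo).items():
        print(places, "->", block[0])
```

Blocks that show up unchanged across many files are library candidates in the sense described above; blocks that show up with small, repeated modifications are the ones an exact-match scanner like this misses, and those are the ones that hint at a platform.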

You have that, and it's great. It's awesome. Then we go, ok, so how do we not talk past each other? How do I, as an IC, understand my manager and communicate to them in a way that's valuable? How do managers do it with executives? How does everyone do it for everything else? It comes down to the motivation for a platform. You can't sell it, you can't build it, you can't do anything, people won't come and use it unless you meet these motivations. For ICs, like I said, it's thriving. If you tell an IC, "You should use this platform, it'll make your life suck. Isn't that awesome?" They won't be interested. If you tell them it will make your life better, they'll be interested. Maybe they won't use it, but they'll be interested. That's what you want. For a manager, if you tell them, this will actually make it harder for you to meet your deliverables, why would they use it? Have you met any company or any organization that successfully got higher quality code by just saying, "Managers, I know people have given you a roadmap that has 110% of your engineering time allocated to it, but if you just took 30% of your non-existent time and put it on quality"? No. For executives, alignment is the key thing that drives everything and ties it all together. If you can't build an organization in which people can independently go and do things, they're not going to be able to actually innovate. That innovation is one of the key things that makes software development work at scale. If you can't build the ability to be innovative, the organization is going to get stuck in a rut, and then it's going to get taken over by someone else who's able to do that.

The priorities of platform engineering tend to be the reduction of toil, the optimization of team interactions, and the long-term alignment of innovation. When you talk to ICs, the thing that they focus on first in a platform is the toil. It's the thing where they know how to do it, they hate doing it, they have to do it every week, or it's just drudgery, it's annoying, it takes 10 extra steps and it doesn't need to. That's going to always be their highest priority, even if it's not necessarily the highest priority of everyone else. What you need to do if you're going to build a platform is figure out how to get the priorities of everyone aligned. They don't need to have consensus, but you do need that alignment. For the team interactions, it's the way to get the toil reduced, and the interactions of teams aligned. What in the toil and what in the interactions overlap? That's a good question to ask. How does that affect the ability of an organization to stay aligned when they're innovating? Can you try and answer this in a way that aligns those priorities of everyone? If you do, you have the first priority of a platform.


A platform is an abstraction that platforms like a platform. You will know it when you see it. Also, one of the beautiful things about platforms is that they are capable of so much. They may not always take the shape that you think they will. They don't have to be a dashboard. They don't have to be like something. They don't have to be this magical CI/CD service that makes your coffee for you. Platforms can be anything. When you build a successful platform, it's going to platform like a platform. Platform teams are organizationally empowered to address cross-cutting changes. They must be. If they are not, if you have decided to build an engineering organization that is unwilling to have a team that is able to address cross-cutting concerns, as a leader, you have failed. You should disband the team and figure out something else, because otherwise you will just burn the team out and break the trust of everyone else in the organization. It will make every further attempt to actually fix the underlying problem harder in the future. You have to get this right. The combination of the technical, the procedural, and the cultural improvements, all of those have to be something that the team is actually empowered to do. They should be able to tell you off as an executive. They should be able to suggest things. They should be able to actually go through and say, here's what needs to happen. Platform engineering, because of this, can turn culture into an intuition around how to use the platform. That intuition thing is really powerful, because when you build like a language, when you build a platform that's self-consistent, you actually get the ability for people to start developing this gut instinct and this hunch. This gut instinct allows you to intuitively just go, that feels right. I'm just going to go over there and understand this. That's where that innovation happens. That's where people can actually natively use the platform, and then build on top of it. Otherwise, you're just building to the platform, and you're never reaching beyond the limits that are set by it.

This intuition is awesome. You want to make change easy to handle, not easy to do. This is something that I say a lot, and it's something that's really true. If you have a platform, it's not about making change easy to do. It's not about how fast can we get stuff out the door, it's, can we make the change easy to handle, and embrace the fact that change does happen? Can we build something that is so understandable, so malleable, so integratable that we actually get to a point in which, if something unexpected happens, we don't necessarily anticipate it or expect it, but we can handle it, we can adapt to it, we can understand it, we can learn from it?

When you build a platform, if you mess it up, here are some of the ways that you may mess it up. DevOps tends to fail because the sharp and the blunt ends of the system are rarely aligned. When you have these things that are rarely aligned, you need to be able to actually fix it. Addressing that requires a cultural shift. This is why people say that DevOps is a culture, it's a mindset. It requires the whole transformation, because when you have these opposite ends of everything you need to actually be able to fix it. Fixing it is going to require some very complicated social engineering. SRE tends to fail when it's implemented prematurely. It really only makes sense when that horizontal context is really large. If it's not large enough, what you're really doing is you're hiring people called SREs, and they're really just your infrastructure team, or they're really just your Terraform people, or they're really just your ops people. They're not actually SREs. What are you really doing when you hire for certain things?

Platform engineering tends to fail when it is implemented without empathy and diversity. The diversity is required, because part of that empathy is fundamentally being able to understand the other person and be with them and have been with them. Can you get down to there and say, "I know your problem. I understand your pain. I have been there." Can you do that if all you have is a team of people from an extremely homogenous background, who have no understanding of anything outside of their very small scope? You literally can't. They can try really hard, they can get there, they can understand things, they can read books, they can get close to faking it. But you actually need this diversity, you actually need the empathy, it has to happen. If you don't have that, you will not build a platform. Platform engineering also tends to fail when agglomeration-based approaches aren't used to address the optimizations. If you look at something and you don't look at that cluster, if you don't look at where the efficiencies could happen, where you can change things around to take advantage of this emergent clustering of information, of people, of processes, of teams, then you're going to end up just going, I want a platform. I'm going to think up where this platform is, and I'm just going to guess. You will probably be wrong. Platforms are a way to scale doing, and they are very good at it. They're not the only way but they are an excellent way to do it. When you think of what platform engineering really is, at the end of the day, it's empathy-driven development, it's empathy-driven collaboration. It is empathy-driven organizational design. Platform engineering is empathy-driven learning.

Questions and Answers

Bryant: During the new phases of a platform when you're just rolling things out, maybe it's a year in and going, how do you balance that tough sledding moment? How do you push through and show that the platform is actually going to deliver value, ultimately, when people may be pushing back and getting a bit nervous about all the change?

Weakly: One of the ways that you can really do that successfully is you want to find that motivation for people and that agenda behind the agenda. If you can meet that and make that happen at the different levels, then you're going to end up pushing the platform through, and you're going to end up actually starting to make that change. One thing that may end up happening is you may have to have a combination of halfway bribery, halfway doing some work for people, halfway getting the conversation started, and using that as a way to further the conversion process. Making things opt in, dealing with that psychological safety, dealing with what the underlying concerns people have are, that tends to unstick things. If we're then just stuck with more of that leadership mindset or more of that management mindset, a lot of it comes down to risk management. People will bet on basically anything if the risk and the reward are at the right proportion. If people aren't betting on it, if people aren't committing to it and letting it happen, the risk and the reward are not at the balance where they want them to be. If it feels like they should be, there's a miscommunication actually happening there. Figuring out what that is, so that you can leverage things and reduce the risk, and then increase the reward to a certain point where it's past the threshold, will get that commitment through.

Bryant: The triangle, where you show the cognitive load, and then those things, is it a tradeoff in the triangle? As you move around, do you have to balance those three things off?

Weakly: It's not necessarily a tradeoff triangle, it's three different aspects. The tradeoff is implicitly always time and effort and allocation of company resources. You're going to always trade off how much time you invest in things. It's not a tradeoff of, we can only have one or two or something like that. It's, you have a finite amount of time and a finite amount of resources as you're going through, so you need to pick one first. You need to balance through that. I actually took that diagram and modified it a little bit from the DevEx paper by Nicole Forsgren and Abi Noda. It's a great paper. It's a great set of research. I took it and combined it and made it more specific to platform engineering. It's not a tradeoff in terms of, you can only pick one or two. It's a tradeoff in terms of how much time and resources do you actually have.

Bryant: Any tips for small actions, techniques for how you would find out what people are doing and drive for the actions like sharing product roadmaps?

Weakly: That one is tricky, because it really comes down to the communication profile and the design of the actual organization itself. Organizations tend to be either very Slack heavy or very email heavy or very heavy on some other way of communicating. Whether or not they stick information in a certain place or don't, or even do the thing, all tends to be different. Sometimes you just have to do some investigative sleuthing, like going to all the different teams. Surveys are a great way to get the information and initially gather things. Another way can be, just when you learn things, put them in tables and organize it and just keep track of everything. Then if you build a useful enough resource, people might start using it. Fundamentally, unless you have the leadership and alignment on, let's actually consistently standardize on communication methods, you may have to meet people where they're at and figure things out there. You build a useful enough resource that the collection of information on what teams are doing becomes valuable enough that people start contributing to it, but then that can be its own platform.




Recorded at:

Mar 20, 2024