Available, Affordable, Attractive: Enabling Platform Adoption

Summary

Olga Sermon showcases the tools developed for their internal platform, the user communication channels and templates, feedback collection forms and user journey template.

Bio

Olga Sermon is a Senior Engineering Manager looking after the platform teams at SuperAwesome. Over the last three years her organisation has grown sixfold, and they are now focused on providing platform tools as a service.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Sermon: My name is Olga Sermon. I look after the platform teams at SuperAwesome, that's infrastructure and data platforms. I'll be talking about our journey from DevOps consultancy to building platform as a product. I'll explain what it means to us to be a platform product team, how we work with our users, and how we make sure that what we build is actually interesting to them and gets adopted. First, a few words about SuperAwesome. We are a kidtech company. Our mission is to build a safer internet for the next generation. In practice, this means that we're building tools for creators and developers to enable them to engage children and young people online, safely and securely, and with respect for their privacy. In 2020, we were acquired by Epic Games, and our products are now built into enormously popular games such as Fortnite. Some of them, such as consent management, are available for free to all developers from Epic Online Services.

The Days of the Heroes

In the beginning, our team was really tiny, first two, then three and four people. Our function was to provide DevOps services to other teams. Those were the days when our team was on call every night, and people were actually called most nights, in fact multiple times every night. Because the budget was so tight, we used to do really silly things like run databases on Kubernetes. We were very close with our customers. There weren't that many of them, just three other teams. We used to attend every standup every day. We used to attend most design discussions, and drove them a lot of the time. We were there for every big launch, setting up things like deployments and observability for all new services. In other words, we used to act as a DevOps consultancy. Some people genuinely enjoyed those times. They had an opportunity to act as heroes, but it wasn't a sustainable way to work, and we lost some very good people in those days. Most importantly, this approach just didn't scale. As the company grew, it was no longer possible to be so close to all our customers.

Cognitive Load

In 2019, a book came out called "Team Topologies." The book has revolutionized the way we think about organization design, and DevOps teams in general. It stated that we can significantly increase the speed of delivery if we optimize for two important parameters: cognitive load and team autonomy. Let's look a little bit closer into the concept of cognitive load. There are three types of it. First is intrinsic cognitive load. That's the skills necessary to develop the application. If we're developing a web app, intrinsic skills would be finding a library to manage connections or to establish secure connections. Then there is extraneous cognitive load. That's the mechanics of development. For example, for a web app, it would be the skills necessary to deploy it to production. Finally, there is germane cognitive load, that's domain knowledge. For our web app, it would be the purpose of the app. If we're building something in ad tech, that will be ad technology.

Types of Teams

Team Topologies introduces four different types of teams. It all starts with stream-aligned teams, also referred to as product teams. Their purpose is to deliver value to the customer. They make up the majority of the teams in modern companies. Then there are enabling teams. These are teams or individuals who have specific knowledge. Their purpose is knowledge sharing. For example, an agile coach in most companies will join different teams and help them set up their agile development process. There are complicated subsystem teams. These deal with exactly that, complicated subsystems, such as an authentication system, or sometimes a data warehouse. Their job is to deliver the system as a service to the product teams. Then, finally, there are the platform teams, us. Their job is to create tools to enable stream-aligned teams to be autonomous by reducing extraneous cognitive load. This means that, with things like deployment and observability, the product teams stop feeling like this, and start feeling like this. At least, that's the dream. That's what we're aiming for.

What Is Building Platform as a Product?

SA Platform was born. Our mission was to provide training and tooling to enable SuperAwesome engineers of all experience levels to provision and run infrastructure independently, quickly, and easily, in a consistent, scalable, and secure manner. In other words, we set off to build a platform as a product. What does it really mean to us? What is it to build an infrastructure platform as a product? There are a few key things about a product that we need to bear in mind. First of all, a product is self-service. This means that our users should be able to use the product independently, without us holding their hands, telling them what to do, or making any decisions for them. It doesn't mean that we can't interact with our users at all. It does mean that the nature of the interaction changes from knowledge sharing to learning. When we first discover a problem, and we are looking for the best approach to it, we will interact with the users quite closely, often shadowing them and doing whatever we need to learn about their needs. Then we will run a discovery and deliver a small solution to them, to see if we understood correctly what they needed, and if it would work for them, if they would be able to use it. Once we have established the right solution, we will move away and deliver the solution as a service to enable them to work independently.

A product is something that's flexible. It's something that evolves and takes advantage of new technologies and user feedback. We now have a roadmap. It isn't just running from fire to fire anymore. This roadmap is actually user driven: we present our ideas to the users and collect their feedback about them continuously. Our product is optional. It means our users can choose to use it, or they may choose to go to market vendors. This means that our product needs to bring clear value, and it should be fun and easy to use. To help us with that, we collect continuous feedback about our tools and services, about what it's really like to use them. Finally, a product is something that's measurable. It has a clear set of features with clear boundaries, and it has measured outcomes. Team Topologies actually goes as far as defining a perfect set of platform metrics. First of all, we have to measure the product metrics themselves. Do our tools do what they're meant to, and how well are they doing it? Then we collect user satisfaction scores, or NPS. We collect reliability metrics. These are very important. It's imperative that the platform is stable. The stability of the platform is what empowers our developers to experiment and create software safely. Finally, and most importantly, we collect adoption statistics so we can understand if we have been successful: if what we have delivered actually brought value, or if we need to go back and iterate on it.
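As a rough illustration of how a user satisfaction score and an adoption statistic could be tracked side by side, here is a minimal Python sketch. It is not the team's actual tooling. The NPS calculation uses the standard definition (percentage of 9-10 answers minus percentage of 0-6 answers); the `PlatformScorecard` type, the tool name, the team names, the survey answers, and the availability figure are all hypothetical.

```python
from dataclasses import dataclass


def nps(scores: list[int]) -> float:
    """Net Promoter Score from 0-10 answers: % promoters (9-10) minus % detractors (0-6)."""
    if not scores:
        raise ValueError("need at least one survey response")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)


def adoption_rate(adopting_teams: set[str], all_teams: set[str]) -> float:
    """Share of teams (0.0 to 1.0) that use a given platform tool."""
    if not all_teams:
        raise ValueError("need at least one team")
    return len(adopting_teams & all_teams) / len(all_teams)


@dataclass
class PlatformScorecard:
    """One row per tool: satisfaction, adoption, and reliability together."""
    tool: str
    nps: float
    adoption: float
    availability: float  # fraction of time the tool met its reliability target


# Hypothetical data, for illustration only.
survey = [10, 9, 8, 7, 6, 3, 10, 9]
card = PlatformScorecard(
    tool="deploy-pipeline",
    nps=nps(survey),
    adoption=adoption_rate({"ads", "games"}, {"ads", "games", "consent", "data"}),
    availability=0.999,
)
print(card)
```

Keeping the figures in one record per tool makes it easier to spot a tool that is reliable but unloved, or liked but barely adopted.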

How Do We Enable Adoption?

How do we go about it? How do we enable our platform adoption? Just like with any product, if we want something to be adopted, if we want something to be purchased by our users, we want it to be attractive, so we want to make sure it actually brings value to them. We want it to be affordable in terms of cognitive load. In other words, we want to reduce the price of adoption as much as possible. We want it to be accessible. In other words, we want to make sure that it covers enough use cases to be interesting to as large a user base as possible. I'm going to cover what we have tried so far and what we're hoping to try. I'll explain what worked for us and what didn't work as well.

How Can We Make the Platform Attractive?

Our first approach to platform adoption was really quite simple. We thought that we could just build it and tell the customers to use it. After all, we had been working with them very closely for a very long time, we knew exactly what their needs were, so why wouldn't that work? This is how we went about building something called the Matrix. We can see a screenshot of it on the slide. The Matrix project was approached in the spirit of the typical developer dream, where the developer goes away and builds something absolutely astonishing, and everybody is so impressed that they fall in love with the creation from the moment they lay their eyes on it. The Matrix was meant to be a service discovery tool. It actually was a really good idea. It would bring a lot of value to our new joiners learning about our systems. It would bring a lot of value to teams like InfoSec, because it recorded all the technologies that our services were using, and it would have been easy for them to identify if something was compromised by a security vulnerability. It was also very difficult to use, because all these details were meant to be filled in manually and updated on a regular cadence. We suggested three to six months. We spent a very long time building it, months actually. We did a lot of research about our services and about the needs of various compliance and legal departments. When we presented it to our users, they didn't quite fall in love with it. In fact, they hated it so much, they complained about it. None of them bothered to fill it in.

Our next approach was a bit more user driven. We figured the easiest way to build something that our users want is to just ask them what it is they desire. We issued a user survey. We asked our users to propose a solution and to explain what problem it would solve, and we invited them to comment on each other's solutions. That produced a lot of results. A lot of people went to the trouble of recording their suggestions for our team, but it didn't work quite as well as we hoped, because most of the suggested solutions were specific to the needs of particular individuals and their teams. While each one was definitely something that some users desired, only a very small percentage of the users desired each particular solution. We knew we had to try something else, and the next thing we tried was looking at it from a different perspective, the problem perspective. Instead of asking our users to tell us which problems we needed to solve, we used our knowledge of the system to propose to them which problem we wanted to solve. We would explain the problem statement, the goals of the solution, and the use cases. We would list some possible solution candidates. Then we would invite our users to comment on that. On one hand, we picked a problem that was relevant to a considerable portion of our user base. On the other hand, we were developing the solutions in collaboration with our users, immediately addressing the value risk before any engineering effort was invested in the solution. Out of all the things we tried, this approach worked the best, and we have been using it now for several years.

How Can We Make the Platform Easy to Adopt and Use?

How can we make our platform easy to adopt and use? Of course, we have to reduce the price of adoption. We have to reduce the cognitive load of learning how to use it. There are a number of tools that can help us. First of all, we have to consider the usability risk as early as possible. It's very difficult to make a feature more usable once it's already in production. You really have to consider usability right at the beginning of the design stage. You have to build it into your POC, the proof of concept. You want to automate as much as you can. Zero-effort adoption is the easiest adoption possible. Finally, there is one other method of adoption, that's adoption by default. We try to avoid that for things that users can see, because we want them to maintain their autonomy. For things that are happening behind the scenes, it's just perfect. Users don't need to worry about them at all. They just quietly, magically happen. An example is an autoscaler that helps them to find their instances quicker and cheaper, and in line with their reliability requirements. In order to address the usability risk, we use journey maps. We're just starting to use them, really. That's when we write down all the user actions, and we consider their interaction with the system from several different perspectives. For every step of the way, we think: who owns it? What is the user doing during that step? What are their needs and pains? What do they want to achieve in that step? What do they want to avoid? We think about touchpoints, that is, the parts of the system the user interacts with. We think about their feelings. Based on all that, we think about opportunities, things we can do better during that stage.
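The perspectives listed for each step of a journey map (owner, user actions, needs, pains, touchpoints, feelings, opportunities) map naturally onto a small record type. The following Python sketch is only one possible way to write such a map down; the `JourneyStep` type, the `unaddressed_steps` helper, and the example steps are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class JourneyStep:
    """One step of a user journey, seen from the perspectives named in the talk."""
    name: str
    owner: str
    user_actions: list[str]
    needs: list[str] = field(default_factory=list)
    pains: list[str] = field(default_factory=list)
    touchpoints: list[str] = field(default_factory=list)
    feeling: str = "neutral"
    opportunities: list[str] = field(default_factory=list)


def unaddressed_steps(journey: list[JourneyStep]) -> list[str]:
    """Names of steps that record pains but no opportunity to improve them."""
    return [s.name for s in journey if s.pains and not s.opportunities]


# Hypothetical journey for "deploy a new service", for illustration only.
journey = [
    JourneyStep(
        name="Create repository",
        owner="product team",
        user_actions=["copy the example application"],
        needs=["a working starting point"],
        touchpoints=["quick start button"],
        feeling="confident",
    ),
    JourneyStep(
        name="First production deploy",
        owner="product team",
        user_actions=["configure pipeline", "request access"],
        pains=["unclear which permissions are needed"],
        touchpoints=["CI pipeline", "documentation"],
        feeling="frustrated",
    ),
]
print(unaddressed_steps(journey))
```

Listing steps that have pains but no recorded opportunity is one simple way to turn the map into a backlog of usability work.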

Another important tool that really is key to adoption is documentation. If the users can learn how to use the platform, they will be more likely to adopt it. Of course, documentation needs to be self-service, meaning it should be easily discoverable and our users should be able to understand it on their own, without any help from our engineers. We have also delivered something called an example application, a quick start service. It's literally a button that they can press to get a service running in production in about 15 minutes. The service, of course, is very simple. It's just a Hello World web page app. It does have things like deployment pipelines built in. It does have quality scanning. It has observability. All the nice little bits that a service should really have. One other thing we thought about, but haven't tried properly yet, is a CLI. It's a nice, easy-to-use tool for engineers, something that they are already used to. Potentially, it could provide a single interface to all our tools and documentation.

I would like to say a few more words about documentation, because it is so important. We treat it as a first-class deliverable. This means we're segregating staging and production. Documentation is delivered to our internal space first, where people can review it and make suggestions. Once we are all happy with it, it can be promoted to a place where the users can find it. We audit it regularly, because it does go out of date ridiculously quickly. We try to make it as simple as possible. Most of our user-facing documentation doesn't assume any prior knowledge of the platform. We stick to consistent formats. Every section will have an overview. It will have some how-to guides. It will have references, for example common error codes, and troubleshooting. Finally, there is a little form at the bottom of every page where we ask users for feedback.
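A consistent format is easier to keep up during regular audits if it can be checked mechanically. The sketch below is one way that might look, under a strong assumption the talk does not state: that pages are stored as Markdown with `#`-style headings. The section names come from the talk; the `missing_sections` helper and the sample page are hypothetical.

```python
import re

# Sections every documentation page is expected to contain.
REQUIRED_SECTIONS = ("Overview", "How-to guides", "References")


def missing_sections(page: str) -> list[str]:
    """Return the required sections that have no matching Markdown heading."""
    headings = {
        m.group(1).strip().lower()
        for m in re.finditer(r"^#+\s+(.+)$", page, flags=re.MULTILINE)
    }
    return [s for s in REQUIRED_SECTIONS if s.lower() not in headings]


# Hypothetical page, for illustration only.
page = """# Deploying a service
## Overview
What the pipeline does.
## How-to guides
How to trigger a deploy.
"""
print(missing_sections(page))
```

A check like this could run when documentation is promoted from the internal space to the user-facing one.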

When everything else fails, the users still have to be able to find a way to use our products. We want them to be able to come to us for support when they need it. We define clear support interfaces with a clear remit of responsibilities. There is a page in our documentation space, which we maintain religiously, that defines which parts of the infrastructure we provide and where we rely on the product teams themselves. Finally, there are several support channels for different types of requests, and they have different support workflows: from very urgent, where the users can get help within 15 minutes, to medium, where most of the questions are resolved in under an hour, to longer-term requests, things like requests for new features and consultancy requests. These might have a timespan of several days, up to weeks, depending on the request.
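The three support tiers can be written down as data so that the expected response time for each kind of request is explicit. This is a minimal Python sketch, not the team's actual workflow. The 15-minute and one-hour targets come from the talk; the two-week figure for long-term requests is a placeholder for "several days, up to weeks", and the channel names and the `route` rules are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class SupportChannel:
    name: str
    target: timedelta  # time within which users should expect help
    examples: tuple[str, ...]


CHANNELS = {
    "urgent": SupportChannel("urgent", timedelta(minutes=15), ("production incident",)),
    "medium": SupportChannel("medium", timedelta(hours=1), ("how-to question",)),
    # Placeholder target: the talk only says "several days, up to weeks".
    "long_term": SupportChannel(
        "long_term", timedelta(weeks=2), ("feature request", "consultancy request")
    ),
}


def route(is_production_down: bool, is_new_work: bool) -> SupportChannel:
    """Pick a channel; the rules here are an assumed simplification."""
    if is_production_down:
        return CHANNELS["urgent"]
    if is_new_work:
        return CHANNELS["long_term"]
    return CHANNELS["medium"]


def is_overdue(channel: SupportChannel, waited: timedelta) -> bool:
    """True when a request has waited longer than the channel's target."""
    return waited > channel.target


print(route(is_production_down=True, is_new_work=False).name)
```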

How Can We Make the Platform Accessible?

The last question I want to talk about is what we do to make our platform accessible. What do we do to make sure that it covers enough use cases to be useful to most of our users? This is actually a very difficult balance to strike. For example, several years ago, when people first approached us about serverless, we said no to them, because it was too exotic and it was not really used in production. However, when people approached us again this year, we agreed to extend the platform to serverless architecture, because it had become more viable and more teams were interested. What do we do? How do we find out whether something is worth adding to the platform? There are a number of tools we use to work with our users. First is a user survey. It's an anonymous survey where we ask them about our tools, how easy or difficult it is to use them, and if there is anything missing, anything else they would like us to do with the platform. We run interactive sessions with our users, shadowing them while they are just doing their day-to-day work, trying to learn what it is they're dealing with and what problems they're facing, even if they don't notice these problems themselves because they're so used to them. For example, I learned loads recently when I shadowed somebody creating a new service. We also have a user steering group. These are quite interesting, and I would like to go into a bit more detail there.

The first thing when creating a steering group is to find a representative sample of our users. We try to make sure that there are both senior and junior engineers. We try to make sure that we have both people who have been with the company for a while and new joiners. We try to make sure that there are people from different business units, just because different teams do things slightly differently. If the user base is very diverse, it's probably worthwhile running separate user steering groups, just so you can address their needs more effectively. Every meeting with a user group needs to have clear objectives. We want to tackle a specific use case or a specific problem, for example, how do you deploy a new service? Of course, these are feedback sessions. It's actually non-trivial sometimes to listen to the feedback. When people talk about their day-to-day work, they tend to talk about it in absolute terms, providing unconstructive positive and negative feedback. Our job is to dig into that, to ask: can you please help me understand, what makes it difficult to deploy a new service for you? Or, what do you find most exciting about it? Then we follow up, and we make sure to find the success stories and to tell them, because our user groups are also our champions. There is an interesting pattern in running user groups because of it. Generally, we have a few sessions where we try to understand the problem we're trying to tackle. Then we go away and we find a solution. We engage with the users again to check that the solution is working for them, and then eventually deploy it to production. Then there will be a break until we find another tricky problem to solve, and we need to get our user group together again.

Conclusion

Building platforms is a very interesting and a very challenging enterprise. We've tried a lot of things over the years. Some of them worked well. Some of them didn't work as well. That's what we're going to do in future as well: keep trying new things, keep finding what works best.

 

Recorded at:

Oct 20, 2023