From Legacy to Sovereignty: Driving the Future of Insurance through Platform Engineering

Summary

Sergiu Petean discusses the strategic journey of evolving DevOps into platform engineering within heavily regulated enterprise environments. He explains how to maximize efficiency using dynamic reference architectures, align platform KPIs directly with board-level business goals, reduce cognitive load via custom team topologies, and maintain innovation sovereignty through open-source technology.

Bio

Sergiu Petean is an influential leader with a proven track record in leading multiple digital transformations and scaling large organizations within complex regulatory environments. Over 15 years in Financial Services and Insurance, he mastered the challenge of driving change at the critical intersection of business, compliance, and technology.

About the conference

InfoQ Dev Summit Munich software development conference focuses on the critical software challenges senior dev teams face today. Gain valuable real-world technical insights from 20+ senior software developers, connect with speakers and peers, and enjoy social events.

Transcript

Sergiu Petean: I decided that I should be a little bit more focused on the audience and also have a certain analogy between where we are today, where we should focus, and where we go next from here. I found this one quite interesting. I generated this image to basically understand where we need to focus today. I feel like there's an opportunity cost happening everywhere, in the software engineering room but also in the board, where everybody seems to be more or less focusing on flying cars, which is an AI driven innovation, if we do the analogy, and actually we should most likely focus on something else where the foundation is being built because this is not a revolution, we saw that coming.

If you look at autonomous cars, you saw that there were years of actually building on something and then gradually going from one innovation to another, but there's hard work behind the successes we are seeing today. I hope you are not dreaming and focusing too much on the flying cars. That's a nice feel. It's sexy. Everyone wants to be there. Make that your hobby and build something strong as a profession, and then you're going to enjoy both worlds.

I hope the analogy is well understood because, as you see in the first two slides, I believe most of us, and please disagree with me, are still spending time in the legacy infrastructure where most of our systems actually run and where most of the money is being printed for our organizations. Some are probably already moving, ideally, towards cloud-native transformations, and some of us, the luckiest ones, are probably already working on platforms. By platform I mean AI platform, cloud-native platform. Everybody has some kind of platform behind the business and behind the work because that's where efficiency comes from.

The current economic cycle is all about efficiency. I believe today we are trading productivity for cost reductions, and the impact of platforms has to be much better understood from the financial point of view but also from a strategic point of view where the future has to be written in digital space through innovation.

The Evolutionary Journey of a DevOps Team Towards Platform Eng

I'm going to focus not on the flying cars, I'm going to focus on the third pillar, on building a platform. I'm going to be a little bit focused on the enterprise version of that, and to be even more precise, on the heavily regulated environment in Europe. I was leading the Cloud Native Computing Foundation (CNCF) technical advisory board working group on reference architecture. I know that's a lot. I was basically working with the end users from CNCF to discover successful ways of using cloud-native.

Initially, we started with the assumption that we had to build one single view, one single reference architecture, and we assumed it was going to be static, so we debated which vendor should not be there, and how many enemies we would make for every single vendor that made it into the reference architecture. There was a strong debate here, with some traumas from the past of course. Then we realized that you can't capture all of that in one single static image. What we needed to do was to basically create clusters, work with different parameters and look, for example, at the region your business operates in, or look at the industry that you're coming from, look at the talent that your company, your enterprise has access to. Then there are all the other forces which are shaping your platform, like compliance or regulation, so that's really important.

Then we decided to build and find and discover a reference architecture, a successful one for every single industry, every single cluster. We also decided to not make it static because if you have something static, that doesn't really help you much. You understand how someone got somewhere but you have no idea where he's coming from, what he had to face, what he calls legacy. Also, you don't know where he's going to go from here, so that's more like a realization and actually helps in shaping your next journey.

In my own world I took the same priority and the same focus and I defined for me a certain definition for platform engineering, and I don't see it as an IDP. Max did a good introduction to everything that platform engineering needs to be. I will not repeat the message. I fully subscribe to everything that Max said. Then I'm taking all those learnings and I'm focusing on my journey again on the financial sector in Europe where you have access to certain talent, where some of the most powerful forces are regulations and compliance and risk mitigation.

I had my own definition of platform engineering and I believe it has to do with experience more than anything else. It's not about tooling. Technology is important, of course, to realize that experience. Then, what kind of technology you use is going to be shaped, again, by the forces and the stakeholders that you have to deal with. I also decided that any platform I'm building has to be aware of all the stakeholders.

Now the preconception normally about platforms is that it's one-to-one with an IDP. An IDP is focused on developer experience. In my case, the strongest stakeholders were not the developers. We had way bigger forces than developers. I'm mentioning here of course security. I'm mentioning compliance. I'm mentioning the CEO office and Chief Transformation Office. All of them became my and our stakeholders. I put a great deal of effort into understanding their needs and where they get value from my work, and I reflected that in a framework where I can have a conversation on KPIs with them. More about this at the end.

Now I want to go next and really focus on the journey itself. We realized in the CNCF that you cannot have a static reference architecture. It doesn't really help you. We have to tell stories. We all know how important narratives are. We as a species like telling a story, and building a platform is nothing but a story. Realizing again where you are on this journey. How to tell your story. How to measure KPIs. How to motivate your people. How to understand the relation you have with the environment you're operating in, and I'm talking about topologies.

This is a learning that maybe you can take with you. For me what was important was to understand the challenges. I'm sure every single one of you, if you're here, has a lot of challenges, and they have to be well understood. It's not always the case. Some challenges are disguised. Then, if you address a challenge, that is most likely going to change you. You have to be a force of change in your organization, maybe even inside your team, maybe you start with your team. You prove that change works out, and that it produces value.

Then, if you get enough credibility then you can get enough momentum and enough support to change more than just your team. That's exactly what happened in my case. I started having an impact on my own team, and I had to change the way we were working. I had to change our internal identity, then identity in front of our stakeholders, and that had already an impact on my team and then on my organization. I decided to measure all that. Those measurements also evolve depending on the phase I was in.

At the end, I captured all that in a framework which I'm now using to measure whether transformations are successful or not. Hopefully, I'm going to push this conversation. My ambition would be to make it a standard for every cloud transformation, to really be able to carry the same conversation about what success means. Because when you look at business success you normally have two or three KPIs. You throw them on a slide and you say growth, return, premiums in insurance, or you say happy customers, and everybody says that's a successful story. With platform engineering you have no idea what success means. You cannot compare with anyone else, so how can you have the conversation about success, and success for whom, if you are not able to quantify that in a KPI for your stakeholders?

Anti-Type H: Fake SRE Technology

My journey started in February 2021. I joined Allianz Direct, just to realize that we were living in a wrong identity. We thought we were something we were not. We thought we were doing SRE; we were not doing SRE, we were just faking it. I'm curious if any of you feels like you're in this situation. There is hope, as we're going to see. The first step is the realization that this is an antipattern and you have to get out of it.

We did that by having some very strong conversations. It's so important to sit down and define your principles as a team and then build your internal identity. It's important to understand that you're a product team. That's the first principle you have to adopt. You're building a product and that product is driven by principles. It could be personal principles. It could be company principles ideally now. Then you have a mission, and then you manage to really write that down.

It's very important because that's going to change, that's going to be a challenge. You're going to add a new person to your team. Some others will leave. You're going to have to talk to different third parties, and this is what's going to keep you as a whole, as a team. It's going to build the culture more or less. When you do your hiring then the persons you're getting, they have to be compatible with your principles. Having them written, having them available even for HR, even mentioning them in the job description makes a huge difference.

DevOps Advocacy Team - Topology

That changed us again. We realized we were in the wrong place. We worked on our challenge and that made us change. We changed our identity. We changed the mindset: we said, starting today we are a product team, and therefore we become advocates for DevOps. We believe that this is the way DevOps should be implemented in our organization. These are the qualities that we want to have. We started defining the mission, as I mentioned, and this was in 2021. There was no conversation about platform engineering, but then we decided to build a future-proof platform and enable teams to deploy, monitor, and maintain their own production environments. Or to put it in a different way, we wanted to apply the old principle of DevOps, you build it, you run it, but not alone. You do that through us, with our help.

Next for us, now we knew that we wanted to build something. We wanted to have a platform. We knew what kind of qualities we wanted, thanks to the principles. We looked for a companion. It was very difficult to build without having a companion or catalog of services that you can build on top of the hyperscalers. We didn't want to become consumers either. This is a concept I'm talking about a lot. I call it innovation sovereignty. This has to do with a simple choice of consuming or creating. From the beginning we said, we want to create.

We have so many projects that we can use. We have so much wisdom already available in the CNCF, the Linux Foundation, in open source. We just need to be smart enough to get that expertise inside our team, retain that, and move with the stream of cloud-native, which we did. We defined basically our reference architecture, again static. This is the future version of our reference architecture, because initially we said, where are we today? Let's look at all the ecosystem, environments, dependencies, and paint the not so nice one, and then paint the next one, the one that we wanted to achieve six months from now. It was 99% based on open source basically. This was the first step. It was quite courageous for an enterprise.
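
Keeping two snapshots of the reference architecture (today and six months out) lends itself to a simple diff, and later in the talk the difference between the two states is described as the team's backlog. The sketch below is only an illustration under assumptions: the capability names, the `backlog_from` helper, and every tool entry other than Jenkins and Tekton are hypothetical and not taken from the talk.

```python
def backlog_from(current, target):
    """Derive backlog items from two reference-architecture snapshots.

    Both snapshots map a capability name to the tool that provides it.
    """
    items = []
    for capability, wanted in sorted(target.items()):
        have = current.get(capability)
        if have is None:
            items.append(f"introduce {wanted} for {capability}")
        elif have != wanted:
            items.append(f"migrate {capability} from {have} to {wanted}")
    # Anything present today but absent from the target gets retired.
    for capability in sorted(set(current) - set(target)):
        items.append(f"retire {current[capability]} ({capability})")
    return items


# Hypothetical snapshots; only Jenkins and Tekton are named in the talk.
current_state = {"ci_cd": "Jenkins", "script_runner": "ad-hoc scripts"}
target_state = {"ci_cd": "Tekton", "observability": "Prometheus"}

backlog = backlog_from(current_state, target_state)
for item in backlog:
    print(item)
```

Treating the snapshots as plain dictionaries keeps the comparison reviewable in version control, which fits the idea of a reference architecture that evolves rather than a single static picture.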

Speed Pleases Customers: Better Customer Reviews

Then we went measuring the impact. The first challenge we had was basically, while we were going live in a different country with the old way of working where we had Jenkins and 200 scripts written in, I don't know what languages. Basically, we changed all that and in two months we focused the whole team and we more or less changed the whole software release process. We wrote a new CI/CD based on Tekton back then. Now we have Tekton and GitHub, but we started with Tekton back then. GitHub Actions was not where it is today, so it was not so reliable.

Then, of course, because we changed everything on the go, we wanted to know if we had a positive impact. Is that really working? I was looking around the space. DORA metrics was a big thing back then. We had a report. It was the only way I found, and still today I strongly stand behind the decision to not just compare yourself with your old self, but also compare yourself with the industry as a whole. Now you understand: am I doing something positive? Yes. How positive is that in the big picture of the CI/CD world? We were extremely surprised to see that we were in the elite cluster. We were doing very well. We started releasing on demand several times a day.

Of course, the competing force is how reliable you are when you move so fast. That was the first experience with DORA. Even though initially we did it just to get confirmation, later we evolved the whole DORA metrics into a tool for identifying patterns of improvement for every engineer and every single team. Then we compared the behaviors of different teams. Then we wanted the best teams to be an example for the other ones. Then you could zoom in on the PRs, on the feature requests, and find out, for example, that one team was failing a lot at a very late stage, on pre-prod. They were releasing very slowly in production because of a lack of unit testing in the test environment. You're not using the CI/CD properly to get the feedback on the quality. These are the kinds of findings we started using DORA metrics for.

Then we started, of course, promoting ourselves. As a team, you need to create a brand. We went and we started bragging in front of the board and the other stakeholders that we identified. We saw they were caring, of course, about the findings. The first feedback was from the COO, concerned about the cost per unit. That's cool, but do we really have to be as good as Google, for example, when we release this? Doesn't that cost us too much? She was very interested in finding a cost per change unit, which is something I created over here. We realized that €49 per change is not a bad deal. How do you compare that? What are the others doing? Is that good or is that bad? Is that expensive? Does anyone else have that kind of unit per change or not? It was good to have. It was good to compare ourselves every single year to see improvements.
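
A minimal sketch of how per-team DORA-style figures could be derived from deployment records, so that teams can be compared as described above. The `Deployment` record, its field names, and the team names are assumptions made for illustration; the talk does not describe the actual data model or tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deployment:
    team: str
    committed_at: datetime  # when the change was committed
    deployed_at: datetime   # when it reached production
    failed: bool            # True if it caused a failure in production


def dora_summary(deployments, period_days):
    """Deployment frequency, median lead time and change failure rate."""
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = sum(1 for d in deployments if d.failed)
    return {
        "deploys_per_day": len(deployments) / period_days,
        "median_lead_time_hours": median(
            lt.total_seconds() for lt in lead_times
        ) / 3600,
        "change_failure_rate": failures / len(deployments),
    }


def by_team(deployments, period_days):
    """Group deployments per team and summarise each group."""
    teams = {}
    for d in deployments:
        teams.setdefault(d.team, []).append(d)
    return {team: dora_summary(ds, period_days) for team, ds in teams.items()}


# Hypothetical sample covering a two-day window.
start = datetime(2024, 1, 1, 9, 0)
sample = [
    Deployment("claims", start, start + timedelta(hours=2), False),
    Deployment("claims", start, start + timedelta(hours=4), False),
    Deployment("pricing", start, start + timedelta(hours=48), True),
    Deployment("pricing", start, start + timedelta(hours=24), False),
]
report = by_team(sample, period_days=2)
for team, metrics in report.items():
    print(team, metrics)
```

In this made-up sample both teams deploy equally often, but one has a much longer lead time and a higher failure rate, which is the kind of contrast that prompts a closer look at where in the pipeline its changes fail.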

We still didn't have any idea how we are now comparing with the rest of Allianz and the rest of industry. That's something for the next slides. This is more focused on the DORA. I spoke about this one. These are the DORA metrics, the first four. Then, of course, reliability is part of DORA metrics as well. Then, this is our contribution to DORA metrics, which is the financial metric for DORA to complement it and include the different stakeholders, the CFO and the COO.

Security Integration

Next for us was to integrate security. Again, highly regulated environment. You have a lot of pressure from security and compliance to make sure security is embedded as much as you can in your software release. We did that. Natively, we included the whole stack. We were sitting on a license of Prisma that we were not really using, just reactive. We said it's about time to configure it. We did that. We ended up with more than 50,000 vulnerabilities. I don't think you can shift left with these numbers. It blocked the whole organization.

We had to get a little bit creative. I hired a very smart team, a small one. Together with my teams, we created a module, vulnerability management, we call it, that basically sits on top of these very cool tools like Prisma and Wiz, and reduces the noise by injecting your security intelligence into the whole mix. Then we got to less than 100 vulnerabilities. Then we started shifting left. Of course, the next thing for us was: now we know what we need to know and what we can act on, but who needs to act? We had to go further. I'm going to get to that part.
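
The talk does not say which rules the vulnerability management module applies, so the following is only a sketch of the general idea of filtering scanner output with organizational context. The `Finding` fields, the three filter rules (severity threshold, fix available, actually deployed) and the placeholder identifiers are all assumptions; no Prisma or Wiz API is used.

```python
from dataclasses import dataclass

SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}


@dataclass(frozen=True)
class Finding:
    cve: str              # placeholder identifier, not a real CVE
    service: str
    severity: str         # one of the SEVERITY_ORDER keys
    fix_available: bool
    in_production: bool   # is the affected artifact actually deployed?


def actionable(findings, min_severity="high"):
    """Keep only deduplicated findings a team could act on right now."""
    seen = set()
    kept = []
    for f in findings:
        key = (f.cve, f.service)
        if key in seen:
            continue  # the same issue reported more than once
        seen.add(key)
        severe = SEVERITY_ORDER[f.severity] >= SEVERITY_ORDER[min_severity]
        if severe and f.fix_available and f.in_production:
            kept.append(f)
    return kept


# Hypothetical scanner output.
sample_findings = [
    Finding("CVE-EXAMPLE-1", "quote-api", "critical", True, True),
    Finding("CVE-EXAMPLE-1", "quote-api", "critical", True, True),  # duplicate
    Finding("CVE-EXAMPLE-2", "quote-api", "low", True, True),       # low severity
    Finding("CVE-EXAMPLE-3", "batch-job", "high", True, False),     # not deployed
    Finding("CVE-EXAMPLE-4", "claims-api", "high", False, True),    # no fix yet
]
kept = actionable(sample_findings)
print(len(sample_findings), "->", len(kept))
```

Whatever the real rules are, the principle is the same: findings that nobody can or needs to act on are filtered out before they reach a team.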

Takeaways

Some takeaways from this phase now so far. I didn't mention the billing. I mentioned a little bit the innovation sovereignty. Innovation sovereignty goes through having the right talent and keeping the right talent motivated. Of course, giving the space to grow. Hiring the first technical leads and creating a culture of work ethics and a culture of quality, it's essential. They will bring their friends, they will bring their colleagues, and they will be the ones basically driving the culture in your team and then your organization. Innovation sovereignty, I mentioned it. Then, team identity and product mindset as well.

Extending CI/CD to Reflect Stakeholders

As I mentioned, now we had security embedded. We reduced the noise and the friction. We had actionable insights. How do you do that now? We started a new challenge. It was like, how do we extend the CI/CD to reflect the other stakeholders? We brainstormed. We went on the security. We went on the incident management. We talked about delivery and users. Then we identified different KPIs for them and said, so these are the things that have to be naturally included in the platform that we are owning. We need to create a process, a process capable of identifying the owners, and then giving them actionable insights about their concerns.

In the case of security, what we did was build a catalog of services. We embedded it through automation in Opsgenie, which we use to centralize incidents and alerts. We reduced all the alerts, because we had all the alerts hitting only one small team. Then we said, this has to go to the whole organization. Now we wanted to remove anything that doesn't bring insights. If an alert doesn't have a runbook, it doesn't have an actionable insight. We forced everyone to have runbooks behind their alerts. Then we also had to enhance the security tool. Every single team was getting up to 10 vulnerabilities that had to be fixed in the next 28 days, basically. That's the process. This helped us really create an SRE process that is capable of understanding the full governance of our IT assets. That was essential for us.

Then, on top of that process, we started having automation conversations and feedback loops to the other stakeholders, basically. For example, with continuous compliance, we took the first steps toward accomplishing that as well.
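
The process described above (a service catalog that identifies owners, alerts that must carry a runbook, and at most 10 vulnerabilities per team with a 28-day deadline) can be sketched as follows. The catalog contents, the dictionary fields and both function names are hypothetical, and nothing here calls the Opsgenie API.

```python
from datetime import date, timedelta

# Hypothetical service catalog: service name -> owning team.
SERVICE_OWNERS = {"quote-api": "pricing-squad", "claims-api": "claims-squad"}


def route_alerts(alerts, owners):
    """Send alerts to the owning team; reject those without owner or runbook."""
    routed, rejected = {}, []
    for alert in alerts:
        owner = owners.get(alert["service"])
        if owner is None or not alert.get("runbook"):
            rejected.append(alert)  # no actionable insight, send back to be fixed
            continue
        routed.setdefault(owner, []).append(alert)
    return routed, rejected


def assign_vulnerabilities(findings, owners, today, cap=10, sla_days=28):
    """Give each team at most `cap` findings, due `sla_days` from today.

    `findings` is assumed to be sorted by priority already.
    """
    due = today + timedelta(days=sla_days)
    work = {}
    for finding in findings:
        team = owners.get(finding["service"])
        if team is None:
            continue
        queue = work.setdefault(team, [])
        if len(queue) < cap:
            queue.append({**finding, "due": due})
    return work


alerts = [
    {"service": "quote-api", "name": "latency", "runbook": "runbooks/quote-latency.md"},
    {"service": "claims-api", "name": "disk", "runbook": ""},
    {"service": "unknown-svc", "name": "cpu", "runbook": "runbooks/cpu.md"},
]
routed, rejected = route_alerts(alerts, SERVICE_OWNERS)

findings = [{"service": "quote-api", "id": f"finding-{i}"} for i in range(12)]
work = assign_vulnerabilities(findings, SERVICE_OWNERS, today=date(2024, 1, 1))
print(len(routed["pricing-squad"]), len(rejected), len(work["pricing-squad"]))
```

Capping the queue per team keeps the workload predictable, while the catalog lookup answers the question of who needs to act.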

Federated SRE Topology (SRE as a Service)

Again, we had to change, we became something else. We started addressing different problems. The scope was bigger for us. Obviously, now we had to change the topology. We created a group of SRE, I call it OrgSRE now, or OrgOps, because you're not targeting just the engineers. These are not just SREs. These are a group of platform engineers that were really focusing on SRE. They defined the process. They defined the tools. Then they had to have a certain exchange with all the stakeholders that were consuming the new services. They became more like a center of excellence, and they started educating basically the rest of the organization in how they can automate their needs into the process, and how we can make the whole feedback loop work for them.

We also started to federate, because while this was happening, Allianz Direct was scaling from 150 employees. We were back then 1,000 employees with more than 300 engineers. Today we are around 2,000 engineers, so the scale keeps going. You can't have your DevOps team scaling with the same rhythm as the rest of the organization; you don't have the finances for that. We started federating the knowledge. Initially, we decided to have a platform which is self-service, so that was really helping us. It was part of our key principles. We created this role, and I negotiated with business that 20% of the time of every single squad would be dedicated to operations. We said first incidents, we said security, and then we said SRE. Twenty percent of the time has to be spent here. Then we created a community of SREs, and so the teams started to become independent.

Next was, again, measuring the impact. We made some significant changes. It was quite an effort from our end. We spent probably four months changing the whole stack, creating new processes, just to get nowhere from the results point of view. The adoption was not coming. We had to spend the next eight months actually convincing everyone about the qualities that such a process brings.

There were a few things that helped us a lot. Building the community helped us. The negotiation with the business for 20% really made a huge difference. What really changed the whole game was when we created a new role in the whole organization; we call it TTL, so Technical Tribe Lead. This is a technical person sitting next to the business decision maker in a tribe. Imagine a tribe, claims. Claims in insurance is a tribe, a business tribe. These guys had the authority to go to engineers and say, this is important. Having SLOs, looking at the business needs like APIs that are part of the SLAs, this is important for business. Business would then allocate more time to be invested here. We made it mandatory for every single squad to work this way.

Reverse Conway's Maneuver

Next for us was to really survive. The company was scaling. We federated. That was cool. That helped us. We had a big problem, which was the cognitive load, because now our stack, as you see here, it's huge. We had only two persons, the technical leads that were part of the project from the beginning, capable of covering 90-plus percent of the stack. When you have a product, you need a holistic view over your product. How do you get the other engineers to have the same holistic view? It's either you have patience for one year for someone to go through all the stack and learn it, or you can do something else.

In my case, I was looking at the reverse Conway maneuver. I reflected the reference architecture, the technology, in the way the teams are working. I created a concept of DDO, Distributed DevOps, where you have a temporary working group, temporary meaning a few months, with a certain number of engineers having different roles. You can be a student, you can be an executor, doing most of the work, or you can be a consultant setting the direction. Then I made it mandatory, with targets, that every engineer has to move through these DDOs.

Then you have contribution from day one. If I hire a new engineer, I know that I'm going to hire him because he has an impact on a DDO. He could be an expert in API. He's going to contribute there. The value is generated immediately. I'm not overloading him with the need to cover everything. I give him time to actually grow in every single DDO that makes sense. Every year, you're going to cover three DDOs, maybe four DDOs if you're really bright. Over some iterations, in two years, we managed to create two or three more people technically capable of handling more than 85% of the stack, and therefore capable of making holistic decisions on the product.

We needed more decision making in that. This is more or less the way we structure the DDOs. We have a big focus on observability, SRE, CI/CD, of course. Then the platform grew, and we added IAM. Operations was a big thing. Then, of course, storage and Kubernetes plus AWS. Just to name a few.
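
One way to track the rotation is to record which DDOs each engineer has worked in and compute how much of the stack that covers. The sketch below assumes, purely for illustration, that the seven areas listed above make up the whole stack and weigh equally, so that six of seven areas clears the 85% mark; the engineer names and function names are hypothetical.

```python
# DDO areas as listed in the talk; equal weighting is an assumption.
DDO_AREAS = [
    "observability", "sre", "ci_cd", "iam",
    "operations", "storage", "kubernetes_aws",
]


def coverage(history, areas=DDO_AREAS):
    """Fraction of DDO areas an engineer has rotated through."""
    return len(set(history) & set(areas)) / len(areas)


def holistic_engineers(histories, threshold=0.85):
    """Engineers whose rotation history covers at least `threshold` of the stack."""
    return sorted(
        name for name, history in histories.items()
        if coverage(history) >= threshold
    )


# Hypothetical rotation histories.
histories = {
    "engineer_a": ["observability", "sre", "ci_cd", "iam", "operations", "storage"],
    "engineer_b": ["ci_cd", "iam", "storage"],
}
print(holistic_engineers(histories))
```

At three or four DDOs a year, an engineer following this model would cross the threshold in roughly two years, which matches the timeline given in the talk.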

Shielded DevOps Topology, Platform Engineering - OrgOps

This is the way we basically had to regroup. Again, we had the pressure of not scaling with the organization, and we realized we were not spending the time where we should be spending it. We had a lot of noise coming from incidents. We were spending so much time on incidents that we decided to create a new role called production manager. Then, of course, we had a lot of compliance work, invisible. I wanted to make that visible, so I created a compliance officer role as well. We also said that the team needs to spend time on the backlog. We are a product team, so we need to spend more than 80% of our time on our backlog. How to go from reference state A to reference state B, that was our backlog.

We isolated the conversation with the rest of the group. We call this role, ninja. One person from the team was available for conversations with external parties. All the others were focusing on the backlog. I created also a new role or a new function called tech desk. This is the first level, second level support on customer support. They were tasked with creating automation, creating documentation, self-service, and filtering the noise towards different teams. That was the main job that they had to focus on.

I measured, again, the impact. Because we were very successful in having the conversation with different stakeholders like business and the COO, the next time the organization decided to create OKRs, they came to us and said, on the technical side, I would like your KPIs to become the new OKRs. We did that. We took the advantage and we set the tone for business. This is very important. We became equals with business, which does not really happen so often in Germany, where IT is mostly a cost center. We were capable of writing demands to business. That was really important because then we made sure that the 20% is really 20% and is negotiated in the QBRs. They became basically mandatory for everyone. Availability was added, of course, and also some SLOs, especially on APIs, which, again, hit the business.

Takeaways

Some takeaways. I would focus on the tech stack. Always know where you are, very important. Also, you should know where you go. Reference architectures, at least two versions can help you with that. Six months, I found it as a good difference for the two. The delta between the two, most likely is going to be your backlog. This is where you, as a product team, have to focus. Then, cognitive load, it's huge. Complexity, it's only going to grow because you're going to take more and more. You're going to take IAM. You're going to take more compliance, more security.

A DDO is actually not a silo because you rotate. It's like temporary silos, and that's very important because it gives you the simplification you need to grow as an engineer in the field where you feel you're lacking expertise. Then I would say, federated DevOps and creating functions that take away all the unnecessary work. Really take the best of your engineers in the DevOps team, which is a very expensive and highly skilled team, and don't waste their energy and time on unnecessary things like writing tickets or handling tickets, except if those tickets are really written for you. Then make sure you address all the needs of your stakeholders in your KPIs and in your platform, more or less.

Multi-Platform DevOps Topology

Obviously because we did some cool stuff, we had the confirmation on the value we are bringing. We were capable of articulating the impact of our work. We were tasked also to ride the next wave of revolution, now the AI, when AI came. The only team that was internalized and fully capable of executing on any revolution was basically now the cloud-native team, the platform team. We were tasked basically to implement the first standards. We were already recognized as capable of pushing and implementing and executing on standards, and also try to imagine the AI as a future platform.

For me it was clear that there was no need to go again with a chaos approach where you execute hundreds of POCs. They all fail and you learn nothing. You don't capture any of the components that could become reusable for your future. I created a reference architecture for AI cloud-native. This was about one and a half years ago. We were the first ones executing on AI use cases. Now the team basically became something else. You're a multi-platform team. You have, again, the same financial pressure. You cannot grow. You're too expensive anyhow. We're going to get to this conversation as well. Basically, the same team had to take care of several platforms while we kept more or less the same focus and the same structures.

Success Metrics - For Primary Application or Services

Now I'm getting to the best part. This is the framework that I find extremely valuable in my case. This is a framework that looks at enterprise cloud-native platforms and translates the capabilities to the board. My pressure came especially lately, when I had to defend my work. I had to defend the size of the team and the investment that the organization was making. I had to continuously defend my work. I was looking really at every single persona on the board, and I identified capabilities in the platform we owned that they were not even aware of.

I'm going to give you a few examples. I translated those capabilities into capabilities that the board was taking for granted. One example, which probably you don't see too much, is the compliance enabler. One powerful force in an enterprise in Germany is the COO, the Chief Operating Officer. The COO has two major objectives. One is to mitigate risk, and this is the most important. The second is containing the cost factor, the price per unit. This is critical.

We had at one point a conversation where we said, now the team is too big, so maybe we should reduce it or we should spend less on that. Then, having these metrics and monitoring them over the years, and knowing that we had 14 different audits happening in less than one year, the conversation is basically a little bit simpler. You are able to go in front of your stakeholder and say, we can execute on the cost reduction, but there's going to be a consequence for compliance, for example. Now we have this many people working on compliance. Are you ok with compliance of less than 100%? I would imagine 95%. The answer was no, of course. Suddenly, the price per unit was acceptable, knowing that the investment goes into being fully compliant, something that is invaluable. This is just one example.

Another one is like the financial performance and also becoming profitable. Because trust me, if you do it right, you're going to become a center of expertise, of excellence. It's going to be so easy for you to sell knowledge, expertise, and even technology. It's quite easy, and that happened to me this year, to start selling technology. Suddenly, the CFO worried that he's spending maybe too much on my team. He realized we actually bring profits. We had €1.4 million profit this year and we're scaling from that. Again, you have one happy, powerful stakeholder that could become your biggest supporter if you manage to identify his needs, quantify those needs, and attach it to the capabilities of your platform.

All the others are quite interesting. We'll go through them. Innovation driver, this is the main driver of the CEO. The CEO and the Chief Transformation Officer want to be able to go to new markets with new features as fast as possible, test different scenarios, and then react. In the case of insurance, you want to test different pricing schemes, maybe different marketing options or strategies. Then you get feedback from different markets. Those capabilities are being offered by a very smart, flexible, and well-architected platform. They have to know that. Make sure you reflect that and attach it.

In my case, I attach it to changes, not changes just on production, what DORA does, DORA metrics. It's more on the whole environment because a lot of innovation is happening in a non-productive environment. Then I correlate this one with the financial performance, so the extremes on the bottom. You see a certain progression. We started with the €48 per change, and then we went all the way to €13 per change. We added 100,000 changes a year, which is major. That means that if you have around 150 microservices, then you change that service every day a few times.
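As a quick check of the arithmetic quoted above, here is a minimal sketch. The figures (100,000 changes a year, around 150 microservices, €48 and €13 per change) come from the talk; the function names and the choice of 250 working days per year are assumptions made only for illustration.

```python
# Figures quoted in the talk.
CHANGES_PER_YEAR = 100_000
MICROSERVICES = 150
COST_PER_CHANGE_START_EUR = 48
COST_PER_CHANGE_NOW_EUR = 13

# Assumption for illustration: the talk does not say which day count is meant.
CALENDAR_DAYS = 365
WORKING_DAYS = 250


def changes_per_service_per_day(changes_per_year: int, services: int, days: int) -> float:
    """Average number of changes each service receives per day."""
    return changes_per_year / services / days


def cost_reduction_pct(start: float, now: float) -> float:
    """Relative drop in cost per change, as a percentage of the starting cost."""
    return (start - now) / start * 100


if __name__ == "__main__":
    per_calendar_day = changes_per_service_per_day(CHANGES_PER_YEAR, MICROSERVICES, CALENDAR_DAYS)
    per_working_day = changes_per_service_per_day(CHANGES_PER_YEAR, MICROSERVICES, WORKING_DAYS)
    reduction = cost_reduction_pct(COST_PER_CHANGE_START_EUR, COST_PER_CHANGE_NOW_EUR)
    print(f"changes per service per calendar day: {per_calendar_day:.2f}")
    print(f"changes per service per working day:  {per_working_day:.2f}")
    print(f"cost per change reduction:            {reduction:.1f}%")
```

Under these assumptions each service changes roughly two to three times a day, which is consistent with "every day a few times".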

You look at the operational resilience, and when you look at the reliability, they are balancing each other. You move fast. You generate a lot of capabilities for the business to grow into new markets with innovative proposals, while you keep the reliability high, which is becoming more important because we, as an entity, became an InsurTech, and we started writing new contracts with technology companies that were demanding SLAs. As you can imagine, reliability is a fundamental KPI that is reflected in every single contract. That also became a risk factor for reputation, so it concerned the CEO and the COO, of course.

Then, of course, the CTO, the Chief Transformation Officer, responsible for all the projects, was very happy to know that reliability, the main SLA that he's writing into every single contract, is being monitored and kept. Basically, you have most of the board already satisfied with the capabilities coming from the platform. Even more, and this is the most important, they understand where those capabilities are coming from, and also the consequences of fine-tuning or reducing the cost of anything that has to do with the platform.

Key Learnings

Some more learnings. Again, you have to become, I'm calling it influencer, but I would say more educator in this case, or an equal force in driving the operations in your organization, innovation in your organization. You cannot be reactive. I know in Germany this is very hard, because again, the relation between business and technology is not an equal one. IT technology, it's a cost center, and they can be outsourced or not. If you can position yourself as an equal partner here and be a consultant, and be the one that has a strong word in some places where the impact on your work is tremendous, then you should grab it.

One such example I can speak about is the operational model. Last year, I was responsible for advising the biggest group program, and we had less than eight months to go live, from nothing to a cloud-native platform in production capable of sustaining a few markets like Australia and Brazil. I jumped into the conversation a little bit late, and one strong force in the enterprise is the architecture function. Even though we were building the platform, we were only optionally invited to a conversation where the architects were writing the operational model. If that operational model had gone live, our life would have been a horror movie.

I had time to jump into the conversation, correct that, and I fully re-scripted the operational model, really taking care of the infrastructure behind it, because it's not just the infrastructure. This is where the friction comes from. They say, infrastructure is such a small component, they should only care about operations. They know nothing about how software is being released, or about compliance and security. No, a platform is not infrastructure. A platform has so many other capabilities which have a tremendous effect on the operational model.

You as a platform engineer, platform leader, you have to be the strongest voice in the room when the operational model is being written. If you do that right, if you include the right scope for the right functions, like versioning, for example, on how to release software on APIs and how to decouple services from APIs, then that's already going to simplify your work so much and it's going to set you on a course for success. If not, if someone else is writing it, you're going to be captive in a projection that simply doesn't fit with your work. Very important to get the courage to face this kind of conversation and drive them actively. Don't be reactive to these kinds of conversations.

The last lesson is, simplify everywhere. We destroyed and rebuilt our architecture at least four times. We had the chance because we had to create a new tenant. Then we said, should we copy what we have or do we see it as an opportunity and just destroy everything, rebuild it differently? Then we just change the past. We see it as a legacy. We did that four times, and it worked perfectly for us. It gave us the chance to change significant things, like on the architecture level.

Because if you talk about FinOps, we keep talking about FinOps, we were involved in so many conversations. The biggest force in FinOps is your architecture. If your architecture is strong and designed for FinOps, you don't need to worry about FinOps. You're going to be so efficient that there's no work you have to do there, at least to simplify that.

Sovereignty

I want to talk a little bit more about sovereignty. Innovation sovereignty is very close to my heart, and I believe sovereignty as a whole, and resilience as a whole, need to be part of every single conversation and even embedded in the design of our platform. When you go and design your next platform, you should think about a certain sovereignty strategy. I feel the boards, and Europe, are ready for this conversation. A few years ago, we had a conversation about being multi-cloud. I believe the next conversation is going to be multi-cloud and private cloud.

More like have a sovereignty strategy which could be defined on how fast and how expensive it is to move from a hyperscaler to a private cloud or a data center. That needs to be embedded in the selection of the tooling you're making. Going for open source for me was perfect. I already have my exit strategy because I can take 90%, 95% of my tools and move it wherever Kubernetes is running. There is an effort, there is a cost, but it's so much cheaper than anything else. That's a strong argument. That could be another strong argument for you when you build and design your platform.
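One way to make that exit strategy measurable is to keep a tool inventory and compute how much of it is portable and how much effort a move would take. The following is only a sketch: the `Tool` type, the tool names, and the effort figures are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str                      # hypothetical tool name
    runs_on_any_kubernetes: bool   # True if it can move with the cluster
    migration_days: int            # estimated effort to move or replace it


def portable_share(tools: list[Tool]) -> float:
    """Fraction of the inventory that can move wherever Kubernetes runs."""
    if not tools:
        raise ValueError("inventory must not be empty")
    return sum(t.runs_on_any_kubernetes for t in tools) / len(tools)


def exit_effort_days(tools: list[Tool]) -> int:
    """Total estimated effort to leave the current provider."""
    return sum(t.migration_days for t in tools)


# Hypothetical inventory: nine portable open-source tools, one managed service.
inventory = [Tool(f"oss-tool-{i}", True, 2) for i in range(9)]
inventory.append(Tool("managed-cloud-service", False, 30))

if __name__ == "__main__":
    print(f"portable share: {portable_share(inventory):.0%}")
    print(f"exit effort:    {exit_effort_days(inventory)} days")
```

Tracking these two numbers over time gives a rough answer to "how fast and how expensive" a move would be, and shows how each tooling decision shifts it.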

We all know what happened to Greenland. We know that sovereignty on the state level is under threat. Maybe you don't know this, that Google Maps had a huge bug. It was a huge disruption everywhere, and this is a platform that Europe doesn't control and every single European is using it for his daily activities. This is a good example of why digital strategy or sovereignty or digital sovereignty is a must-have to consider.

The one I want to throw on top of it, my personal take is the innovation sovereignty. This has to do with the capacity to create. If you invest in technology and in humans and you grow them and you give them the space to grow, then you are capable of jumping on the next innovation revolution or evolution. If AI comes, you have the means and the tools to create your future. This is really important. Of course, the spectrum is growing. I joined as a board member the European Resilience Summit lately, and we are talking a lot about the flavors and the spectrum of sovereignty. We are talking about resilience.

There is a lot more we need to focus on. How many of you have thought about cultural sovereignty, for example, which is on the personal level, not on the state level? Our mental models are being created by the things we are consuming. Our young people are consuming TikTok, a platform from China, and consuming platforms from the United States, and consuming LLMs which are built outside Europe. There is more or less no democratic or European value reflected in the tools that the next generation is using to learn about life. Those mental models are being created as we speak and we have no control over that, which is frightening.

Please, join the conversation. I believe most of us are European or not. I do believe you don't have to be European to be concerned about democratic sovereignty. Because if you're from Canada or other states, you have to choose between what? American sovereignty and Chinese sovereignty. They are the only two having sovereignty in the real sense. Maybe Europe can play a role as open source or open sovereignty where all the other states can contribute. I do see a future for open source here because it fits so well in this fragmented world, and trust has to be in open. That's my take on sovereignty.

Questions and Answers

Participant 1: Really like the mix of teams you did. In particular with AI, I saw it a lot that if you don't mix AI into business, you're basically screwed from the beginning because you will predict something and people will tell you, it's not even a great idea to predict that. How do you solve that? The second question, normally if you define KPIs, developers are great in making their work look amazing in KPIs without changing a lot. How do you make sure that people capture more the spirit of the KPI versus the number they produce?

Sergiu Petean: I'm going to start with the KPIs, and then we go back to integrating AI into business. KPIs should not be a mechanism for enforcing work ethics. KPIs should be as I was mentioning with DORA. DORA became a tool for engineers to improve themselves. It was not like you had the sheriff in town looking at the board and saying, that's a wall of shame and that team is doing badly, we're going to slam them, and then they have to change. It was like looking in the mirror, understanding much better what they're doing, and then comparing with the other teams where that made sense for them, because they were handling different domains. Then they were learning from each other.

KPIs should be a way to improve. If you have a culture of continuous improvement, then this will go smoothly. If you're, of course, in a toxic environment and everyone is using KPIs as a means to attack others, then, yes, it's a survival skill. Who knows how many lines of code I'm writing? Who knows the impact I have now? It's interesting. The metrics here, this is the way, for example, we use KPIs in the platform team to really focus on what's needed.

If you look at the last ones in the table, last KPIs, you're looking at QA, coding. We all decided as a group we need to code, focus, growth, and especially impact. This is done through a survey and the whole team is looking at the people surrounding them, and say, that person really made my life as good as it can, and it grew a lot. The impact he had on my life was tremendous, so I'm going to evaluate him. The team was evaluating themselves, and they're all friends. It's not like they were fighting each other, no. That's a nice way to give feedback and really focus on the impact at the end. We never measure how many lines of code or infrastructure have you designed or APIs. Like, what's the impact? How does your team, the closest to you, perceive you, more or less? If you have this kind of conversation, you should be good.

I created a funnel when I was head of GenAI, and basically the funnel was about making decisions about which use cases should be addressed. We always said ownership was with the products or with the business person that came. He needed to pitch based on numbers and KPIs, saying, we do that investment, I want to have conversational AI investment and I believe I'm going to save that many hours from that many engineers, or customers, agents. They can spend more time in actually selling products and feelings to customers and actually giving them information that's available in the policies.

This funnel helped us keep conversations data-driven and prioritize things that really mattered. Then, choosing the right team to implement it, because most of the time we had fake AI. We had things that could simply be coded, so why should we add an LLM? There's no point. Or it was ML, old-school AI, so there was a much better fit for that. Making the right decisions, basically. It's a decision process that has to be driven by data, more or less. Accountability, of course.

 


Recorded at:

May 25, 2026
