
Scaling N26 Technology through Hypergrowth



Folger Fonseca shares his experience during the time of hypergrowth at N26, a mobile-first bank which was recently listed as the number one startup in Germany. He talks about the problems they (as engineers and tech leads) faced, the solutions that worked well and those which did not, and how their technology adapted to the demands of increasing scale and complexity.


Folger Fonseca is a Software Engineer and Tech Lead who has been working at N26 for the past 3 years. His main focus is mentoring team members, building back-end microservices in cross-functional teams, designing systems, and engineering processes that perform at scale.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.


Fonseca: My name is Folger. I am a software engineer and a tech lead. I've been working in the software industry for around 13 years. At N26, more specifically, I had the privilege of working with four different teams, both playing the role of an individual contributor as a software engineer, and more recently as a technical lead helping or enabling teams to build amazing products. The company looked really different when I started working there: we were around 100 employees between product and tech. Currently, we are around 500. There is a huge difference between the beginning and what we currently have. That's what I would like to share with you. What does it mean to be in a hypergrowth stage? Which challenges would you face if you found yourself on this journey? It is really exciting. I will cover the challenges we faced, the problems we had, and the solutions that we found most adequate to apply to those problems.

About N26

N26 is a German mobile-first bank. This means that we look at banking problems from a different perspective. Our mission is to change the way people do banking all around the world, and to give everyone the power to bank in their own way. We offer you a simple promise, which is a new approach with the best digital experience at a better price. We are currently available in 26 markets in the European Union and in the U.S. We already process more than 1.6 billion in transactions on a monthly basis. We are in four different locations: Berlin, Barcelona, Austria, and the U.S. We have around 5 million customers. A really great example of this promise is one of our products, Spaces. We know that people need different ways of handling money, so we make managing money simple through an intuitive interface. However, I'm not here to talk about the product but more about the technical aspects powering the product.

N26 Tech

N26 is hosted 100% in the cloud, more specifically in AWS. Our tech organization is distributed across multiple places around the world. We work in cross-functional teams. This means that each team has a combination of backend engineers, mobile engineers, quality engineers, and product managers. Each team is responsible for a subset of features or microservices. We deploy those 180 microservices around 500 times per week. That seems like a really big setup to have. How did we get there?

N26 - 3 Years Ago

Before we can actually jump into the current setup, I would like to give you a small background on how N26 looked 3 years ago, so that you have a bit of context and can see how the challenges we faced impacted what we had at that moment. Three years ago, we had 100,000 customers. It doesn't seem that impressive. We were in around six markets, which were our core European markets. We had just gone through a huge migration from running the bank on a partner bank to actually having a banking license, which was a really big step for us. We had just started using microservices, which meant that we had a lot of things to learn, starting with a microservice architecture. We were really light on processes. It was mostly face-to-face communication. There was no enforcement of how to do things; it was more like gentleman's agreements, or guidelines.

Our Challenges

At this point, engineers were taking care of three microservices on average. If you had any trouble, it was really easy. You knew everybody in the company because there were around 50 engineers. You knew who was working on what. It was super simple. If you had a problem with one of your downstream dependencies, you would just go there, sit with them, and sort it out in a really informal way. It was really easy to collaborate. Engineers had full ownership, from doing the discovery of new products or features, to designing those features, deploying them to production, and maintaining them while they were live. At this point, we were trying to enforce a really high bar in our hiring processes, because we wanted every single engineer to have this level of autonomy, and to be able to develop everything end-to-end without that much coordination. We had only two tech leads at this stage, the tech leads of the most important systems that we had, which were the ledger and the card processing systems. These two tech leads acted as leaders of their own teams, but at the same time, they were trying to tackle cross-cutting problems across the organization such as infrastructure, security, improving the deployment pipeline, and others. The teams themselves worked independently, without the need for somebody telling them how to design a system; they were just sorting it out. Sometimes, if they needed any guidance, they would go to these two tech leads and get some advice.

Experimentation and Knowledge Sharing

At the same time, the teams were extremely young. The average age was around 22, 23, which was a little bit shocking for me, because I was around my 30s when I joined. I saw a lot of young people, but they had such amazing knowledge of everything. The idea was that we had a mix of really high seniority and some engineers that were just learning, but they were really passionate about it. They really loved what they were doing. They just needed a little bit of guidance on how to do things right. For this, we were really early on working with GSDD: Get Stuff Done Days. These are two days every six weeks in which engineers are free to do whatever they want. By this I mean, you can learn a new technology that you are interested in, that maybe will fix some of the problems that you're facing in your day-to-day work. Or, you can use this time to tackle some technical debt, because at this stage, everything is a high priority, everything needs to be shipped. Because everybody wants new features, you never really have time to tackle technical debt. These two days were an opportunity for engineers to focus on those pain points. Or, maybe you are new to the organization, you didn't know about microservices, and you want to learn in more depth how microservices work. These two days are for engineers to really be able to develop their knowledge and their skills in whatever they are working on.

The second measure was lightning talks. Lightning talks are presentations that happen on a weekly basis. After you had these two days in which you were learning or discovering something, you were encouraged to share that knowledge with the rest of the organization. In these sessions, senior members were encouraged to share as much knowledge as possible, so that our junior members could get the best source of information. At the same time, if you solved a really interesting architectural challenge in one of your projects, you could use this space to share with the rest of the organization how you approached the problem.


That was the background. That was how N26 was working. Obviously, the main problem that we had at that point was that we didn't have enough people to drive all the amazing features that we wanted to build, or to get more customers, or to roll out to new markets. In 2018, we managed to get one of the most significant investment rounds in N26's history. This was the one that actually made us a unicorn. This meant that we now had a lot of resources available to drive hiring, to have more people, and to finally fix all the problems we had because of the limitations at hand.

How do you define hypergrowth? Hypergrowth is when your growth rate is above 40% of your normal growth on a monthly basis. We went from having these 100 people in product and tech to having around 350, and then 500, in a period of less than 6 months. That is a lot of people joining the organization, and a lot of people that need to be interviewed. The goal at this point was to hire 30 new backend engineers per month. Imagine how many interviews you need to run to hire 30 new engineers. It's a lot. Every single week, or every two weeks, you would have a new face in the organization, maybe people working on new features that you didn't even know about. This made it impossible for us to keep working the way we were working before. Face-to-face communication was no longer an option, and so on. Now we have more people, so everything should be fine. No.

Our Challenges

The main driver of this growth was that we wanted to acquire more users, and we wanted to be available in more markets, both in Europe and outside of Europe. How do you get more users and markets? By creating new features, and by enabling market-specific features. We do that by having more engineers and more technical hubs. At the same time, when you grow and more people are using your applications, you realize that security becomes an important concern. Regulators are going to look at you closely, because the impact of your bank being affected by a security vulnerability is suddenly bigger. Reliability and security became big concerns.


I would like to concentrate the main challenges into four categories: people, services, releases, and reliability. They are connected in a way. How do you manage to get more engineers to come into your organization? For this, the CTO at the time, Patrick, started driving the technical brand of N26. We wanted to be known not only for the products we have, or for the user experience of the bank, but at the same time as an organization that takes technical concerns as a main priority. Engineers were encouraged to do public speaking and to start writing technical blog posts about how N26 works. At the same time, we wanted to drive as much hiring as possible. At a certain point, I think I was investing 20% of my time just doing interviews or reviewing code challenges.

We managed to get all of those people, but then the first challenges that those new people brought to the organization started arriving. Before, we barely had a new engineer once every three months. That meant that when somebody new arrived, you would just sit down with that person, explaining how the system works, explaining how to set up their environment, and giving them a high-level overview of how the company works. Obviously, if four or five arrive all together, that no longer works. Normally, the onboarding was done by the person with the most knowledge of the systems. Certainly, that person does not have enough time to help four new engineers through the whole onboarding. We needed to come up with something that helped us in our onboarding process. At the same time, because this was happening on all the teams across the organization, we could identify that the onboardings we were doing were similar across different teams.

Company Onboarding

We extracted that out into company onboardings, which share the best practices that N26 tries to follow for different cross-cutting concerns. We have backend 101, something similar to Monzo. We have product onboardings on the way that product works inside the organization. We explain what our expectations are on quality-related topics, on agile practices, and on infrastructure, and how important security and compliance are for us. All of these were done at a company level. We all agreed on the content. Then these were run for all the new joiners.

Team Onboarding

That obviously is not enough. Team-specific onboardings still need to be done. In this case, because we didn't want to put the pressure on the most knowledgeable person or the tech lead of the team, we made sure that each time a new engineer joins a team, somebody else from the team is the one running the onboarding. In this way, we accomplish two objectives: one, we have a new engineer who is onboarded and has domain knowledge about the topic they are going to work on. At the same time, we validate that the people already working on the system have the same understanding of the system architecture. Most of the time, we realized that there were some gaps, that people understood the system in different ways. That was an opportunity for us as well to revisit those topics and make sure that everybody had the same understanding.

Buddy System

The last is the buddy system. We made sure that everybody who joins has someone assigned to them. This person will sit down with them, explain the documentation, help them set up their environment, and do pair programming sessions. At the same time, they will help them navigate all the country-specific bureaucracy. Most of our engineers that joined during this hypergrowth phase were from outside of Germany, and many from outside of Europe. They didn't know German, for example. They didn't know how to find a flat or how to move around the city. Having someone helping to mitigate all the anxiety of joining a new job was super helpful, so that new engineers could just focus on being productive as soon as possible. At the same time, it was super important for us to have a clear understanding of the expectations for these new engineers within the team. Having a clear definition of what a junior is and does, and likewise for a senior and a tech lead, was super useful for them to understand what was expected from them during the probation period.

Target Operating Model

Obviously, having more people is not enough. You need to organize those engineers in a way that they can actually deliver value. At the beginning we had a flat structure. We had the CPO and the CTO of the organization, and underneath, 10 teams. We multiplied that number. We no longer have 2 or 3 engineers, we have 9 or 12 trying to solve a specific domain problem. This means that we need to rearrange the organization. For that, we came up with the target operating model. The target operating model creates a hierarchical structure in which we have segments, each with a given purpose. Think of a segment as its own startup, driving its own specific objectives and key results. The segment is free to decide which product and which features they want to work on, as long as they target these specific organizational objectives. In that way, you can start working independently without that much coordination with the rest of the segments.

In this way as well, we were able to have these other locations. For example, Barcelona started working on financial topics like credit, savings, and overdraft. Those are really isolated topics that don't necessarily need that much communication with the headquarters, so they can build features for customers while reducing the amount of coordination needed between the two branches. Thanks to this approach, we managed to support three different currencies: euros, pounds, and U.S. dollars. The first currency outside the euro was a really big challenge because it required a whole restructuring of our architecture. We also managed to implement 3 different banking regulations and more than 10 different payment networks or schemes, and to work from 4 different locations.


The next challenge was internal consistency, enabling team mobility and company alignment. We are running a microservices architecture. By the book, this means that each one of those microservices could potentially be written in a different programming language, as long as they expose a common interface, let's say HTTP. At the beginning, we did that. Our monolithic application is written in Java, which means that most of the microservices at the beginning were written in Java. At the same time, we had some teams experimenting with other languages like Scala and Node. Sooner rather than later, we realized that this freedom was not as good as we expected.


I'm going to focus on the Scala example. If we have some Scala enthusiasts here, I hope you don't take offense and run. This was just our experience, with the people and circumstances that we had at the moment. It was a really small team, two or three engineers. They had one person who was really into Scala and functional programming. When they started working on the project, the two other members were more junior and didn't have experience in Scala. Still, the same feature pressure was there. They still needed to release. Because of the lack of experience, the code complexity increased really easily and fast. Because Scala provides so many ways of implementing the same thing, engineers were all the time in unending debates about how to do a specific feature instead of actually implementing the feature. This slowed down the release of the feature and the maintainability of the project. Just a few months later, the person who was the most enthusiastic about Scala decided to leave the company, which meant that we needed to replace that engineer with somebody else. Unfortunately, the remaining team members didn't want to learn a new language at that point. The project, unfortunately, was not possible to maintain.

Decisions and Alignment

We decided to put a brake on trying big new technologies, in this case, programming languages. We decided to implement the technical radar. The technical radar's purpose is to have a way to drive innovation, but in a controlled manner. It's based on the ThoughtWorks one. If you are familiar with the ThoughtWorks technology radar, you have seen which technologies they are exploring, which technologies they have in trial status, and which ones they have adopted as good practices. We use something similar. We have a technical radar listing the most important architectural technologies that we allow, for infrastructure and for backend services. This means that whenever we are trying something, we are doing it with a given purpose. We try a new technology or a new framework because we know that it solves a problem in a better way than something we currently have.

You test this new technology, maybe in your GSDD, in those two days that you have available. You create a proof of concept. If everybody agrees, we can roll it out in a non-critical production service. Only after you validate that the results match the expectations from the initial problem you were trying to solve do we decide to adopt it or to put a hold on it. For a long time, we only supported Java. More recently, we included Kotlin in our technical radar, which has been a huge success. Right now, more than 60% of our microservices are actually written in Kotlin. This is a good example that you can have both: coordination and alignment, and at the same time the ability to build up knowledge, and build all your libraries and good practices, around a small subset of technologies. This covers the people part.
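The workflow described above (explore during GSDD, trial in a non-critical service, then adopt or hold) can be sketched as a small state machine. This is a minimal illustration: the ring names and transition rules are my own assumptions, loosely modeled on the ThoughtWorks radar, not N26's actual tooling.

```python
from enum import Enum

class Ring(Enum):
    ASSESS = 1   # explored during GSDD days, proof of concept only
    TRIAL = 2    # allowed in a non-critical production service
    ADOPT = 3    # validated, approved for general use
    HOLD = 4     # not to be used for new work

# Hypothetical transition rules: a technology may only move one step
# forward (ASSESS -> TRIAL -> ADOPT), or be put on HOLD at any point.
ALLOWED = {
    Ring.ASSESS: {Ring.TRIAL, Ring.HOLD},
    Ring.TRIAL: {Ring.ADOPT, Ring.HOLD},
    Ring.ADOPT: {Ring.HOLD},
    Ring.HOLD: set(),
}

class TechRadar:
    def __init__(self):
        self.entries: dict[str, Ring] = {}

    def propose(self, tech: str) -> None:
        # New technologies always enter in the assessment ring.
        self.entries[tech] = Ring.ASSESS

    def move(self, tech: str, target: Ring) -> None:
        current = self.entries[tech]
        if target not in ALLOWED[current]:
            raise ValueError(f"{tech}: cannot move {current.name} -> {target.name}")
        self.entries[tech] = target

radar = TechRadar()
radar.propose("Kotlin")
radar.move("Kotlin", Ring.TRIAL)   # rolled out in a non-critical service
radar.move("Kotlin", Ring.ADOPT)   # results validated, approved for use
```

The point of encoding the rules is that a language cannot jump straight from a proof of concept to company-wide adoption without the trial step in between.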


Now you have more people working on features, so most probably there are going to be new microservices rolled out in the N26 infrastructure. We realized that it was not that easy to avoid micro-monoliths. There were some artificial limitations in our setup that were making these situations arise more frequently. What were those? We had a semi-automated process for creating new services, which means that a great part of the configuration was done by an automated process, but some other parts were still manual. This required human interaction with our infrastructure team. It meant that the setup cost of a new microservice was on the order of weeks instead of hours or days. Engineers were reluctant to invest an entire week of their time just to have a new service. When facing the decision of whether to add functionality to an existing service or create a new microservice, even if a clear separation of concerns said it should go to a different service, they would rather just put it in the existing service.

Infrastructure as Code

We needed to invest heavily in infrastructure as code, so that we could cover as much as possible of everything involved in the creation of a new service. Infrastructure as code leverages the cloud services' APIs, like AWS's. AWS has a graphical interface in which you can create resources by clicking around, but at the same time, it exposes an API. Infrastructure as code is nothing more than code representing all of those resources. You can have servers, networks, and security rules all expressed as source code. This means that everything is updatable, traceable, and reproducible. Each change that we make to our infrastructure is peer reviewed, and it goes through an automated validation process, which is done through our tooling.
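The core idea, resources declared as data, versioned, peer reviewed, and reconciled through the cloud API, can be illustrated with a toy "plan" step. The resource names and fields below are hypothetical; real tools such as Terraform or CloudFormation apply the same diffing principle at much larger scale.

```python
# Desired state lives in version control and is peer reviewed like any
# other code change. A tool then computes which API calls are needed to
# make the real infrastructure match it.
desired = {
    "web-server":   {"type": "instance", "size": "m5.large"},
    "app-firewall": {"type": "security_rule", "port": 443},
}
current = {
    "web-server": {"type": "instance", "size": "m5.small"},
}

def plan(current, desired):
    """Return the changes needed to make `current` match `desired`."""
    actions = []
    for name, spec in desired.items():
        if name not in current:
            actions.append(("create", name))
        elif current[name] != spec:
            actions.append(("update", name))
    for name in current:
        if name not in desired:
            actions.append(("delete", name))
    return actions

print(plan(current, desired))
# [('update', 'web-server'), ('create', 'app-firewall')]
```

Because the plan is derived mechanically from reviewed source, every change is traceable and the same state can be reproduced in any environment or region.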

This has really helped us multiply the capacity of our infrastructure engineers to run N26 microservices. One person in the infrastructure team can now deal with more than 150 instances in a really easy way. How has this paid off? Two years ago, we had Meltdown and Spectre. They were vulnerabilities that affected most of the microprocessors around the world. They affected billions of systems, and most probably there are systems that are still affected by these vulnerabilities today because they cannot really be patched. How did that go at N26? On January 3rd, Google announced that the vulnerability existed. The same day, Intel confirmed that it was actually a problem. Just one day later, they announced that they had a fix for the majority of the issues, and the day after that, Amazon announced that they had updated their kernels to include the fix. That same day, we updated our image and rolled it out across all of our microservices, in all of our environments and all of our regions, incrementally and tested. We managed to be free of Spectre and Meltdown just a couple of days after they were announced. This has really helped us keep the speed of a startup while having the security of a bank.


With more engineers building new services, and more services available, most probably you are also going to have more releases, more frequently. We run 500 releases per week. At the very beginning, we still had some manual processes, which didn't really scale well. We needed to invest in our continuous delivery pipelines. In the talk about analyzing 150 architectures, around 54% of organizations were not fully on a continuous delivery pipeline. This means that there is some manual intervention for a change to actually go to production. People are really afraid of putting things into production. In our case, our engineers just make changes, which are code reviewed. Once something gets merged, there are absolutely no manual checks to be done by a human. Everything is done automatically. This means that we have automated unit and integration tests, end-to-end tests, security checks, and performance testing, and only the successful deployments in each environment are the ones that move forward in the pipeline. This has been super useful because we are in more regions. Right now we have the European zone and the U.S., so we have at least two zones in which we need to coordinate releases. If this were a manual process, it would just be unmaintainable.
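The gating principle described here can be sketched in a few lines: a change only reaches the next environment if every automated check in the previous one passed, with no human override. The environment names match the ones mentioned later in the Q&A; the check structure is purely illustrative, not N26's actual pipeline.

```python
PIPELINE = ["development", "pre-production", "production"]

def run_checks(env, change):
    # Stand-in for the unit/integration tests, end-to-end tests,
    # security and performance checks run in each environment; here a
    # change simply carries its check results per environment.
    return all(change["checks"].get(env, []))

def deploy(change):
    reached = []
    for env in PIPELINE:
        if not run_checks(env, change):
            return reached  # pipeline stops; no manual override exists
        reached.append(env)
    return reached

good = {"checks": {"development": [True], "pre-production": [True], "production": [True]}}
bad  = {"checks": {"development": [True], "pre-production": [True, False]}}
print(deploy(good))  # ['development', 'pre-production', 'production']
print(deploy(bad))   # ['development'] (stopped at pre-production)
```

The important property is that promotion is a pure function of the automated checks: a change either passes everything and ships, or stops where it failed.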

For this, we follow a blue-green deployment approach. This means that every single service we have is always behind a load balancer. When we want to release a new version of the application, we take one of those instances out of the cluster, apply the new configuration, and run all of our automated tests on top of it. Only when all the test cases pass do we actually propagate the change across the entire cluster, following a zero-downtime principle. What is really interesting is that every single deployment is immutable. We take a snapshot of the release, which enables us to go backwards in case something goes wrong. At the same time, each deployment is really small, as small as possible. It's not just about the tooling, but about the culture in the teams, in defining how they are going to build their changes. The smaller the change, the smaller the impact that change is going to have in production. We run those builds around 800 times per day, but obviously only the successful ones actually go to production.
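The rollout just described can be simulated in miniature: one instance behind the load balancer gets the new version first, and only if its tests pass is the version propagated; otherwise the immutable snapshot lets us roll back. This is a simplified sketch, not N26's deployment tooling, and the instance and version names are made up.

```python
class Cluster:
    def __init__(self, instances, version):
        self.versions = {name: version for name in instances}

    def rollout(self, new_version, tests_pass):
        snapshot = dict(self.versions)          # immutable release snapshot
        canary = next(iter(self.versions))      # take one instance out
        self.versions[canary] = new_version     # apply new configuration
        if not tests_pass(canary, new_version): # run automated tests on it
            self.versions = snapshot            # go backwards, zero impact
            return False
        for name in self.versions:              # propagate across the cluster
            self.versions[name] = new_version
        return True

cluster = Cluster(["i-1", "i-2", "i-3"], "v1")
ok = cluster.rollout("v2", tests_pass=lambda inst, ver: True)
print(ok, cluster.versions)      # True {'i-1': 'v2', 'i-2': 'v2', 'i-3': 'v2'}
failed = cluster.rollout("v3", tests_pass=lambda inst, ver: False)
print(failed, cluster.versions)  # False; all instances stay on v2
```

Keeping the snapshot around is what makes the "go backwards" path cheap: a failed release never requires re-building anything, only restoring known-good state.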


Now that we have releases, reliability was one of the concerns. We had a lot of new team members that were maybe not on the same page in relation to reliability, because maybe they were not coming from the banking industry. Because we are in the banking industry, if we are not available, people don't have access to their money, or they cannot pay their rent, which is something that we take extremely seriously, and regulators take seriously as well. We needed to make some improvements in the reliability area. The goal was to keep this high release rate, but still keep high availability and minimize the risk as much as possible.


One of the first problems that we had was with incidents. Incidents are the degradation or disruption of any of the N26 services that might have user impact. The best way to avoid incidents is to actually avoid change; that's the only way to avoid them. Who hasn't had an incident? Everybody has incidents, even Twitter and GitHub. The best way to be prepared for incidents is to minimize the time it takes you to resolve one. Incidents are going to happen; you just need to be prepared to deal with them as fast as you can.


Incidents are split into four stages. The first one is detection. What does detection mean? When you ship a change to production, how long does it take you to realize that something was wrong with this release? Ideally, the answer comes directly from your tooling, from your alerting, because you have automated your business KPIs within your monitoring tooling, not from your customers. As soon as a customer is telling you that something is wrong with your services, the problem has escalated and you will be under a lot of pressure. The idea is that detection is done automatically by your tooling, in the shortest amount of time.
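A minimal sketch of detecting a bad release from a business KPI rather than waiting for customer reports: compare the metric after a deploy against its recent baseline and alert on a significant drop. The 20% threshold and the signups-per-minute metric are illustrative choices, not N26 values.

```python
def kpi_alert(baseline, recent, max_drop=0.20):
    """Alert if the recent KPI average dropped more than `max_drop`
    relative to the baseline average (e.g. signups per minute)."""
    base_avg = sum(baseline) / len(baseline)
    recent_avg = sum(recent) / len(recent)
    return recent_avg < base_avg * (1 - max_drop)

signups_before = [100, 98, 103, 99]              # per minute, before the release
print(kpi_alert(signups_before, [97, 101, 99]))  # False, normal variation
print(kpi_alert(signups_before, [60, 55, 58]))   # True, page the on-call
```

Wiring a check like this into the monitoring system is what moves detection from "a customer called support" to "an alert fired minutes after the deploy".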


The next one is diagnosis. Once you know that there is something wrong, how long does it take for you to understand what is wrong? Here, it is mostly about observability. Do you have enough dashboards to tell you exactly which component of your application is being impacted by this change? Do you have specific alerts that tell you what you should check, given a specific problem? The idea is that you don't need to spend a lot of time troubleshooting logs, or looking around at different parts of the system, but that at first glance, you can understand exactly what is going wrong. The idea is that you minimize the time between each of those stages.

The Fix

The next one is the fix. This is all about code quality. Does your code have really low coupling? Is it easy to make changes to your code? Do you feel comfortable making changes in your code under such pressure, because you have created enough test cases to tell you that by introducing a fix, you're not breaking something else? In this case, it is all about investing in a healthy codebase, and having enough automated tests to back you up whenever you need to do something really quickly.


The last one is delivery. Once you have a fix merged into production, how long does it take you to actually put it into every single environment that you have? Ideally, all of those times are as short as possible.

Blameless Postmortem Culture

When do you actually spend time on improving? The idea is that you build a blameless postmortem culture. We monitored the number of incidents that we had on a monthly basis and which systems were the ones being affected. We started noticing that incidents were recurring. This was because the new engineers didn't have enough time to focus on fixing them; they were always just building new things. By having this postmortem culture, we make sure that there is a space in which we reflect on what went wrong. That includes a postmortem: a written record. We want to make sure that everything that happened during the incident is written down. We want to capture the actual impact of the incident. How many users were impacted? How much money did we lose? How many customers were not able to sign up? Which actions did you take to mitigate this specific incident? In case it's a recurring incident, somebody else can go back, read it, and just apply the same mitigation actions. Obviously, you need to identify the root cause: what actually triggered this specific incident? The last one, and the most important one, is: what are the remediation actions that you're going to take to avoid this incident ever happening again?
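The written record described above can be captured as a simple structured document with the fields the talk lists: impact, mitigation, root cause, and remediation actions. The field names and the example incident below are illustrative, not an N26 template or a real incident.

```python
from dataclasses import dataclass

@dataclass
class Postmortem:
    title: str
    impact: str               # how many users, how much money, etc.
    mitigation: list[str]     # actions taken to stop the user impact
    root_cause: str           # what actually triggered the incident
    remediation: list[str]    # actions so it never happens again

    def is_complete(self) -> bool:
        # A blameless postmortem is only done once impact, root cause,
        # and remediation actions are all written down.
        return bool(self.impact and self.root_cause and self.remediation)

pm = Postmortem(
    title="Signup outage (hypothetical example)",
    impact="~400 customers could not sign up for 35 minutes",
    mitigation=["rolled back release", "replayed failed signups"],
    root_cause="missing database index after schema migration",
    remediation=["add index check to CI", "alert on signup KPI drop"],
)
print(pm.is_complete())  # True
```

Making the record structured is what enables the uses described next: searching past incidents for matching symptoms, and replaying them as training exercises for new on-call engineers.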

This documentation was super helpful for knowledge sharing as well. Whenever an engineer was involved in these postmortems, even if they were not part of the team that caused the incident, they would understand that there were things that they should be taking care of. Maybe the incident escalated really badly because there were no automated alerts, or there was no end-to-end testing for a specific critical path. Having all of these things in mind helps you not just with incident prevention, but at the same time with building features that are more reliable in the future. We use these postmortem reports as well for training new on-call members. Whenever there is a new member we want to onboard, we can go back to one of the postmortems and give this person the symptoms that were seen during that specific incident, then see whether they are able to discover the root cause and figure out how to fix it.

Lessons Learned

You have to adjust to your individual journey conditions. I think something that is a constant across QCon is that every team, every individual, every organization is different. The challenges that we faced might be different from the challenges that you are going to face. You need to take a step back, look at the problems that you have, identify the ones that are causing the most pain, and tackle those first. Then just move on to the next one, until you have everything sorted out. Try to make engineers drivers of change, and not just observers. Make sure nobody has the mentality of, "there is this thing that is really wrong and should be fixed; I'm just going to wait for somebody to fix it." Instead, make them actually push for those changes to happen. Continuous delivery has enabled us to release more features with more confidence, and to actually achieve our business goals. After you grow and have a big setup, make sure that you invest heavily in keeping your systems reliable while they are running in production.

Questions and Answers

Participant 1: The services that you deploy, hundreds, what do you do by way of integration testing them all together before they go live?

Fonseca: We have a pre-production environment. Our current pipeline is: development environment, pre-production environment, and production environment. Services are deployed to the pre-production environment, and then we have an end-to-end testing suite which runs once the service is deployed in that environment. That covers most of the critical paths. Obviously not everything, because otherwise it would take forever. And this does not run in the production environment, because not everything can be tested in production without side effects or without impacting real accounts. That is the way we currently do it.
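The flow described here, promotion through development, pre-production, and production, with the end-to-end suite gating the pre-production step, could be sketched roughly as follows. The environment names match the transcript; the function names and gating logic are illustrative assumptions, not N26's actual pipeline:

```python
ENVIRONMENTS = ["development", "pre-production", "production"]

def run_e2e_suite(service: str) -> bool:
    """Placeholder for the end-to-end suite. It covers only critical
    paths, since testing everything would take too long."""
    return True  # assume the critical-path checks pass in this sketch

def deploy(service: str) -> str:
    """Promote a service through each environment in order. The e2e
    suite runs after the pre-production deploy and must pass before
    the service reaches production."""
    for env in ENVIRONMENTS:
        # ... deploy `service` to `env` here ...
        if env == "pre-production" and not run_e2e_suite(service):
            return f"{service} blocked in {env}"
    return f"{service} deployed to production"

print(deploy("accounts-service"))
```

The point of the gate is that side effects stay confined to pre-production: by the time a build reaches real customer accounts, the critical paths have already been exercised end to end.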

Participant 1: How long does that suite take?

Fonseca: It's hard to say, because it evolves. You normally have some critical paths associated with your specific application, and it varies depending on the microservice and its dependencies. I don't think it takes more than 8 minutes.

Participant 2: You were talking about your CTO building the tech brand for you guys for a long time. I know hypergrowth is pretty hard and came with some issues. How are you dealing with the detractors on Glassdoor, going from 4.9 down to 3.2? What are you doing about that?

Fonseca: A really important thing to have in mind is that not everybody will be willing to work at a company that is in a hypergrowth phase, or in a scale-up phase. I think it's entirely fair for people not to like working in that way. Some engineers are more hands-on. They don't want to be involved in meetings. They don't want to be involved in alignment. They just want to keep their heads down and build things. That's ok. A startup environment will suit them better than a bigger company with more structure and more alignment required. When you are growing this big, a lot of things are changing really quickly. I think it's impossible to keep an eye on everything. You are going to make mistakes. That's something that you need to learn to deal with. The important part is to listen to the feedback from the people who are leaving, and then adjust as much as you can. I think every single company has these issues.

Participant 3: You mentioned that tech radar is a way to align technologies used in the company. How do you actually make sure that the teams follow that radar, in the end?

Fonseca: There are guidelines, and we rely on the good faith of the engineers working at the company. Each team has its own technical leads, so using a different language, for example, is not something that would go unnoticed. In that sense, so far, we haven't had to enforce it that much. It just works.

Participant 4: Did you invest by hiring a lot of new people before you had new customers, or was it the other way around, where a lot of new customers came in, which allowed you to hire new people? You had to take a risk there.

Fonseca: I think both happened more or less at the same time. As soon as we secured investment, we obviously started spending a lot more money on marketing and pushing for new products. I think both went together.

Participant 5: I think the tech radar is a really good idea, in terms of providing a route to saying, "this is an accepted technology." How granular did you go? Did you do it just for new languages and frameworks, or did you do it for every single third-party dependency that you're looking at?

Fonseca: We try to do it for what we consider architectural decisions, decisions that are really hard to change. Everybody is free to use libraries, as long as they are approved by security and don't have any obvious security vulnerability. I think we stop at the level of the framework; it's just language and framework. Then there are some cases like SQL databases, where we only use Postgres, because we have concerns related to data management, such as audits, backups, encryption, and permission segregation, and if you allowed everybody to use a different SQL database, you would need to implement those concerns every time. The same applies to a lot of the tooling that we use; for example, our secret management tooling is JVM based. If you used Ruby, you would need to reinvent all of that tooling that we have already invested a lot of time in for the entire organization. That's why we don't just let each team decide. But I believe that if you come up with a new initiative because you believe a new language would be more productive for a team, and it is validated and demonstrated, the organization will adopt it.

Recorded at:

Sep 25, 2020