Scaling Infrastructure as Code at Challenger Bank N26

To launch their banking platform globally in the US, Brazil, and beyond, the challenger bank N26 introduced a new layer for the configuration of geographic regions in their architecture, where product development teams can add application-specific needs. At FlowCon France, Kat Liu presented why and how they introduced this layer, the benefits that it brings, and the things they learned.

Building a banking product that is available in both the EU and the US means complying with a lot of regulations, Liu explained. US data must be stored in a datacenter physically located in the US and must be kept completely separate from EU data. This entails provisioning an entirely new regional datacenter with full functionality, deploying shared services to both regions, and figuring out how to route requests to the correct datacenter when needed, she said.

A new region is similar to a new environment, so N26 focused on making application configuration as opaque to the application as possible while also offloading some application-specific configuration from the shared Site Reliability Engineering (SRE) team. The solution N26 came up with was to create an additional layer between what they had originally considered to be the "application" and the "infrastructure".

Liu mentioned that they ended up putting all of the configuration, like database hosts, AWS resources, AWS permissions, etc. into this layer, and giving the product development teams more freedom to add their own application needs. This layer was the one responsible for taking the different regions and environments, choosing the correct configuration and exposing it to the layer above, she said.
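The article does not show what N26's layer looks like, so the following is only a minimal sketch of the idea described above: a layer that holds per-region, per-environment settings and exposes the right ones to the application. Every name here (`ServiceConfig`, `CONFIG_LAYER`, `resolve_config`, the hostnames, bucket names, and topic names) is hypothetical and chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceConfig:
    """Settings one service needs; the fields are illustrative assumptions."""
    database_host: str
    s3_buckets: tuple[str, ...]
    stream_topics: tuple[str, ...]


# Hypothetical contents of the configuration layer, keyed by
# (region, environment). In this sketch a product team would edit
# these entries itself instead of filing a request with SRE.
CONFIG_LAYER: dict[tuple[str, str], ServiceConfig] = {
    ("eu", "live"): ServiceConfig(
        database_host="db.eu.live.example.internal",
        s3_buckets=("auth-audit-eu",),
        stream_topics=("auth-events",),
    ),
    ("us", "live"): ServiceConfig(
        database_host="db.us.live.example.internal",
        s3_buckets=("auth-audit-us",),
        stream_topics=("auth-events",),
    ),
}


def resolve_config(region: str, environment: str) -> ServiceConfig:
    """Pick the configuration for one region/environment pair.

    The application receives only the resolved values and never has to
    branch on the region itself.
    """
    try:
        return CONFIG_LAYER[(region, environment)]
    except KeyError:
        raise ValueError(
            f"no configuration for region={region!r}, environment={environment!r}"
        ) from None
```

In this sketch, launching in a new region means adding entries to the table; the service code that consumes a `ServiceConfig` stays untouched.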

While there was a bit of a learning curve when it came to configuring the application, the layer ultimately gave the development teams much more autonomy to make application configuration changes without involving SRE, who are focused on overall environment stability, Liu explained. These changes included adding permissions to a new S3 bucket or creating a new topic on a stream: essentially anything that required the service to interact with external systems, Liu said.

Infrastructure and configuration management shouldn’t always be considered as one layer that is only managed by SRE, as Liu explained:

The interface between devs and SRE is lower down the application hierarchy than we think; once we embrace this fact, teams have a high chance of becoming more productive. To me, this is the heart of what we typically refer to as "DevOps". It is the mindset that developers are not blocked and idle when they run into issues that their applications depend on; they will dive in, investigate the cause, and be empowered to fix problems, since it’s everyone’s job to keep things running smoothly.

InfoQ interviewed Kat Liu, senior software engineer at the N26 Authentication Team, after her talk at FlowCon France 2019 about the infrastructure of N26, the challenges of providing their services in the US, how implementing the new layer aligns with the growth of N26, and isolating configuration from code.

InfoQ: What does the infrastructure of N26 look like?

Kat Liu: N26 is a relatively young company, so we take advantage of a modern tech stack. We have 100+ microservices running in production in different languages and frameworks (though predominantly Kotlin or Java and Spring Boot); each service is Dockerized and managed by HashiCorp Nomad, and we typically practice continuous deployment, so every merge goes through rigorous automated testing and gets deployed to live.

InfoQ: What challenges did you face in providing the N26 products and services in the US?

Liu: The changes that we encountered had a huge impact on the application configuration, and because it sits somewhere between dev and DevOps, it was tricky to draw a line of separation between these two teams.

The development teams needed the SRE team to get all of their application dependencies into the new region (databases, queues, S3 buckets, etc.), but that’s only a portion of the work that’s required in getting a product up and running.

The SRE team were run as their own team with their own backlog, and didn’t have much visibility into all the application-specific needs of the backend teams. As such, they became a huge bottleneck to getting the backend services running in the new region.

InfoQ: N26 is growing very fast. What impact did this have on your approach to new regions?

Liu: The solution we came up with really fit N26 for the size it had become. Had we been at an earlier stage, we most likely would have been able to just leave all the application configuration with the SRE team, because we would have been fewer teams and fewer services. We only implemented this additional layer when we realized there was a bottleneck in the speed at which we could iterate on deploying these new services into the new region.

InfoQ: You mentioned in your talk that a new region should only affect configuration, not code. Can you elaborate why?

Liu: Code doesn’t (or at least shouldn’t) change between deploying to environments. What does change is configuration variables (database host URLs, API keys that also talk to different partner environments, etc.).

It’s always best to group and co-locate logic that changes together, and isolate those changes from other groups of logic that don’t change under the same circumstances. Otherwise you run the risk of inadvertently affecting something that shouldn’t have changed, and vice versa. And when it does happen, you may see an explosion in configuration matrices within the application, since the application needs to take in all the environment parameters and decide for itself which configuration to choose.

It also makes everything a bit harder to manage and organize. You wouldn’t put your socks and underwear in the same place you put your forks and knives, right? The same principle applies to software development :)
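As a rough illustration of the principle Liu describes, the sketch below keeps the code identical in every region and lets only the injected values differ. It assumes configuration arrives as environment variables; the variable names `DATABASE_HOST` and `PARTNER_API_KEY`, the helper functions, and the URL format are hypothetical and not taken from N26's codebase.

```python
from __future__ import annotations

import os
from typing import Mapping

# Hypothetical settings that differ between regions and environments.
REQUIRED_KEYS = ("DATABASE_HOST", "PARTNER_API_KEY")


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Read the injected configuration, failing fast if a value is missing."""
    missing = [key for key in REQUIRED_KEYS if key not in env]
    if missing:
        raise RuntimeError(f"missing configuration: {', '.join(missing)}")
    return {key: env[key] for key in REQUIRED_KEYS}


def database_url(settings: Mapping[str, str]) -> str:
    """Build a connection URL; note there is no 'if region == ...' branch."""
    # The "accounts" database name and the URL scheme are assumptions.
    return f"postgresql://{settings['DATABASE_HOST']}/accounts"
```

Because the application never inspects the region, the same build can be deployed anywhere, and the matrix of region and environment combinations lives outside the code.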

InfoQ: What impact does the extra layer for regions have on your deployment process?

Liu: Without this layer, the development team would have to wait days, or even weeks, for the SRE team to provision new infrastructure or permissions to a new database, stream topic, or any external service. Since giving teams the knowledge and capability to make their own configuration management changes, we’ve seen the feedback cycle shorten quite a bit. Teams are able to move faster since they own more of the process.
