Katie Gamanji on Condé Nast’s Kubernetes Platform, Self-Service, and the Federation and Cluster APIs

Jan 04, 2020

In this podcast, Daniel Bryant sat down with Katie Gamanji, Cloud Platform Engineer at Condé Nast International. Topics covered included: exploring the architecture of the Condé Nast Kubernetes-based platform; the importance of enabling self-service deployment for developers; and how the Kubernetes Federation API and Cluster API may enable more opportunities for platform automation.

Key Takeaways

Founded in the early 1900s, Condé Nast is a global media company that has recently migrated their application deployment platforms from individually-curated geographically-based platforms, to a standardised distributed platform based on Kubernetes and AWS.
The Condé Nast engineering team create and manage their own Kubernetes clusters, currently using CoreOS’s/Red Hat’s Tectonic tool.
Self-service deployment of applications is managed via Helm Charts. The platform team works closely with their “customer” developer teams in order to ensure their requirements are being met.
The Kubernetes Federation API makes it easy to orchestrate the deployment of applications to multiple clusters. This works well for cookie-cutter style deployments that only require small configuration differences, such as scaling the number of running applications based on geographic traffic patterns.
The Cluster API is a Kubernetes project to bring declarative APIs to cluster creation, configuration, and management. This enables more effective automation for cluster lifecycle management, and may provide more opportunities for multi-cloud Kubernetes use.
The Condé Nast platform Kubernetes Ingress is handled by Traefik, due to the good Helm support and cloud integration (for example, AWS Route 53 and IAM rule synchronization). The platform team is exploring the use of service mesh for 2020.
Abstractions, interfaces, and security will be interesting focal points for improvement in the Kubernetes ecosystem in 2020.

Show Notes

Can you briefly introduce yourself and what you're doing at work? -

01:15 I'm one of the cloud platform engineers for Condé Nast, part of the cloud platforms team.
01:25 We are focussed very heavily on operational excellence; the way we deliver the platform and the way that it is used by our customers.

Before moving to Kubernetes, how were the global data centres managed at Condé Nast? -

01:55 There's a lot of heritage here.
02:00 From about twenty years ago until about two years ago, our tech stacks were separate for every market.
02:10 They were usually outsourced to a third-party, who would take care of the infrastructure and the way it was delivered to the customers.
02:20 The way the content was delivered used a custom CMS for each market, which wasn't a sustainable way of moving forward.
02:35 We had a pre-Kubernetes implementation based on EC2, which we took to market; however, moving forward we decided to move to a self-service Kubernetes platform.

What were the main motivations on moving to a new platform like Kubernetes? -

03:00 For us, the motivation came from the fact we have to deploy content for countries like China and Russia.
03:15 These are challenging to provide infrastructure to; but we didn't want a snowflake model where we were deploying different infrastructure in different regions.
03:25 We really needed something deployable everywhere, which has a lift-and-shift capability.
03:35 We thought that if we ran it on Kubernetes, then we would just need VMs everywhere.
03:45 This is the way we've provisioned our infrastructure ever since.
03:50 We have self-service Kubernetes; we manage our own upgrades and lifecycle, which is tedious - but it is what worked best for our use case.

Are there any highlights on the journey to Kubernetes? -

04:20 Our journey to adopt Kubernetes was smooth, although we did have the learning curve of operating Kubernetes in house and developer training.
04:40 The actual challenges from a technical point of view came down the line.
04:50 We had lots of tools to provision; monitoring, logging and traffic management were quite good.
05:00 As the platform matured and the traffic grew, things like the lifecycle management of the cluster became challenging.
05:15 We had to automate most of the upgrade processes.
05:20 One of the most challenging things at the moment is to keep up to date with the Kubernetes version.

What does your Kubernetes platform look like? -

05:50 We use the Tectonic installer, which is a CoreOS tool - which will be merged into the OpenShift Container Platform.
06:00 This will make it difficult to use this installer in the future, unless we fork and maintain it ourselves.
06:15 We don't necessarily want to go this way; we don't have the resources to maintain such a project.
06:25 We are going to look at the way we deploy clusters in a more efficient manner, and maybe change the cloud provider.
06:35 We are currently using AWS but we may want to change this in the future, so we want to have the capability to do so if we want to.

How do you manage data stores on Kubernetes? -

07:15 We don't operate any databases in our cluster.
07:20 We have some, but not anything related to the applications.
07:25 Our applications are stateless, so we don't have the complexity of shifting data around the clusters.

Have you created any custom resource definitions (CRDs) in Kubernetes? -

08:00 CRDs are quite popular with open source projects, but we don't use them in our applications.
08:10 The way we deploy to the cluster is by having a global Helm chart that extends to different sub charts.
08:20 Our base chart will deploy a service account associated with that deployment.
08:25 The teams will be able to define a service or ingress, and this is all done via Helm charts.
08:35 The teams only need to provide a few lines of configuration; so we don't have a use case or need for custom resource definitions.
08:40 We use Helm, so we can deploy to the cluster very easily without any extra configuration on top.
09:05 For us, it's extremely important for our customers to be able to deploy easily.
09:15 It's not efficient to have a top-notch cluster if it's not easy for our developers to deploy to.
09:25 We try to abstract as much as possible, giving them the context of which cluster it is, but we abstract the deployment process to allow them to deploy easily.
09:40 That covers things like a specific load balancer, a specific port, public or private - how the application is exposed to the customers.
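
As a rough illustration of the "few lines of configuration" idea above, the sketch below layers a team's values over platform defaults, which is roughly how Helm combines user-supplied values with a chart's defaults. The keys (`serviceAccount`, `service`, `ingress`, `visibility`) and the `merge_values` helper are hypothetical and are not taken from the Condé Nast charts.

```python
from copy import deepcopy

# Hypothetical platform defaults, standing in for a base chart's values file.
BASE_VALUES = {
    "serviceAccount": {"create": True},
    "service": {"port": 80},
    "ingress": {"enabled": False, "visibility": "private"},
}


def merge_values(base, overrides):
    """Recursively merge overrides into a copy of base; override values win."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# The few lines a developer team might supply for its own application.
team_values = {
    "service": {"port": 8080},
    "ingress": {"enabled": True, "visibility": "public"},
}

final_values = merge_values(BASE_VALUES, team_values)
```

The team only states what differs from the defaults; everything else, such as the service account, comes from the base layer untouched.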

Are the developers your customers? -

10:00 I'm trying to think this way - when you have a customer vision, you emphasise what the customer wants.
10:15 They're referred to as developer teams internally, but I like to think of them as customers.

Can you give us an overview of the Kubernetes Federation API? -

10:40 Federation is about how to manage multiple applications and services in multiple clusters.
10:50 One of the most important points I want to make is that federation is about application resource management, not about cluster management.
11:05 If you have multiple clusters, and you deploy applications across clusters in a similar manner, then federation is a way to abstract the deployment process to multiple clusters.
11:20 You are going to have a federated deployment, which is going to propagate configuration to specific clusters.

How does a federated deployment look to developers? -

11:35 Usually the way I try to explain federated deployments is that you have a couple of extra points of information added to your deployment.
11:45 You have a deployment spec for your application - nothing is going to change there - but you're going to have a couple of extra data points; the cluster placement and the configuration overrides.
11:50 The cluster placement has a location where you want the application to run; for example, Tokyo, Frankfurt and North America - you have a very declarative way of saying where your application is running.
12:10 The second part, the configuration overrides, is for you or the developers to tailor their deployment to a specific cluster.
12:20 If you have more traffic in Tokyo, then you can specify more replicas for that cluster in the overrides.
12:45 Federation is great when your application is very similar in every region.
13:00 If your application is heavily tailored per region, then potentially federation isn't going to be the answer.
13:05 The configuration overrides would be very tedious to manage in that instance.
13:10 Federation is a powerful mechanism when used for the right use cases.
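
The three parts described above (deployment template, cluster placement, configuration overrides) can be sketched as plain Python data. The field names are modelled on the KubeFed `FederatedDeployment` resource as best recalled and should be treated as assumptions; the cluster names, image and the `render_per_cluster` helper are hypothetical and only imitate what a federation controller would do.

```python
from copy import deepcopy

federated_deployment = {
    "apiVersion": "types.kubefed.io/v1beta1",  # assumed KubeFed API group/version
    "kind": "FederatedDeployment",
    "metadata": {"name": "web", "namespace": "demo"},
    "spec": {
        # The ordinary Deployment spec: nothing changes here.
        "template": {
            "spec": {
                "replicas": 2,
                "template": {
                    "spec": {"containers": [{"name": "web", "image": "example/web:1.0"}]}
                },
            }
        },
        # Cluster placement: where the application should run.
        "placement": {"clusters": [{"name": "tokyo"}, {"name": "frankfurt"}]},
        # Configuration overrides: per-cluster tailoring.
        "overrides": [
            {
                "clusterName": "tokyo",
                "clusterOverrides": [{"path": "/spec/replicas", "value": 5}],
            }
        ],
    },
}


def set_path(obj, path, value):
    """Set a value at a slash-separated path of dictionary keys."""
    keys = path.strip("/").split("/")
    for key in keys[:-1]:
        obj = obj[key]
    obj[keys[-1]] = value


def render_per_cluster(fed):
    """Return the Deployment each placed cluster would receive (illustrative only)."""
    spec = fed["spec"]
    overrides = {o["clusterName"]: o["clusterOverrides"] for o in spec.get("overrides", [])}
    rendered = {}
    for cluster in spec["placement"]["clusters"]:
        deployment = deepcopy(spec["template"])
        for override in overrides.get(cluster["name"], []):
            set_path(deployment, override["path"], override["value"])
        rendered[cluster["name"]] = deployment
    return rendered


rendered = render_per_cluster(federated_deployment)
```

Here Tokyo ends up with five replicas while Frankfurt keeps the template's two. The more the regions diverge, the longer the overrides list grows, which is the tedium described above.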

Where's the best place to learn about the Federation API? -

13:45 The main SIG for federation is SIG Multi-Cluster, so if you search for that you will be able to see all the projects available in that SIG.
13:55 You'll find federation, cluster registry - a very important part of federation.
14:10 From the multi-cluster SIG, you'll be able to move on to the other projects.
14:15 As well, when I started to work with federation, it was difficult to get started - so I wrote an article about it which gives a comprehensive overview of where and why federation is useful.

Could you introduce the Cluster API? -

15:15 I think it's a confusing name - most people think that it's a Kubernetes API.
15:25 Cluster API is a set of APIs that provides a common interface for how to provision, manage and maintain your clusters in different cloud providers.
15:40 The way it aims to work is to have the same configuration for both AWS and GCP providers, just with a simple configuration change.
15:50 It makes it very easy to provision clusters in a standard way using a common interface.

What opportunities does it open up? -

16:20 I found when using it I could provision a fully-fledged cluster in under ten minutes on AWS, with VPC provisioning, IAM roles attached, all the subnets - in less than ten minutes!
16:55 This was in its first release, but it had a second release in September 2019 which was even better.
17:15 If you want to provision a cluster, you have a cluster resource and infrastructure provider configuration.
17:25 You can declaratively specify that you want a cluster in AWS in Asia, and you only need to change that line to configure the cluster.
17:40 If you are in an ecosystem where you need to create clusters efficiently, in different cloud providers, but without differing configuration for the cloud providers themselves, this is the tool for you.
18:00 Once you have it working, it's magnificent.
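
To make the "cluster resource plus infrastructure provider configuration" idea concrete, here is a minimal sketch that builds the two objects as Python dictionaries. The API groups, versions and kinds are modelled on the Cluster API release from September 2019 and the AWS provider as best recalled, so treat them as assumptions; the `cluster_manifests` helper is hypothetical, and a real setup needs further objects (machines, bootstrap configuration) that are omitted here.

```python
def cluster_manifests(name, region):
    """Build a Cluster and its AWS infrastructure object (illustrative subset)."""
    infra_api = "infrastructure.cluster.x-k8s.io/v1alpha2"  # assumed provider API
    cluster = {
        "apiVersion": "cluster.x-k8s.io/v1alpha2",  # assumed core API version
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            # The generic cluster points at a provider-specific object.
            "infrastructureRef": {
                "apiVersion": infra_api,
                "kind": "AWSCluster",
                "name": name,
            }
        },
    }
    aws_cluster = {
        "apiVersion": infra_api,
        "kind": "AWSCluster",
        "metadata": {"name": name},
        # The region is the single line that changes between locations.
        "spec": {"region": region},
    }
    return [cluster, aws_cluster]


tokyo = cluster_manifests("demo", "ap-northeast-1")
frankfurt = cluster_manifests("demo", "eu-central-1")
```

The generic `Cluster` object is identical in both cases; only the region in the provider object differs, which is the declarative one-line change described above.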

Do you think we'll see more clusters being created instead of namespaces now? -

18:20 I think so - I think it will help to create clusters everywhere.
18:25 If you're in a team, you can create a cluster in a single cloud provider.
18:35 If you're in a cloud infrastructure team, and you want to use a bootstrap service (like we did with Tectonic) then just transitioning between providers can be difficult.
19:00 Cluster API allows you to create clusters; if you want additional clusters that stand alongside the dev and prod clusters, it makes that very easy.
19:20 At Condé Nast, we don't currently provision a cluster for each developer team, but it might be nice if we could.

Where do people go to learn more about the cluster API? -

19:40 There's a SIG called cluster lifecycle, where there's a cluster API book, quick-start guide, infrastructure providers and bootstrap providers.
20:05 I've written an article about getting started with Cluster API which explains the use case and how to get started.

I heard that you were recently made a Traefik Ambassador. Could you tell us about this? -

20:55 That was an interesting experience - they released a Traefik Ambassador community at KubeCon in San Diego.
21:05 I think the Traefik Ambassador is recognising those that participate in the community and are interested in Traefik.
21:15 In Condé Nast, we are using Traefik as our Ingress controller, which manages the external access to the services within our cluster.

What's the motivation for using Traefik over Envoy or NGINX? -

21:40 The company adopted it two years ago before I joined.
21:45 Traefik at the time offered things that were useful to us, like a Helm chart.
22:05 The Helm support, plugging into different cloud providers, was useful - we were on AWS and needed to provide load balancers, security groups, endpoints and integration with Route53.
22:30 All the prerequisites we had made it an easy choice.
22:40 Our teams can configure the load balancers - we have a public and a private load balancer per namespace.
22:50 The private load balancer can be configured by the team - but you might have a number of different engineers who can do this.
23:05 They feed the changes through to a Helm chart via a YAML file.
23:15 What happens in Traefik is that it takes the IP addresses, puts them in a security group and attaches it to a load balancer.
23:30 The configuration file is elegant.
23:40 We wanted our teams to be self-service.
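
The self-service flow above (a team lists IP addresses in a YAML file, which end up in a security group on the private load balancer) could be guarded by a small validation step like the sketch below. The source does not describe the actual schema, so the `loadBalancer`, `private` and `allowedSourceRanges` keys and both helper functions are hypothetical.

```python
import ipaddress


def normalise_allowlist(entries):
    """Validate IPs/CIDRs and return sorted, de-duplicated CIDR strings.

    Raises ValueError if any entry is not a valid address or network.
    """
    networks = {ipaddress.ip_network(entry, strict=False) for entry in entries}
    # Sort IPv4 before IPv6; networks of different versions cannot be compared directly.
    ordered = sorted(networks, key=lambda net: (net.version, net))
    return [str(net) for net in ordered]


def private_lb_values(entries):
    """Build a hypothetical values fragment for a namespace's private load balancer."""
    return {"loadBalancer": {"private": {"allowedSourceRanges": normalise_allowlist(entries)}}}


# Entries as engineers on a team might write them: a bare IP, a host inside a range, a duplicate.
values = private_lb_values(["203.0.113.7", "10.0.0.5/24", "203.0.113.7/32"])
```

Normalising before rendering means two engineers adding the same address in different notations do not produce duplicate rules, and a typo fails early instead of reaching the cluster.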

Are you using a service mesh? -

23:55 We are on a journey; we are in the process of merging with our US division.
24:05 Now we've merged, we have two platforms and we have points that need merging together.
24:20 Moving forward, we're having a convergence of two platforms, but we want to adopt a service mesh in the future.

If you are new to Kubernetes, what do you think is the best way to learn about it? -

25:05 It depends on what kind of learner you are - if you're a hands-on practical person, you can go through the exercises and tutorials.
25:15 The tutorials take you step by step to deploy your clusters, and you can learn Kubernetes that way.
25:30 There are also courses and video tutorials that take you through step by step, and most of them are free.

The Kubernetes community has been growing year-on-year, hasn't it? -

25:55 One of my intentions is to grow the awareness of the projects I'm talking about.
26:05 Look at the interest in the tool itself, and the number of attendees at KubeCon.
26:15 I was talking to a CNCF person at San Diego - last year in Seattle there were 8,000 attendees.
26:20 Now there were 12,000 attendees in San Diego, and they are expecting 16,000 attendees in Boston.
26:30 I'm really excited to see where this journey goes.

What are the biggest challenges in Kubernetes? -

26:45 There are a lot of projects going on in Kubernetes.
27:00 Making Kubernetes secure is going to be the next step.
27:05 There are still people debating as to whether Kubernetes is the right tool, but most of their concerns are from a security standpoint.
27:15 It's a great tool to run workloads in a fast way; however, if something goes wrong, the problem propagates very fast.
27:30 Once they make a strong standpoint that it's secure if you do these things, and provide a more transparent way to run it - then banks will get on board.
28:05 One of the things that I've noticed is that you have interfaces for different tools - they're trying to create a capability that uses Kubernetes underneath.

You'll be keynoting at QCon London in 2020 - can you share what you want to talk about? -

29:00 I'm still working on the outline, but I think I want to talk about how the Kubernetes interfaces came about, and why people are contributing to that.
29:10 You have major vendors that are contributing to these interfaces; it doesn't make them the primary user for those storage or volumes.
29:30 This is what Kubernetes tried to do - it doesn't matter what language your application is written in, you can run it on Kubernetes.
29:45 If you want to have service meshes running on top of Kubernetes; in the future, it's going to be nicely abstracted and easily done - it's going to be the standard.

What's the best way to follow you? -

30:25 I'm on Twitter, LinkedIn, Medium and on GitHub, but I've not reached the stage when I'm contributing to Kubernetes yet.
