Key Takeaways
- Zero trust is a security model that’s garnered a lot of hype—but despite the marketing noise, it has some concrete and immediate value for security-conscious organizations.
- At its core, zero trust moves authorization from “verify once at the perimeter” to “verify everywhere, every time.”
- To do this, zero trust requires us to rethink the notion of identity and to move away from location-based identities such as IP addresses.
- Kubernetes adopters have a distinct advantage when implementing zero trust at the networking layer thanks to sidecar-based service meshes, which provide authentication and authorization at the most granular level without requiring application changes.
- While service meshes can help, Kubernetes security remains a complex and nuanced topic that requires understanding multiple layers of the stack.
Zero trust is a powerful security model that’s at the forefront of modern security practices. It’s also a term that is prone to buzz and hype, making it hard to cut through the noise. So what is zero trust, exactly, and for Kubernetes, what does it mean in concrete terms? In this article, we’ll explore what zero trust is from an engineering perspective and build a basic framework for understanding its implications for Kubernetes operators and security teams alike.
Introduction
If you’re building modern cloud software, whether with Kubernetes or something else, you’ve probably heard of the term “zero trust.” The zero trust model of security has become so important that the US federal government has taken notice: the White House recently issued a memorandum setting forth a Federal zero trust strategy that requires all US federal agencies to meet specific zero trust security standards by the end of FY 2024; the Department of Defense created a Zero Trust Reference Architecture; and the National Security Agency published a Kubernetes hardening guide that describes best practices for zero trust security in Kubernetes specifically.
With that kind of buzz, zero trust has certainly attracted a lot of marketing attention. But despite the noise, zero trust isn’t just an empty term—it represents some profound and transformative ideas for the future of security. So in concrete terms, what is zero trust, and why is it suddenly so important? And what does zero trust mean for Kubernetes users specifically?
What is zero trust?
As you would expect, zero trust is fundamentally about trust. It’s a model for addressing one of the core questions of security: is X allowed to access Y? In other words, do we trust X to access Y?
The “zero” in zero trust, of course, is a bit of an exaggeration. For software to work, obviously something needs to trust something else. So zero trust isn’t about removing trust entirely so much as reducing it to the bare minimum necessary (the well-known principle of least privilege) and making sure it’s enforced at every point.
This may sound like common sense. But as with many new ideas in technology, the best way to understand zero trust is to understand what it’s a reaction to. Zero trust is the rejection of the idea that perimeter security is sufficient. In the perimeter security model, you implement a “hard shell” around your sensitive components. For example, you may have a firewall around your datacenter that is tasked with keeping bad traffic and actors out. This model, sometimes called the castle approach, makes intuitive sense: the castle walls are there to keep the bad actors out. If you’re inside the castle, then by definition you’re a good actor.
The zero trust model says that perimeter security is no longer enough. According to zero trust, even within the security perimeter, you must still treat users, systems, and network traffic as untrusted. The DoD’s Reference Architecture sums it up nicely:
“[N]o actor, system, network, or service operating outside or within the security perimeter is trusted. Instead, we must verify anything and everything attempting to establish access. It is a dramatic paradigm shift in [the] philosophy of how we secure our infrastructure, networks, and data, from verify once at the perimeter to continual verification of each user, device, application, and transaction.”
Of course, zero trust doesn’t mean throwing away your firewalls—defense in depth is an important component of any security strategy. Nor does it mean we get to ignore all the other important components of security, such as event logging and supply chain management. Zero trust simply requires us to move our trust checking from “once at the perimeter” to “every time, everywhere.”
To do this properly, however, we need to rethink some fundamental assumptions about what “trust” means and how we capture it.
Identity
One of the most immediate implications of zero trust is that it changes the way we think about and assign identity, especially system identity.
In the perimeter model, your location is effectively your identity. If you’re inside the firewall, you’re trusted; if you’re outside it, you are not. Perimeter-based systems can thus allow access to sensitive systems based on things like the IP address of the client.
In the zero-trust world, this is not enough. An IP address indicates only location, so it cannot tell us whether a client should be trusted to access a particular resource. Instead, we need another form of identity: one intrinsically tied to a workload, user, or system. And this identity must be verifiable in a way that doesn’t itself require trusting the network.
This is a big requirement with many implications. For example, systems that provide network security but still rely on network identifiers like IP addresses, such as IPsec or WireGuard, are not sufficient for zero trust.
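To make the contrast concrete, here is a minimal Python sketch of two authorization checks. It is not drawn from any particular product. The first check trusts any client whose address falls inside an internal range. The second ignores location and trusts only a verified identity, which we assume some lower layer, such as mutual TLS, has already established. The network range and identity names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

# Hypothetical "inside the firewall" range used by a perimeter-style check.
INTERNAL_NETWORK = ipaddress.ip_network("10.0.0.0/8")

# Hypothetical workload identities that are allowed to call this service.
TRUSTED_IDENTITIES = {"billing.payments", "web.frontend"}


@dataclass(frozen=True)
class Peer:
    ip: str
    # Identity proven by some verification step (e.g. a client certificate);
    # None means the peer proved nothing about who it is.
    verified_identity: Optional[str]


def perimeter_allows(peer: Peer) -> bool:
    """Location as identity: anything on the internal network is trusted."""
    return ipaddress.ip_address(peer.ip) in INTERNAL_NETWORK


def identity_allows(peer: Peer) -> bool:
    """Zero trust style: the IP address plays no part in the decision."""
    return peer.verified_identity in TRUSTED_IDENTITIES


# A compromised workload inside the internal network, with no valid identity.
intruder = Peer(ip="10.4.2.17", verified_identity=None)
# A legitimate caller that happens to have an address outside the range.
billing = Peer(ip="172.16.8.9", verified_identity="billing.payments")
```

The perimeter check admits `intruder` and rejects `billing`. The identity check does the opposite, because it never looks at where the request came from.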
Policy
Armed with our new model of identity, we now need a way of capturing what type of access each identity has. In the perimeter approach described above, it’s common to grant full access to a sensitive resource to a range of IP addresses. For example, we might set up IP address filtering to ensure that only IP addresses from within the firewall are allowed to access a sensitive service. In zero trust, we instead need to enforce the minimum level of access necessary. Access to a resource should be as restricted as possible, based on identity as well as any other relevant factors.
While our application code could make these authorization decisions itself, we typically capture them instead in some form of policy specified outside the application. Having an explicit policy allows us to audit and change access without modifying application code.
In service of our zero trust goals, these policies can be very sophisticated. We may have a policy that restricts access to a service to only those calling services that need it (i.e., using the workload identity on both sides). We may refine that further and allow access only to certain interfaces (HTTP routes, gRPC methods) on that service. We may refine it even further and restrict access based on the user identity responsible for the request. The goal in all cases is least privilege: systems and data should be accessible only when absolutely necessary.
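The layered refinements described above can be sketched as data plus a small default-deny evaluator. Those refinements are the calling workload, then the route, then the end user. This Python sketch is illustrative only. The `Rule` and `Request` types, service names, and routes are hypothetical and do not correspond to any particular policy engine or mesh resource.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Request:
    client: str                 # workload identity of the caller
    server: str                 # workload identity of the callee
    route: str                  # HTTP route or gRPC method being invoked
    user: Optional[str] = None  # end-user identity behind the request, if any


@dataclass(frozen=True)
class Rule:
    server: str
    clients: FrozenSet[str]                 # which calling workloads may connect
    routes: FrozenSet[str]                  # which interfaces they may use
    users: Optional[FrozenSet[str]] = None  # which end users; None means any

    def matches(self, req: Request) -> bool:
        return (
            req.server == self.server
            and req.client in self.clients
            and req.route in self.routes
            and (self.users is None or req.user in self.users)
        )


def authorize(policy: Iterable[Rule], req: Request) -> bool:
    """Default deny: a request is allowed only if some rule explicitly matches it."""
    return any(rule.matches(req) for rule in policy)


# Hypothetical policy for an "orders" service.
POLICY = [
    # The web frontend may read orders on behalf of any user.
    Rule(server="orders", clients=frozenset({"web-frontend"}),
         routes=frozenset({"GET /orders"})),
    # Only the admin console may delete, and only for the user "alice".
    Rule(server="orders", clients=frozenset({"admin-console"}),
         routes=frozenset({"DELETE /orders"}), users=frozenset({"alice"})),
]
```

For example, `authorize(POLICY, Request("web-frontend", "orders", "DELETE /orders"))` is `False` because no rule grants that route to that caller. Because the policy is plain data kept outside the service's code, it can be audited and changed on its own.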
Enforcement
Finally, zero trust requires that we perform both authentication (confirmation of identity) and authorization (validating that the policy allows the action) at the most granular level possible. Every system that is granting access to data or computation should be enforcing a security boundary, from the perimeter on down to individual components.
Similar to policy, this enforcement is ideally done uniformly across the stack. Rather than having each component use its own custom enforcement code, a uniform enforcement layer makes enforcement auditable and decouples the concerns of application developers from those of operators and security teams.
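A uniform enforcement layer can be pictured as a single wrapper that every handler passes through. Authentication and authorization then happen the same way everywhere, and the handlers themselves contain no security code. The Python sketch below assumes the caller's identity has already been verified, for example by mutual TLS, and is handed to the wrapper. `enforce`, `Denied`, and the allow-list are hypothetical names.

```python
from typing import Callable, Dict, FrozenSet, Optional, Tuple

# Hypothetical policy for this workload: which (caller identity, route) pairs are allowed.
ALLOWED: FrozenSet[Tuple[str, str]] = frozenset({
    ("web-frontend", "GET /orders"),
    ("admin-console", "DELETE /orders"),
})

Handler = Callable[[str], str]
Endpoint = Callable[[Optional[str]], str]


class Denied(Exception):
    """Raised when a request fails authentication or authorization."""


def enforce(route: str, handler: Handler) -> Endpoint:
    """Wrap a handler so every call is authenticated and then authorized."""
    def endpoint(peer_identity: Optional[str]) -> str:
        # Authentication: the caller must have presented a verified identity.
        if peer_identity is None:
            raise Denied("unauthenticated caller")
        # Authorization: the explicit policy must allow this identity on this route.
        if (peer_identity, route) not in ALLOWED:
            raise Denied(f"{peer_identity} may not call {route}")
        return handler(peer_identity)
    return endpoint


# Application handlers contain business logic only, no security checks.
def list_orders(caller: str) -> str:
    return f"orders visible to {caller}"


def delete_orders(caller: str) -> str:
    return f"orders deleted by {caller}"


# Every route goes through the same enforcement wrapper.
ROUTES: Dict[str, Endpoint] = {
    route: enforce(route, handler)
    for route, handler in {
        "GET /orders": list_orders,
        "DELETE /orders": delete_orders,
    }.items()
}
```

In a service mesh, the proxy plays the role of `enforce`. It sits outside the application entirely, so the same checks apply regardless of the language the handlers are written in.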
Zero trust for Kubernetes
Faced with the requirement that we must rethink identity from first principles, reify trust in the form of policies of arbitrary expressiveness, and permeate our infrastructure with new enforcement mechanisms at every level, it is only natural to experience a moment of panic. And did I mention we need to do this by FY 2024?
The good news is that for Kubernetes users, at least, some aspects of adopting zero trust are significantly easier. For all its warts and complexities, Kubernetes is a platform with an explicit scope, a well-defined security model, and clear mechanisms for extension. This makes it fruitful territory for zero-trust implementations.
One of the most direct ways to tackle zero trust networking in Kubernetes is with a service mesh. The service mesh takes advantage of Kubernetes’s powerful “sidecar” concept, in which platform containers can be dynamically inserted at deploy time alongside application containers as a form of late binding of operational functionality.
Service meshes use this sidecar approach to add proxies into application pods at runtime and wire these proxies to handle all incoming and outgoing traffic. This allows the service mesh to deliver features in a way that’s decoupled from application code. This separation of concerns between application and platform is central to a service mesh’s value proposition: of course, these features could be implemented in the application directly, but by separating them, we allow security teams and developers to iterate independently from each other, while still working towards the shared goal of a secure but fully-featured application.
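Conceptually, sidecar injection is a transformation applied to a pod manifest before the pod is created. In Kubernetes, service meshes typically perform it with a mutating admission webhook. The Python sketch below shows only the manifest change, treating the pod as a plain dictionary. The container name and image are hypothetical. A real mesh must also redirect the pod's traffic through the proxy, for example with an init container that sets up iptables rules, and that step is omitted here.

```python
import copy
from typing import Any, Dict

# Hypothetical proxy container; a real mesh supplies its own name, image, ports, etc.
PROXY_CONTAINER: Dict[str, Any] = {
    "name": "mesh-proxy",
    "image": "registry.example.com/mesh-proxy:1.0",
}


def inject_sidecar(pod: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the pod manifest with the proxy container added.

    The input manifest is left untouched, and injecting twice is a no-op.
    """
    mutated = copy.deepcopy(pod)
    containers = mutated["spec"]["containers"]
    if not any(c.get("name") == PROXY_CONTAINER["name"] for c in containers):
        containers.append(copy.deepcopy(PROXY_CONTAINER))
    return mutated


app_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "orders-7f9c", "namespace": "shop"},
    "spec": {"containers": [{"name": "orders", "image": "registry.example.com/orders:2.3"}]},
}
injected = inject_sidecar(app_pod)
```

The application container is not modified, so the same image runs with or without the mesh. The proxy is bound to it only at deploy time.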
Since the service mesh handles all network traffic to and from the application, it is well-positioned to handle zero-trust concerns:
- Workload identity can be drawn from the pod’s identity within Kubernetes rather than its IP address.
- Authentication can be performed by wrapping connections in mutual TLS, a variant of TLS in which identity is verified on both sides of the connection using cryptographic proof.
- Authorization policy can be expressed in Kubernetes terms, e.g., via Custom Resource Definitions (CRDs), making policy explicit and decoupled from the application.
- Most importantly, enforcement is done at the level of individual pods uniformly across the stack. Each pod does its own authentication and authorization, meaning that the network is never trusted.
Together, these deliver the majority of our zero trust goals (at least for Kubernetes clusters!). We have workload identity rather than network identity, enforcement at the most granular level (the pod), and a consistent and uniform way of applying authentication and authorization across our stack, all without altering the application.
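Of the pieces listed above, authentication rests on mutual TLS. The sketch below uses Python's standard `ssl` module to show what "mutual" means in configuration terms. The server requires a client certificate, the client presents one, and the peer's verified identity is read from the certificate's subject alternative names. Mesh proxies do this on the application's behalf, so application code normally never does. The certificate paths are placeholders, and treating SAN entries as the workload identity is an assumption made for illustration.

```python
import ssl
from typing import List, Optional


def server_context(cert_file: Optional[str] = None,
                   key_file: Optional[str] = None,
                   ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Server side: require and verify a client certificate (the 'mutual' in mTLS)."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.verify_mode = ssl.CERT_REQUIRED  # reject clients without a valid certificate
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file and key_file:
        ctx.load_cert_chain(cert_file, key_file)  # this workload's own certificate
    if ca_file:
        ctx.load_verify_locations(ca_file)  # the CA that issues workload certificates
    return ctx


def client_context(cert_file: Optional[str] = None,
                   key_file: Optional[str] = None,
                   ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Client side: verify the server as usual, and also present our own certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file and key_file:
        ctx.load_cert_chain(cert_file, key_file)
    return ctx


def peer_identities(peer_cert: dict) -> List[str]:
    """Extract identity names from a cert dict as returned by SSLSocket.getpeercert()."""
    return [value for kind, value in peer_cert.get("subjectAltName", ())
            if kind in ("DNS", "URI")]
```

The identity returned by `peer_identities` comes from a certificate that was cryptographically verified during the handshake, not from the connection's source address. That is what lets it feed an authorization decision without trusting the network.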
Within this basic framework, different service mesh implementations make different tradeoffs. Linkerd, for example, is an open-source service mesh and graduated project of the Cloud Native Computing Foundation whose implementation puts simplicity first and foremost: it draws workload identity directly from Kubernetes ServiceAccounts to enable “zero config,” on-by-default mutual TLS. In the same spirit, Linkerd’s Rust-based micro-proxies take a minimalist approach to zero trust.
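To illustrate what drawing workload identity from ServiceAccounts means, the Python sketch below derives an identity name from a pod manifest's namespace and `serviceAccountName`, never from its IP address. The name follows the pattern Linkerd uses for proxy identities (`<serviceaccount>.<namespace>.serviceaccount.identity.<control-plane-namespace>.<trust-domain>`). Treat the exact format and the default arguments as assumptions, and check them against the documentation for your Linkerd version.

```python
from typing import Any, Dict


def workload_identity(namespace: str, service_account: str,
                      control_plane_ns: str = "linkerd",
                      trust_domain: str = "cluster.local") -> str:
    """Build an identity name from Kubernetes metadata (assumed Linkerd-style format)."""
    return (f"{service_account}.{namespace}.serviceaccount.identity."
            f"{control_plane_ns}.{trust_domain}")


def identity_for_pod(pod: Dict[str, Any]) -> str:
    # Treat missing fields as "default", the usual fallback for both.
    namespace = pod["metadata"].get("namespace", "default")
    service_account = pod["spec"].get("serviceAccountName", "default")
    return workload_identity(namespace, service_account)


# Two replicas of the same workload, scheduled with different pod IPs.
replica_a = {"metadata": {"name": "web-1", "namespace": "shop"},
             "spec": {"serviceAccountName": "web"},
             "status": {"podIP": "10.4.0.12"}}
replica_b = {"metadata": {"name": "web-2", "namespace": "shop"},
             "spec": {"serviceAccountName": "web"},
             "status": {"podIP": "10.4.3.77"}}
```

Both replicas map to `web.shop.serviceaccount.identity.linkerd.cluster.local` because they run as the same ServiceAccount. Their differing IPs play no role.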
Of course, just adding a service mesh to the cluster is not a panacea. Once it is installed, the work of defining, updating, and evaluating authorization policies must still be taken on. Cluster operators must be careful to ensure that every newly created pod is paired with its sidecar component. And the service mesh itself must be maintained, monitored, and kept up to date, like any software on the cluster. Panacea or not, however, the service mesh does shift the cluster from a default of unencrypted, unauthenticated traffic to a default of encrypted, authenticated traffic with strong workload identities and a rich authorization system. That is a big step toward zero trust.
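One of those operational chores is easy to script: checking that every pod actually received its sidecar. The Python sketch below takes the parsed JSON output of `kubectl get pods --all-namespaces -o json` and lists pods that have no proxy container. The container name `linkerd-proxy` is an assumption; use whatever name your mesh injects. The function also checks `initContainers`, in case the proxy runs as a Kubernetes native sidecar.

```python
from typing import Any, Dict, List

# Assumed proxy container name; replace with the one your mesh injects.
PROXY_CONTAINER_NAME = "linkerd-proxy"


def pods_without_proxy(pod_list: Dict[str, Any],
                       proxy_name: str = PROXY_CONTAINER_NAME) -> List[str]:
    """Return "namespace/name" for every pod that lacks the proxy container.

    `pod_list` is a Kubernetes PodList, e.g. parsed from
    `kubectl get pods --all-namespaces -o json`.
    """
    missing = []
    for pod in pod_list.get("items", []):
        spec = pod.get("spec", {})
        # Native sidecars are declared as init containers, so check both lists.
        containers = spec.get("containers", []) + spec.get("initContainers", [])
        if not any(c.get("name") == proxy_name for c in containers):
            meta = pod.get("metadata", {})
            missing.append(f"{meta.get('namespace', 'default')}/{meta.get('name', '?')}")
    return missing


sample = {
    "kind": "PodList",
    "items": [
        {"metadata": {"name": "web-1", "namespace": "shop"},
         "spec": {"containers": [{"name": "web"}, {"name": "linkerd-proxy"}]}},
        {"metadata": {"name": "batch-9", "namespace": "shop"},
         "spec": {"containers": [{"name": "batch"}]}},
    ],
}
```

In practice you would exclude namespaces that are intentionally left out of the mesh and alert on whatever remains.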
Conclusion
Zero trust is a powerful security model that’s at the forefront of modern security practices. If you can cut through the marketing noise around it, there are some profound and important benefits to adopting zero trust. And while zero trust requires some radical changes to core ideas such as identity, Kubernetes users at least have a big leg up if they are able to adopt a service mesh and shift from purely perimeter-based network security to the “continual verification of each user, device, application, and transaction.”