Reconciling Kubernetes and PCI DSS for a Modern and Compliant Payment System

Ana Calin, systems engineer at Paybase, gave an experience report at QCon London [slides PDF] on how this end-to-end payments service provider achieved PCI DSS level 1 compliance (the highest level) with 50+ Node.js microservices running on Google Kubernetes Engine (GKE), using Terraform for infrastructure provisioning and Helm for service deployment. Besides pinpointing and addressing some Kubernetes security shortcomings, a crucial factor was challenging the "status quo" of PCI DSS requirements.

PCI DSS is an information security standard dating back to 2004 that organizations dealing with credit card payments must abide by. PCI DSS requirements [PDF] do not yet explicitly consider the impact of Kubernetes or container orchestration in general. Many of them revolve around the concept of a "server machine" as the base computation unit and how servers should be secured and interconnected. Virtual machines in the cloud can be mapped rather directly to this "server machine" concept. But when it comes to containers, pods and their transient nature, the interpretation of the same requirements becomes fuzzy.

Calin noted that many large financial organizations prefer to pay recurrent PCI DSS fines rather than invest in modernizing legacy applications to comply with a strict interpretation of PCI DSS requirements. Overall, more than 80% of organizations were still failing to comply with PCI DSS in 2017, Calin claimed.

Calin mentioned the example of a PCI DSS requirement (#2.2.1) to have each server or virtual machine perform only one primary function. This might seem straightforward to achieve if we equate a "server" here with a Kubernetes pod. But if the interpretation is that a Kubernetes node is a "server," then this requirement would be clearly violated since a single node might be running an arbitrary number of pods performing several functions simultaneously.

This ambiguity is compounded if the PCI DSS compliance auditor does not have a strong technical understanding of Kubernetes or GKE, as was the case for Paybase according to Calin. It should not be the payment provider's responsibility to educate the auditors, she suggested. By insisting that a "server" in PCI DSS parlance is really a deployable unit (a pod) in Kubernetes, and that Kubernetes network policies and pod security policies can enforce isolation between services, Calin and her colleagues were able to demonstrate compliance.
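To make the isolation argument more concrete, here is a minimal sketch of the kind of Kubernetes network policies that treat each pod as its own "server": a default-deny policy for a namespace, plus one explicit allowance between two services. It is not taken from the talk. The namespace, the `app` labels, the service names and the port are hypothetical, and the manifests are built as plain Python dictionaries and printed as JSON, which `kubectl apply -f` accepts.

```python
import json


def default_deny_ingress(namespace: str) -> dict:
    """NetworkPolicy selecting every pod in the namespace and allowing no ingress."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        "spec": {
            # An empty podSelector matches all pods in the namespace.
            "podSelector": {},
            # Ingress is listed but no ingress rules are given, so nothing is allowed in.
            "policyTypes": ["Ingress"],
        },
    }


def allow_ingress(namespace: str, target_app: str, client_app: str, port: int) -> dict:
    """NetworkPolicy letting pods labelled app=client_app reach app=target_app on one TCP port."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {
            "name": f"allow-{client_app}-to-{target_app}",
            "namespace": namespace,
        },
        "spec": {
            "podSelector": {"matchLabels": {"app": target_app}},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [{"podSelector": {"matchLabels": {"app": client_app}}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical namespace and service names, for illustration only.
    policies = [
        default_deny_ingress("payments"),
        allow_ingress("payments", "card-vault", "payment-api", 8443),
    ]
    for policy in policies:
        print(json.dumps(policy, indent=2))
```

With policies like these in place, and a network plugin that enforces them, a pod only receives traffic that has been explicitly allowed, which is the property an auditor would otherwise expect from firewall rules between separate servers.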

Earlier, Calin and her colleagues had taken a deep dive into Kubernetes and GKE security, including an internal infrastructure penetration test, to ensure the system would be robust enough to meet the goals of PCI DSS regulation, even if the requirements as written are not Kubernetes-aware. Their research and experiments led them to define a set of guidelines for a secure GKE Kubernetes cluster, such as each cluster having a dedicated service account with only strictly required permissions and minimal compute engine scopes. Setting up Kubernetes network policies and pod security policies as well as enabling role-based access control (RBAC) are also critical, according to Calin. Finally, container images should always be scanned for vulnerabilities before they are deployed (a general security guideline for containers).
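As an illustration of what a restrictive pod security policy guards against, the sketch below checks a pod spec for a few settings such a policy would typically reject. The rule set is an assumption chosen for this example, not Paybase's actual policy. The field names (`hostNetwork`, `securityContext.privileged`, `allowPrivilegeEscalation`, `runAsNonRoot`) are standard pod spec fields, while `pod_violations` and the sample specs are hypothetical.

```python
from typing import List


def pod_violations(pod_spec: dict) -> List[str]:
    """Return human-readable problems found in a pod spec (illustrative rule set)."""
    problems = []
    if pod_spec.get("hostNetwork", False):
        problems.append("pod uses the host network namespace")

    pod_ctx = pod_spec.get("securityContext", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        ctx = container.get("securityContext", {})
        if ctx.get("privileged", False):
            problems.append(f"{name}: runs privileged")
        # Treat an unset value as "not disabled": escalation must be turned off explicitly.
        if ctx.get("allowPrivilegeEscalation", True):
            problems.append(f"{name}: privilege escalation not disabled")
        # The container-level setting wins; otherwise fall back to the pod-level one.
        if not ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False)):
            problems.append(f"{name}: may run as root")
    return problems


if __name__ == "__main__":
    hardened = {
        "securityContext": {"runAsNonRoot": True},
        "containers": [
            {"name": "api", "securityContext": {"allowPrivilegeEscalation": False}}
        ],
    }
    risky = {
        "hostNetwork": True,
        "containers": [{"name": "debug", "securityContext": {"privileged": True}}],
    }
    print(pod_violations(hardened))
    print(pod_violations(risky))
```

In a real cluster this kind of rule is enforced at admission time by the cluster itself rather than by a script; the check here only shows the shape of the restrictions.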

One of their findings around GKE security was that it came with some insecure defaults, such as a read-write compute engine scope for new clusters, meaning that other Google Cloud services could modify the cluster. Also, the associated default service account had excessive permissions, thus violating the principle of least privilege. Finally, legacy metadata endpoints were enabled by default, meaning that it was possible to obtain Kubernetes API secrets from any pod in a GKE cluster.
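The three defaults described above can be expressed as a small audit. The sketch below assumes a node configuration dictionary shaped like the `nodeConfig` section that `gcloud container clusters describe --format=json` returns (`oauthScopes`, `serviceAccount`, `metadata`); the exact shape should be checked against a real cluster. The function name and sample values are hypothetical.

```python
from typing import List

# OAuth scopes that grant read-write access to Compute Engine resources.
COMPUTE_RW_SCOPE = "https://www.googleapis.com/auth/compute"
CLOUD_PLATFORM_SCOPE = "https://www.googleapis.com/auth/cloud-platform"


def audit_node_config(node_config: dict) -> List[str]:
    """Flag the insecure defaults named in the text (assumed field layout)."""
    findings = []

    scopes = set(node_config.get("oauthScopes", []))
    if scopes & {COMPUTE_RW_SCOPE, CLOUD_PLATFORM_SCOPE}:
        findings.append("node scopes allow read-write access to Compute Engine")

    # A missing or "default" value means the project's default service account is used.
    if node_config.get("serviceAccount", "default") == "default":
        findings.append("nodes run as the default service account")

    # Legacy metadata endpoints are only considered off when disabled explicitly.
    metadata = node_config.get("metadata", {})
    if metadata.get("disable-legacy-endpoints") != "true":
        findings.append("legacy metadata endpoints are not disabled")

    return findings


if __name__ == "__main__":
    # Hypothetical node configuration resembling the defaults described above.
    defaults = {
        "oauthScopes": [
            COMPUTE_RW_SCOPE,
            "https://www.googleapis.com/auth/devstorage.read_only",
        ],
        "serviceAccount": "default",
    }
    for finding in audit_node_config(defaults):
        print("FINDING:", finding)
```

A hardened configuration, with a dedicated service account, narrow scopes and `disable-legacy-endpoints` set to `"true"`, produces no findings under these checks.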

Other security weaknesses in their stack included the fact that Tiller, the Helm server, which has read-write permissions on all resources in a cluster, did not perform authentication by default and also came with mTLS disabled. Gaining access to Tiller with default settings would give an attacker essentially full access to a cluster, Calin explained.
