Diving into Zero Trust Security

Jun 23, 2022 18 min read

Key Takeaways

Zero Trust Architecture works by removing the implicit trust that traditional networks grant by default to every device inside the network perimeter.
Trust no one, inspect everyone.
Deny All access by default, make policies aligned with the least privilege access principle.
Trust is earned and it’s temporary, so periodically re-authenticate your users through centralized CoA (Change of Authorization).
Take note of government compliance, and understand that Zero Trust is a journey with multiple network-wide components. There is no one box for implementing Zero Trust.

In 2020, hackers made about 4.2 billion dollars from phishing scams.

Network security today relies heavily on the assumption that if a client has a set of "good" credentials, they can be trusted with access to all or at least some confidential resources of the network.

Back to reality: with data usage growing exponentially, data breaches in organizations are clearly on the rise. With conventional checks, any "authenticated" client connecting from "outside" or "inside" can access this data and possibly exploit it, most of the time unknowingly.

Relying on a single security layer such as a VPN or a first-generation firewall, while still using good old dictionary credentials for SSH, is evidently not good enough.

The Zero Trust approach involves a combination of more-secure authentication approaches, such as MFA with profiling and posturing of the client device, along with some stronger encryption checks.

Only after a complete holistic verification of the entity, "thou shall pass!"

So, how does it do a better job? How scalable is it? And why trust the "Zero Trust"?

This article shares some insights on Zero Trust Security for your organization and your customers, and how you can get started with it. It’s based on our presentation "Trust me, I'm an insider" - Diving into Zero Trust Security that we gave at QCon Plus in November, 2021.

Are You Guilty?

We want you to take a moment and think about the last time you changed your credentials without a warning from your company, or thought about what encryption actually goes into the web application you've been designing. If you have changed things recently, have you thought of revoking your previous access to resources? Being guilty of any of these things doesn't make anyone a bad employee. It’s one of the many glitches that we have in our organizations. We just don't do user-behavior analysis properly. We say that because 34% of data-leakage cases come from inside an organization, and from the outside, 90% of attacks come from phishing emails, which very often trigger ransomware attacks. Last year, it was roughly estimated that a ransomware attack took place every 11 seconds.

Unfortunately, we’ve been seeing more of these attacks recently. See the supply chain attack on SolarWinds or the Colonial Pipeline attack in the U.S. Also, REvil attacked one of the on-prem VSA servers of Kaseya by using sideloading, which means the actual malware was run under the pretense of a genuine anti-malware application of Windows. One of the reasons this attack went unnoticed was that the anti-malware software was probably never upgraded. Many businesses and critical services like hospitals and nuclear power plants also go decades without upgrading their systems or improving their security features, because they cannot afford any downtime. For example, about 92 ransomware attacks took place every day in the healthcare sector during the pandemic. This affected 600 different organizations, losses of $21 billion were incurred, and about 18 million patient records were compromised and leaked.

Ransomware Has Evolved

It's astonishing to see how ransomware has evolved into a "blackmail-ware" model, in which attackers don't want to hurt their reputation in the media. Most likely, attackers are going to give back your data, but that doesn't mean that they don't already have a copy of it ready to sell on the black market or dark web. As we saw during the pandemic, attackers knew that hospitals needed their services back so urgently that they would pay the ransom. To paint a picture, imagine you're working in a power plant that is fully automated. One day, the whole grid goes into lockdown, and there is no way out of it without the attacker's mercy. The plant had the code. It had automation. It had lock-ins. Still, why? Probably because of decade-old IoT networks, missed software-upgrade deadlines, the people, ignorance from management, and other reasons. That's a Mr. Robot episode we would not want to be part of.

The Need for Zero Trust

Our point is that despite companies spending so many resources on development and security, data breach news stories keep appearing.

We have seen data-leakage cases at renowned multinational corporations (MNCs) with some of the biggest IT infrastructures. When their services go down, that gives attackers a very good opportunity to create new backdoors into web applications or the data centers behind your social-media accounts. And it’s very costly. Companies need to pay a lot of money, not only to save their businesses, but in litigation, government disclosures, and other legal troubles.

Also, the way we’re working has changed drastically. You're now able to log in and work from your iPads and phones. So, the current security scenario just cannot keep up with it. The same rule checks you run for your workstation or on-prem devices won't work for your personal devices. But, what would work are the exploits.

Finally, it’s important to understand that just because one connects to a secure organization, the endpoint—you, the user, and your device—doesn't itself become secure. It’s still as vulnerable as it was before, and poses a threat, not only to itself, but to the lateral movement of traffic inside the organization, and far more quickly than you can save it.

So, protecting customer data is no longer our only requirement; the organization's data is now equally vulnerable. Hence, what Zero Trust basically talks about is building trust with resources and devices equally—inside and outside your network. That is, by not trusting either of them.

The Multiplayer Surprise

What does Zero Trust bring to the table that you don't already know? Think of it like this: the current approach is based on static, network-based perimeters such as network location or IP addresses.

Because that approach leaves room for attacks, we now focus on the users, assets, and resources instead of the network itself. That makes more sense, because the traffic and the exploits originate from these. Also, think about the implicit-deny rule at the end of an access list, which you might have heard about. That's the approach.

Just because your username is admin, you shouldn't be given full privilege. Privilege-escalation attacks are often exploited this way. Rules should be built around denying access by default, rather than allowing even some level of access. Also, the linear approach of security revolves around threat by nature—how to recover from it, then decide how to mitigate it. However, the approach should be to limit those chances rather than taking them.
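To make the deny-by-default idea concrete, here is a minimal sketch in Python. The role, resource, and action names are hypothetical, and a real policy store would be far richer. The point is that access exists only where an explicit rule grants it, so a username like admin earns nothing on its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    role: str      # hypothetical role name
    resource: str  # hypothetical resource name
    action: str    # e.g. "read" or "write"


# Explicit allow-list: anything not listed here is implicitly denied.
ALLOW_RULES = {
    Rule("db-admin", "customer-db", "read"),
    Rule("db-admin", "customer-db", "write"),
    Rule("analyst", "customer-db", "read"),
}


def is_allowed(role: str, resource: str, action: str) -> bool:
    """Return True only when an explicit rule grants this exact request."""
    return Rule(role, resource, action) in ALLOW_RULES


if __name__ == "__main__":
    print(is_allowed("analyst", "customer-db", "read"))   # explicitly granted
    print(is_allowed("analyst", "customer-db", "write"))  # least privilege: no rule
    print(is_allowed("admin", "customer-db", "read"))     # the name alone grants nothing
```

Because there is no "allow all" fallback, forgetting to write a rule fails closed rather than open.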

Of course, it doesn't mean that threat hunting is any less fun in Zero Trust. Rather, Zero Trust makes threat hunting more rigorous, and it can be modeled to act in real time, using orchestration, studying Indicators Of Compromise (IOCs), and more.

Zero Trust eXtended (ZTX) Framework

Zero Trust is not a new concept, honestly. It began as a discussion in about 2004 in Jericho Forum. Why are we talking about it now? The problem was that even knowing the advantages of Zero Trust, it wasn’t easy to implement it, because the networks changed a lot through the years.

Now, we’re at a point where we can accept the fact that just like an IT perimeter, our networks are also changing. That means we can't run the same decade-old security structure. We're also able to implement complex scenarios and network designs easily, thanks to the available resources in technology and advancements, as well as brilliant engineers, architects, and network admins. Forrester, a leader in market research and consulting, then changed their original design a bit in 2017, and said that data is the center of the universe, so we should focus on how to manage it, classify it, categorize it, and most-importantly, protect it. This is the Zero Trust eXtended Framework widely used today.

The ZTX Framework has about seven pillars that, based on Forrester’s audits, determine who in the industry is doing a better job at implementing Zero Trust principles—for customers and themselves. Based on these criteria, we have some leaders and strong performers in this field.

When you want to implement the ZTX Framework, in addition to choosing the vendors that will supply your products and services, you can answer the following questions to get an idea of how to implement Zero Trust:

What are the firewall rules you're using?
How are you encrypting your data?
How are you controlling it?
How are people in your organization actually understanding security internally?
How are you identifying devices, which is one of the main concerns, and what does the complete Identity and Access Management (IAM) picture look like?

NIST 800-207 Zero Trust Architecture

To understand Zero Trust Architecture, you should look at the ideal model for Zero Trust, developed by the National Institute of Standards and Technology.

In this model, the policy decision point is the brain of the operation. It comprises the first two of the key elements below, and the third enforces its decisions:

Policy Engine, which is responsible for the ultimate decision to grant, deny, or revoke access to the resource if required.
Policy Administrator, which acts as a middleman between the control plane and the data plane. It relays whatever decision it receives from the policy engine to the next element.
Policy Enforcement Point (PEP), which is part of the data plane. This is what the requester is going to interact with at all points, for example, your firewall, a login page to a web application, etc.

For this Policy Decision Point (PDP) to work, the architect or the admin should feed it some external constraints and rules suited to their network. For example: Should the devices meet a particular industry standard? If you're using certificates, what should your Certificate Authority (CA) be? How are you defining your access policies? Many companies today prefer building their own models and architectures of Zero Trust, so you might find an additional trust engine in other models, but they're mostly based on this architecture.
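The division of labor between these elements can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the class names mirror the NIST roles, but the request fields and the decision rule (MFA passed and device compliant) are hypothetical stand-ins for the constraints an admin would actually feed in.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    GRANT = "grant"
    DENY = "deny"


@dataclass
class AccessRequest:
    user: str
    device_compliant: bool  # hypothetical input, e.g. from a posture check
    mfa_passed: bool        # hypothetical input from the authentication step
    resource: str


class PolicyEngine:
    """Control plane: makes the final grant/deny decision."""

    def decide(self, req: AccessRequest) -> Decision:
        if req.mfa_passed and req.device_compliant:
            return Decision.GRANT
        return Decision.DENY


class PolicyEnforcementPoint:
    """Data plane: the gate the requester actually talks to (e.g. a firewall)."""

    def __init__(self):
        self.open_sessions = set()

    def open(self, user, resource):
        self.open_sessions.add((user, resource))

    def close(self, user, resource):
        self.open_sessions.discard((user, resource))


class PolicyAdministrator:
    """Middleman: relays the engine's decision to the enforcement point."""

    def __init__(self, engine, pep):
        self.engine = engine
        self.pep = pep

    def handle(self, req: AccessRequest) -> Decision:
        decision = self.engine.decide(req)
        if decision is Decision.GRANT:
            self.pep.open(req.user, req.resource)
        else:
            # A deny also revokes any session that was granted earlier.
            self.pep.close(req.user, req.resource)
        return decision
```

Note that the requester never calls the policy engine directly. Every request goes through the enforcement point's side of the design, and a later deny for the same user and resource tears down the earlier session.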

What a Cybersecurity "Mesh"

One of the variations of Zero Trust is microsegmentation. As the name suggests, it segments your network to install multiple policy-enforcement points for your smaller perimeters, which shapes the cybersecurity mesh. It shifts the focus from implementing PEPs at the perimeter to a custom-based, identity-based verification approach for smaller segments, while still keeping orchestration central, so any threat remediation can happen simultaneously on all nodes. This approach helps minimize the lateral movement of the attack vectors we discussed. Now, your IT team can create smaller perimeters based on certain aspects. For example, one layer can be for remote users working on application sandboxing. Other layers can be for deploying on-premise workflows. This gives cyber criminals and hackers less scope to exploit an entire network, which again means less lateral movement of attacks.
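A rough way to picture microsegmentation is as a table of segments plus the few flows explicitly allowed between them. The sketch below is an assumption-laden illustration: the host names, segment names, and the single allowed flow are all hypothetical, and a real deployment would enforce this at PEPs rather than in a Python dictionary.

```python
# Hypothetical mapping of hosts to the small perimeters they belong to.
SEGMENT_OF = {
    "laptop-42": "remote-users",
    "sandbox-1": "app-sandbox",
    "erp-server": "on-prem-workflow",
}

# Flows that are explicitly permitted between segments; everything else is denied.
ALLOWED_FLOWS = {
    ("remote-users", "app-sandbox"),
}


def flow_permitted(src_host: str, dst_host: str) -> bool:
    """Check a flow against the segment policy, denying unknown hosts."""
    src = SEGMENT_OF.get(src_host)
    dst = SEGMENT_OF.get(dst_host)
    if src is None or dst is None:
        return False
    return (src, dst) in ALLOWED_FLOWS


if __name__ == "__main__":
    print(flow_permitted("laptop-42", "sandbox-1"))   # allowed flow
    print(flow_permitted("laptop-42", "erp-server"))  # blocked lateral movement
```

A compromised remote laptop in this sketch can reach the sandbox segment but has no path to the on-premise workflow segment, which is the reduction in lateral movement described above.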

Get Started With ZTX or Cybersecurity Mesh

Be it ZTX or cybersecurity mesh, the goal of both approaches is the same: protect the resources and requesters from harming each other and the entire network.

To make that decision at the PDP and implement your PEPs, you should above all know what types of assets are in the deployment, because rules for company assets will differ from the rules for personal devices or IoT devices that belong to the network infrastructure.

Additionally, packet-flow visibility is very important, because it determines how we can continuously maintain the trust we've defined. Integrating an intrusion-prevention system at the PEP, such as a next-generation firewall on the network, is something to consider. Also, never let the requester interact with a resource directly. Obviously, there are many other approaches and network requirements to consider, which will change based on the types of devices available, the importance of the resources, the scalability of your deployment, the budget that can be allocated for the network, and other factors.

Remote User Use Case

As an example of the concepts we’ve discussed, let’s look at a use case of the work from home scenario. To implement that with Zero Trust, what do you need?

A head end that is smart, where the users can terminate their sessions. This can be your next generation firewall or existing firewall with some IPS capabilities.
Incorporate encryption in your business. This assumes that you're always on a public network. One way to do that is a VPN, which, unfortunately, many of us think of as a replacement for Zero Trust, but that is not at all true. Zero Trust does not by default provide the security principles of confidentiality, integrity, and availability for data. It values them, but we need some tool to implement them. So it works to assume that you're always on a public network.
A good policy engine. You should choose one to help with authenticating the users. Again, just username and passwords will not work. Your beloved Admin123 password can be cracked in less than a second. We definitely need more than that.
Multi-factor authentication is the next step, along with certificate-based authentication and EAP-TLS. Again, authentication by itself is not enough. You need a dynamic policy as you might have noticed in the architecture, so static rules are no good in Zero Trust.
Logging decisions is often missed, but it is very important when you want to go back and find out what was done. It can be done through Syslog servers or your monitoring tools. A major pillar of Zero Trust is to continuously monitor the traffic in the network. It's a cherry on the cake if you can have that threat defense tool integrated at all possible points of interception of potential attacks.
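As one possible illustration of the logging ingredient, the sketch below records each access decision as a structured line using Python's standard logging module. The logger name and record fields are assumptions for the example. In a real deployment the same logger could be pointed at a Syslog server, for instance through the standard library's logging.handlers.SysLogHandler.

```python
import json
import logging
from datetime import datetime, timezone

# Print INFO-level records; swap in a syslog handler for a real deployment.
logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("zero_trust.decisions")  # hypothetical logger name


def log_decision(user: str, resource: str, decision: str, reason: str) -> str:
    """Emit one JSON line per access decision and return it."""
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "resource": resource,
        "decision": decision,
        "reason": reason,
    }
    line = json.dumps(record, sort_keys=True)
    logger.info(line)
    return line


if __name__ == "__main__":
    log_decision("alice", "git-server", "deny", "posture check failed")
```

Logging the reason alongside the decision is a design choice that makes it easier to go back later and reconstruct why access was granted or refused.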

Notice that we didn't create a numbered set of steps for these ingredients. That’s because the beauty of ZTX, or Zero Trust, is that you don't have to start at a single point. You can first choose your IAM (Identity and Access Management) server, for example, make rules there, and then integrate the other sections of the framework. Similarly, you can begin with your threat-orchestration tool or logging method, and then move to your head-end implementation. This holds true for a remote user, an on-premise user application, or anywhere else.

AAA (What is A, A, and A?)

Let's explore IAM, or Identity and Access Management. We start with AAA, or Authentication, Authorization, and Accounting, which is the technology behind IAM. To quickly recap:

Authentication establishes who is requesting access.
Authorization determines what permissions the requester has, based on who is accessing the resource.
Accounting largely helps you update sessions and monitor the resources a user consumes.
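The three A's can be sketched as three small functions that run in order. This is a toy illustration, not how an IAM product works internally: the user, password, role, and resource names are hypothetical, and a real server would back these with a directory and a protocol such as RADIUS.

```python
import hashlib
import hmac
import os


def _hash(password: str, salt: bytes) -> bytes:
    """Derive a salted password hash with PBKDF2."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


_SALT = os.urandom(16)
# Hypothetical user store, permission table, and accounting log.
USERS = {"alice": {"salt": _SALT, "hash": _hash("correct horse", _SALT), "role": "engineer"}}
PERMISSIONS = {"engineer": {"git-server"}}
ACCOUNTING_LOG = []


def authenticate(user: str, password: str) -> bool:
    """Authentication: who is requesting access?"""
    entry = USERS.get(user)
    if entry is None:
        return False
    return hmac.compare_digest(entry["hash"], _hash(password, entry["salt"]))


def authorize(user: str, resource: str) -> bool:
    """Authorization: what is this identity permitted to reach?"""
    role = USERS[user]["role"]
    return resource in PERMISSIONS.get(role, set())


def account(user: str, resource: str, granted: bool) -> None:
    """Accounting: record what was requested and what happened."""
    ACCOUNTING_LOG.append((user, resource, granted))


def request_access(user: str, password: str, resource: str) -> bool:
    # authorize() only runs for users who authenticated successfully.
    granted = authenticate(user, password) and authorize(user, resource)
    account(user, resource, granted)
    return granted
```

Every request ends up in the accounting log whether or not it was granted, which is what later makes session monitoring possible.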

The Trust Algorithm

In any enterprise with Zero Trust implemented, we have the trust algorithm as the primary thought process behind granting or denying access for a user. It is broadly categorized into five groups, shown in the image below:

The first is Access Request, which is a request from a supplicant down to the IAM server. The Subject Database and the Asset Database contain information on users, their context, and their assets. Resource Policy Requirement is the minimum policy requirement you have to meet before you can gain network access. These requirements are generally put in place by network administrators or data custodians who understand the impact of assigning an incorrect policy to an incorrect user. Threat Intelligence is the last one, which is concerned with looking for any malicious activity in your live network.
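One way to read the trust algorithm is as a function that takes the access request and consults the other four inputs before answering. The sketch below assumes a simple all-checks-must-pass rule. The databases, field names, and entries are hypothetical placeholders for what an IAM server would hold.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    subject: str
    asset: str
    resource: str


# Hypothetical stand-ins for the inputs to the trust algorithm.
SUBJECT_DB = {"alice": {"role": "engineer"}}
ASSET_DB = {"laptop-42": {"managed": True}, "tablet-7": {"managed": False}}
RESOURCE_REQUIREMENTS = {"git-server": {"roles": {"engineer"}, "managed_only": True}}
THREAT_INTEL = {"flagged_assets": {"laptop-13"}}


def evaluate(req: AccessRequest) -> bool:
    """Grant access only if every input agrees; unknown entries are denied."""
    subject = SUBJECT_DB.get(req.subject)
    asset = ASSET_DB.get(req.asset)
    policy = RESOURCE_REQUIREMENTS.get(req.resource)
    if subject is None or asset is None or policy is None:
        return False
    if req.asset in THREAT_INTEL["flagged_assets"]:
        return False
    if subject["role"] not in policy["roles"]:
        return False
    if policy["managed_only"] and not asset["managed"]:
        return False
    return True


if __name__ == "__main__":
    print(evaluate(AccessRequest("alice", "laptop-42", "git-server")))
    print(evaluate(AccessRequest("alice", "tablet-7", "git-server")))
```

The same user gets different answers depending on the asset they connect from, which is the point of combining subject, asset, and resource requirements in one decision.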

How to Quantify the Trust

So far, we’ve discussed the term "Trust" a lot, but for an IAM server to understand this terminology, the meaning of the word trust has to be quantified. We quantify it by assigning brownie points for meeting each predefined criterion. For example, imagine yourself at a bar entrance, and say you have to meet at least four out of five guidelines before you can enter the bar. This is similar to how profiling works. The IAM server tries to profile a device based on a similar scoring system. In this example, we assume that 20 points is what is needed for a PC to be profiled as a valid Windows 10 workstation.

Probes Aren't Magic

Let's look in detail at what this scoring system looks like. Here we are using probes, which are just protocols used to collect as much info on the endpoint as possible.

Our journey starts from zero points for a PC because the IAM server knows nothing about it.
Using the RADIUS protocol, RADIUS Access-Request messages are sent from the PC to the IAM server.
The IAM server looks into the RADIUS packet attributes and identifies an OUI that matches the OUI field of Microsoft Windows. That gives us a first hint that this might be a Microsoft device.
If configured with DHCP, we can have the endpoint send a copy of its DHCP request down to the IAM server, and the IAM server will look into the DHCP class identifier field and conclude that this definitely is a Microsoft device.
To dig deeper, we have a third probe. If the endpoint tries to browse the internet, the IAM server reads the user agent field from the HTTP packet and sees that it's actually a Windows device, not just a generic Microsoft device.
Lastly, we have AD probes and Nmap probes working together, which work similarly to the previous probes. They can dig deeper than the last one and establish that this is a Windows 10 PC, so it now reaches the 20 points we decided on.

This is how profiling works for any endpoint in a Zero Trust environment.
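The probe sequence above can be sketched as a running score. The 20-point threshold comes from the example, but the article does not say how many points each probe contributes, so the even split of 5 points per probe is purely an assumption, as are the observation field names.

```python
PROFILE_THRESHOLD = 20  # points needed to be profiled as a Windows 10 workstation

# Assumed point values: the source does not specify a per-probe weighting.
PROBE_POINTS = {
    "radius_oui": 5,
    "dhcp_class_id": 5,
    "http_user_agent": 5,
    "ad_nmap": 5,
}


def score_endpoint(obs: dict) -> int:
    """Add up points for every probe whose observation matches the profile."""
    score = 0  # the IAM server starts out knowing nothing about the PC
    if obs.get("oui_matches_profile"):
        score += PROBE_POINTS["radius_oui"]
    if obs.get("dhcp_class_id", "").startswith("MSFT"):
        score += PROBE_POINTS["dhcp_class_id"]
    if "Windows NT" in obs.get("http_user_agent", ""):
        score += PROBE_POINTS["http_user_agent"]
    if obs.get("nmap_os_guess") == "Windows 10":
        score += PROBE_POINTS["ad_nmap"]
    return score


def is_windows10_workstation(obs: dict) -> bool:
    return score_endpoint(obs) >= PROFILE_THRESHOLD


if __name__ == "__main__":
    partial = {"oui_matches_profile": True, "dhcp_class_id": "MSFT 5.0"}
    print(score_endpoint(partial), is_windows10_workstation(partial))
```

An endpoint that has only answered the first two probes sits at 10 points in this sketch and is not yet profiled, which mirrors how the IAM server's confidence grows as more probes report in.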

Trust Is Earned, and It's Temporary

Change of Authorization (CoA) is one of the most crucial elements of a Zero Trust Architecture. For example, imagine someone comes into the office and connects their company laptop to the wired network, and everything is good. Ten minutes later, they decide to turn off their antivirus because they want to download a game off the internet. Does their trust profile change for the IAM server after this particular action, or does it remain the same? It must change, because they just shut down a crucial antivirus application on their PC. In the worst case, they are now a potential threat to devices on the same network if their device is compromised. In such cases, the IAM server re-authenticates their PC with a restricted-access policy.
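The antivirus scenario can be sketched as an event handler that re-evaluates a live session. This is a simplified model, not the RADIUS CoA message exchange itself: the Session class, the policy names, and the single antivirus flag are hypothetical.

```python
class Session:
    """A live network session with its currently applied policy."""

    def __init__(self, user: str, policy: str = "full-access"):
        self.user = user
        self.policy = policy
        self.reauth_count = 0


def on_posture_change(session: Session, antivirus_enabled: bool) -> str:
    """Re-authenticate the session and apply the policy matching its new posture."""
    session.reauth_count += 1
    session.policy = "full-access" if antivirus_enabled else "restricted-access"
    return session.policy


if __name__ == "__main__":
    s = Session("alice")
    print(s.policy)                     # initial trust
    print(on_posture_change(s, False))  # antivirus turned off
    print(on_posture_change(s, True))   # antivirus back on
```

Trust here is a property of the session at this moment, not of the login that happened ten minutes ago, so the policy follows the endpoint's state in both directions.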

What Is Posture? Why in Zero Trust?

Moving on to posture: after CoA, it's another central element of the Zero Trust architecture. One question that posture helps us answer is: "Is the endpoint compliant with the company's security policy or not?" We answer that via some condition checks.

In this example we have an antivirus check, a latest-patch check, and even a USB-plugged-in check to see if the device that is onboarding to the network has any USB device plugged in, because that's not a good sign. It may show that you're trying to introduce a rogue device into the network. You can create your policies accordingly and say that any devices that don’t meet these condition checks should not be granted access, or, if they are granted access, that it should be very limited because they will be marked as non-compliant once the posture check completes.
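The three condition checks from the example can be written as a small posture function. The Endpoint fields, the required patch level, and the two access labels are assumptions made for illustration. An actual posture agent would collect these facts from the device itself.

```python
from dataclasses import dataclass

REQUIRED_PATCH_LEVEL = 42  # hypothetical "latest patch" number


@dataclass
class Endpoint:
    antivirus_running: bool
    patch_level: int
    usb_device_plugged_in: bool


def posture_failures(ep: Endpoint) -> list:
    """Return the names of the condition checks this endpoint fails."""
    failures = []
    if not ep.antivirus_running:
        failures.append("antivirus")
    if ep.patch_level < REQUIRED_PATCH_LEVEL:
        failures.append("latest-patch")
    if ep.usb_device_plugged_in:
        failures.append("usb-plugged-in")
    return failures


def access_level(ep: Endpoint) -> str:
    """Compliant endpoints get normal access; anything else is limited."""
    return "compliant" if not posture_failures(ep) else "limited"


if __name__ == "__main__":
    print(access_level(Endpoint(True, 42, False)))
    print(posture_failures(Endpoint(False, 40, True)))
```

Returning the list of failed checks, rather than just a yes or no, lets the policy decide between denying access outright and granting limited access.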

Native Supplicant — Cisco AnyConnect

Let's have a quick walkthrough of how we provide Zero Trust for a workplace. We suggest Cisco's way of implementing an IAM solution, illustrated in the image below:

On the left, we have users coming in via different mediums, trying to gain access to network resources. Their access request is sent to an IAM server, which then tries to authenticate the users with the help of SAML providers, LDAP servers, or a set of Active Directory servers on the backend. The authentication protocol in most cases would be PEAP-MSCHAPv2 or EAP-TLS, for either user or machine authentication.

Threat Detection and Incident Response

We now move to the part where managing trust and handling threats come together. One of the key things that threat detection and incident response focuses on is the continuous evaluation of the traffic flow in a network. This idea of continuous evaluation stems from the fact that in the security world of today, it's always good to be suspicious. You can achieve that in many ways. You can integrate your threat detection tools with your IAM server, rely on your policy enforcement points such as firewalls, lease cloud-based services, or use a combination of them. The idea is to make it as robust as you can.

Government Compliance

A little about government compliance and Zero Trust. We’ve included the white paper (Executive Order on Improving the Nation’s Cybersecurity) so you have enough time to go through it later on, and understand how governments around the world have not just been adopting but also mandating the implementation of Zero Trust.

Some Hiccups

Let's be honest, it's a lot to take in and we understand it. It's not so simple. There are definitely some hiccups along this way of implementing Zero Trust. Many of the tenets that we have discussed have merits, but at the same time, some might be extremely complex because of a few reasons. The most notable reason, which knits these all together, is that many organizations today have technical debt, meaning they are running applications more than a few years old, and they have been building their own software for consumption around the same infrastructure. As a result, redesigning and redeploying your architecture, or shifting to an entirely new one for that matter, could be quite costly and service disruptive. Service disruption is bad for businesses, be it a major bank, military, university, or a hospital. Technical debt is a major factor here.

Summary

To summarize the big points, when we are following Zero Trust ideology, we want to:

Grill the endpoint on proving identity.
Periodically re-authenticate and follow the least-privilege access principle.
Understand your network in order to design a better Zero Trust architecture, because static rules don’t apply in this network.
Remember that no company can sell you a box and say, "There you go, you now have a completely Zero Trust architecture implemented." That's just not possible. It's not the nature of the whole Zero Trust paradigm.

About the Authors

Sindhuja Rao

Deepank Dixit
