
User Adaptive Security


Summary

Christina Camilleri and Jesse Kriss discuss how Netflix has readjusted their investments around user-focused security tooling, and explore strategies towards a tiered access approach within endpoint security.

Bio

Christina Camilleri is a security engineer on the Studio & Corporate Security team at Netflix. Jesse Kriss works on Stethoscope and other user-focused security initiatives at Netflix.

About the conference

QCon Plus is a virtual conference for senior software engineers and architects that covers the trends, best practices, and solutions leveraged by the world's most innovative software organizations.

Transcript

Camilleri: Welcome to our talk on user adaptive security, where we'll be exploring how Netflix is adapting to a changing landscape both inside the security industry and as a result of the chaos and variability introduced by 2020. In this session, we'll talk about how we've had to readjust our bets and investments around user focused security tooling, and we'll explore some new strategies towards a tiered access approach within endpoint security.

Introduction

I'm Christina. I work on the studio and corporate security team at Netflix, where I focus on shaping our endpoint security strategy. I've been at Netflix just shy of one year. Prior to Netflix, I spent a lot of my career in pen testing and security education. The area I've enjoyed the most at Netflix is our heavy focus on the positive user experience when applying security controls, which is quite different to the pen testing universe. This area is something we'll get into a lot in this presentation.

Kriss: I'm Jesse. I work on the enterprise security team. I've been working on various flavors of user focused security tools here at Netflix for a little over four years. This is actually my first job in security. Prior to Netflix, I did software design and development at NASA JPL, the Obama 2012 tech team, Figure 53, and IBM Research. My educational background is in music and human-computer interaction. I think that computers quite frequently increase the level of suffering in the world, and most of our systems and approaches are actively user hostile. I do my best to push in the other direction. Together, Christina and I work on the intersection between endpoint security and user experience. This is both a really exciting area and one that has plenty of room for improvement.

2020 Is Harnessing Chaos

Camilleri: You may have noticed that there's a lot happening in 2020. I think it would be an understatement to say that a few things have not gone exactly according to plan. There's a lot on our minds between adhering to our culture pillars, adapting to the pandemic, and the unique challenges of working from home, all while trying to figure out how to do endpoint security here at Netflix.

Kriss: Netflix did pioneer chaos engineering. This is not what we meant. There are a huge number of companies figuring out how to make the shift to a predominantly or completely remote workforce. We've all been dealing with remote onboarding, hardware procurement, and figuring out what security controls are still effective when virtually nobody is in the office anymore. At Netflix, we've largely experienced the constraints of the pandemic as an accelerant, not a major shift in position. We've been making choices and investments in the right direction for a while, but the current conditions meant we've had to drive faster towards certain types of changes.

What Is Stethoscope?

We started our Stethoscope project nearly five years ago to bring a user-focused approach and lighter-weight tooling to endpoint security. We've talked about this category as user-focused security, which to us means two things. First, we focus on the user as the key point of intersection for security. It's ultimately people who have devices, access services, handle credentials, and take actions. Second, by analogy to user-centered design, we approach the design of these systems knowing that if they don't work for people in practice, they won't really work at all. We treat the user experience of security as a primary concern, not an afterthought. Stethoscope has gone through a few incarnations. First, there was a website that showed people information about their devices, as gathered by our endpoint management tooling, and gave them instructions for improving their security configuration. Next was a desktop application that performed the checks itself and was integrated into our single sign-on flow for device reporting, without relying on the presence of endpoint management tooling. Now it's a browser extension and a small helper executable that together can do checks and reporting independently of other systems or flows. Beyond device security, there are other trends in how we've approached various perimeters and controls. We've been driving a location-independent security approach for a number of years, and moving towards identity and access controls that let us run services on the public internet, allowing access from any network without a VPN.

Netflix's Culture

These trends aren't new, of course, and they aren't unique to Netflix. User-focused security is an approach now championed by companies like Kolide and others. BeyondCorp started a trend away from trusted networks and traditional VPNs. Zero trust architecture now has a proper NIST document (SP 800-207), building on the direction Google started over 10 years ago. So what makes this different at Netflix? Two things have made Netflix different from a lot of large enterprises. We highly value employee freedom and individual responsibility. We've also chosen to accept a relatively high amount of risk in order to minimize rules and complexity and maximize organizational focus and agility. These principles, among other things, led us down the path of a relatively lax stance towards device provisioning and endpoint security.

We Looked Inward and Made a Few Changes

Camilleri: It's 2020 now, and a few things have changed in the last few years. Here's how we're refining and adapting our approach to better suit the current environment with endpoint security management, while respecting the culture pillars that Jesse mentioned earlier. With our changing landscape, we needed to address a few things. Our historical bets on focusing on transparency and freedom and responsibility didn't really align well with a typical systems management approach. We felt that we didn't want a system where we had invisible software with admin rights pushing up centrally managed policies. We adopted a model where we could check machine configuration at access time, give people clear guidance on how to make simple changes themselves, and not rely on strict inventory or a trusted bootstrapping process. Stethoscope gave us that ability, but some things have since changed, which made us revisit those bets.

Device Freedom at Netflix

One bet we made with Stethoscope is that it should show users what to do rather than enforce it, nudging them in the right direction as a buffer for changes to come. However, we found that this resulted in low adoption, so we needed a stronger assertion to help the user along rather than relying on them so heavily. We wanted to focus on making the experience easier for the user. It's also worth noting that we're a heavy BYOD shop, with 12,000 users and 15,400 devices. That's not in the typical sense that the majority of devices are personally owned, but in the sense that we treat our fleet of machines as customer machines. It should feel like your device, not a Netflix device. That made it particularly important to find a way to have confidence in fleet visibility, inventory control, and security without strictly controlling the inventory of devices that can connect to internal Netflix systems. For example, it's totally fine for a Netflix employee to buy a laptop at the Apple Store and use it for work. We also have a significant number of vendors, contractors, and other third parties who need to access our systems, and we don't want to enforce control over all of that hardware, even if we could. We aim to keep that device freedom because we highly value operational simplicity, even at the expense of other things. This echoes something Jesse once said: responsible people thrive on freedom and are worthy of freedom. We wanted to avoid growing into a large company that felt like a large managed facility, because we strive to keep the user experience at the front of our minds. Operational costs are real costs too, and we didn't want to create unnecessary process fatigue.

What Changed?

Let's go over a few other things that made us adapt our approach. One is that we now have a larger, distributed workforce, as our company grew quickly and turned into a production company. That means our company is more global now, with more endpoints, more external employees, and more corporate information, which puts far more responsibility on our workforce to keep their devices in a healthy and secure state. The shift to production isn't new in itself, but it's recent in the sense that you can now think of Netflix as predominantly a production company with an engineering offshoot, rather than predominantly a streaming company. With that also came a greater diversity of employees and locations. Globally, we're engaging with more external VFX houses, post-production studios, and external talent, which leads to an ever-expanding device landscape. There's a greater need for personal machines to be used on set and for mobile devices to access Netflix resources, and devices are constantly being onboarded and offboarded. There are also third-party machines we don't get much data on, for example, those at external studios that work with Netflix.

Trust Tiers - A Netflix Tale

With all that in mind, how do we tackle endpoint security in a land of minimal visibility and control? We introduced a tiered endpoint model consisting of four tiers, in which users get to choose the access they want, with some tradeoffs. Our four tiers are, from highest to lowest: ready for anything, daily driver, casual, and zero. Each tier comes with specific conditions regarding the type of device, the identifiability of that individual device, and the state of its security settings and installed tools. Eventually, we want to use this concept to inform our upcoming work in adaptive authentication and authorization. We imagine these tiers will start by supporting coarse-grained actions, for example, blocking access to a particular web app until you meet certain requirements. Then, eventually, they'll support fine-grained actions, for example, a high-impact or high-risk action within a given application or API.

To go a bit further and compare with BeyondCorp's access model: a public application where little or no sensitive information is stored or processed can be mapped to a low trust tier, so the requirements for attaining that tier are less stringent. A system handling highly sensitive information, by contrast, must be accessed by a user who has attained a higher trust tier, and attaining that tier requires a more stringent validation process. In addition to the tiers themselves, we're introducing an opt-up and opt-down model.
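As a rough illustration of how minimum tiers might gate access, here is a minimal Python sketch, assuming the four tiers can be modeled as an ordered enum. The `TrustTier`, `MIN_TIER`, `required_tier`, and `is_allowed` names, the example apps (`status-page`, `wiki`, `finance`), and the fail-closed default for unknown apps are all hypothetical; the talk does not describe Netflix's implementation. App-level keys stand for coarse-grained rules and action keys for fine-grained ones.

```python
from enum import IntEnum
from typing import Dict, Optional, Tuple


class TrustTier(IntEnum):
    """Hypothetical encoding of the four tiers; a higher value means more trust."""
    ZERO = 0
    CASUAL = 1
    DAILY_DRIVER = 2
    READY_FOR_ANYTHING = 3


# Hypothetical policy table. A key with action=None is a coarse-grained,
# whole-application rule; a key with an action name is a fine-grained rule.
MIN_TIER: Dict[Tuple[str, Optional[str]], TrustTier] = {
    ("status-page", None): TrustTier.ZERO,                    # bootstrap / error pages
    ("wiki", None): TrustTier.CASUAL,                         # low-sensitivity app
    ("finance", None): TrustTier.DAILY_DRIVER,                # moderately sensitive app
    ("finance", "export_all"): TrustTier.READY_FOR_ANYTHING,  # high-risk action
}


def required_tier(app: str, action: Optional[str] = None) -> TrustTier:
    """Return the minimum tier for an app, or for a specific action within it."""
    if action is not None and (app, action) in MIN_TIER:
        return MIN_TIER[(app, action)]
    if (app, None) in MIN_TIER:
        return MIN_TIER[(app, None)]
    # Unknown apps fail closed and require the highest tier.
    return TrustTier.READY_FOR_ANYTHING


def is_allowed(device_tier: TrustTier, app: str, action: Optional[str] = None) -> bool:
    """Allow access when the device's tier meets or exceeds the requirement."""
    return device_tier >= required_tier(app, action)
```

With this table, `is_allowed(TrustTier.DAILY_DRIVER, "finance")` returns `True`, while `is_allowed(TrustTier.DAILY_DRIVER, "finance", "export_all")` returns `False`. Failing closed for unlisted apps is a design choice made for the sketch, not something stated in the talk.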

Ready for Anything Trust Tier

Our highest tier is what we call ready for anything, or RFA. It's the tier a user must be in when attempting to access a high-impact or high-risk application, which we currently define as an application that holds resources we consider highly sensitive, or one that can take highly sensitive actions, for example, admin access to a highly sensitive or highly critical app. It's also required when the user themselves is deemed a high-trust or high-risk user, for example, a user who has really sensitive information on disk. A user in this tier needs a machine with endpoint management tooling on it, to give us a better ability to see and manage that endpoint. In addition, they need to have MFA enabled with strong factors only, like a physical YubiKey, originate from a low-risk IP address, and have an up-to-date browser. Basically, they need to come from a trusted, up-to-date device that we have strong confidence in and know belongs to a Netflix employee. This tier currently won't apply to mobile devices, because of a lack of evidence of MDM presence at the browser level. That is, we may know that the device with serial number X has MDM, but when the browser on that device connects to a web application, we don't know that it is device X, and so we don't know whether it has MDM. We don't currently have the right assertions to let the browser reach a higher trust level.
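The RFA conditions listed above can be expressed as a single predicate. This is a hypothetical Python sketch: the `AccessContext` fields, the `"security_key"` factor label, and the string IP-risk scale are illustrative assumptions, since the talk names the conditions but not how the signals are collected.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Hypothetical set of factor types treated as "strong" (e.g. a physical security key).
STRONG_FACTORS: FrozenSet[str] = frozenset({"security_key"})


@dataclass(frozen=True)
class AccessContext:
    """Hypothetical signals available when a browser requests access."""
    has_endpoint_management: bool
    mfa_factors: FrozenSet[str]  # every MFA factor type enrolled for the user
    ip_risk: str                 # "low", "medium" or "high"
    browser_up_to_date: bool
    is_mobile: bool


def meets_ready_for_anything(ctx: AccessContext) -> bool:
    """Check the RFA conditions described in the talk (a sketch, not Netflix's logic)."""
    if ctx.is_mobile:
        # The browser can't prove the device has MDM, so RFA is unavailable on mobile.
        return False
    # "Strong factors only": at least one factor, and no weak factors enrolled.
    strong_only = bool(ctx.mfa_factors) and ctx.mfa_factors <= STRONG_FACTORS
    return (
        ctx.has_endpoint_management
        and strong_only
        and ctx.ip_risk == "low"
        and ctx.browser_up_to_date
    )
```

Requiring that every enrolled factor be strong, rather than merely that one strong factor exists, is one reading of "strong factors only"; a real policy might instead check the factor used for the current session.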

Daily Driver Trust Tier

The tier under RFA is daily driver, which is a little more relaxed. At this tier, the user still needs to pass device health checks initiated by Stethoscope before reaching certain applications. This tier caters more towards partners and vendors, tying back to Netflix now having a larger production workforce, as well as the majority of the workforce, who may not need to access high-risk applications. One caveat is that high-trust users won't be offered the option to opt down to this tier, because we still want the additional security, visibility, and controls on their devices.

Casual, and Zero Trust Tiers

The tier below that is casual, which is required for low-sensitivity applications or lower-sensitivity actions a user can take in an application. It applies to all mobile browsers regardless of MDM presence, and to non-Chrome browsers on all platforms. It doesn't include assertions about device security configuration, but it is associated with the current user, which provides some measure of protection and visibility. The lowest trust tier is zero, a tier with no assertions whatsoever. An example use of this tier would be services that bootstrap trust or show error messages, such as the error messages in Stethoscope's workflow.
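How a request lands in one of the tiers could be sketched as a function returning the highest attainable tier. The signal names are hypothetical, `meets_rfa` is assumed to be computed separately, and treating a known Chrome user who fails the health checks as casual is an assumption the talk doesn't state.

```python
from enum import IntEnum


class TrustTier(IntEnum):
    """Hypothetical encoding of the four tiers; a higher value means more trust."""
    ZERO = 0
    CASUAL = 1
    DAILY_DRIVER = 2
    READY_FOR_ANYTHING = 3


def attainable_tier(
    *,
    user_identified: bool,
    is_mobile: bool,
    is_chrome: bool,
    health_checks_passed: bool,
    meets_rfa: bool,
) -> TrustTier:
    """Return the highest tier a request can reach from hypothetical signals."""
    if not user_identified:
        # No assertions at all: only zero-tier services (bootstrap, error pages).
        return TrustTier.ZERO
    if is_mobile or not is_chrome:
        # Mobile browsers (even with MDM) and non-Chrome browsers cap out at casual.
        return TrustTier.CASUAL
    if not health_checks_passed:
        # Assumption: a known user whose device fails the health checks falls back to casual.
        return TrustTier.CASUAL
    # Assumption: meets_rfa bundles the RFA conditions (management tooling, strong MFA, etc.).
    return TrustTier.READY_FOR_ANYTHING if meets_rfa else TrustTier.DAILY_DRIVER
```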

Multiple Devices

One other major reason we believe in this tiered model, in addition to the user experience reasons, is that we expect the majority of the workforce to have more than one device. Using myself as an example, my primary work device is my Netflix-issued MacBook, my secondary is my personal Windows desktop, and my third is my personal Pixel 4. I still like to have that physical separation between my work and personal devices, but I also want the ability to take meetings or check email or Slack on my personal desktop or phone. I don't need access to sensitive apps or processes on any of those personal devices, so I only need to be concerned with getting my MacBook into the secure state we talked about earlier. That still leaves me able to do some work on my other devices, and especially now that a lot of us are working from home, it makes my life a lot easier.

What a Security Control Should Be

Tying this all back to our talk title, this is really all about adapting security to the user. Instead of forcing users into one strict tier that requires management on all devices and creates an unpleasant user experience, we're allowing users to choose the tier they wish to be in, with some caveats. If a user doesn't want certain tooling, like system management, they can opt down and select a lower target trust tier for that endpoint device. This reduces the level of access for that client, and the end user can decide for themselves whether that tradeoff is worthwhile. With our endpoint management goals as well, we want management to be an almost invisible addition to users' machines. We're not trying to create an Eye of Sauron experience, nor do we want to overload a user's machine with unnecessarily heavy tooling, like excessive endpoint detection and response processes or antivirus software.
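A minimal sketch of the opt-down tradeoff, assuming each app has a minimum tier: a user picks a target tier, high-trust users can't go below RFA (the caveat from the daily driver tier), and a lower target simply means fewer reachable apps. `select_target_tier`, `reachable_apps`, `OptDownNotAllowed`, and the example apps are hypothetical names.

```python
from enum import IntEnum
from typing import Dict, List


class TrustTier(IntEnum):
    """Hypothetical encoding of the four tiers; a higher value means more trust."""
    ZERO = 0
    CASUAL = 1
    DAILY_DRIVER = 2
    READY_FOR_ANYTHING = 3


class OptDownNotAllowed(Exception):
    """Raised when a high-trust user tries to choose a tier below RFA."""


def select_target_tier(requested: TrustTier, high_trust_user: bool) -> TrustTier:
    """Record the user's chosen target tier, enforcing the high-trust caveat."""
    if high_trust_user and requested < TrustTier.READY_FOR_ANYTHING:
        raise OptDownNotAllowed("High-trust users must stay at ready for anything.")
    return requested


def reachable_apps(target: TrustTier, app_min_tiers: Dict[str, TrustTier]) -> List[str]:
    """List the apps a device at the target tier could reach: the opt-down tradeoff."""
    return sorted(app for app, minimum in app_min_tiers.items() if minimum <= target)
```

For instance, with `wiki` at casual, `finance` at daily driver, and `admin` at RFA, a daily driver target reaches `["finance", "wiki"]`, so the user can see exactly what they give up before opting down.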

With all the information we're able to gather, we also want to be transparent with our users about it and let them know how that information has been useful. If a user opts in to management tooling, we want it to be a positive addition to the machine too, for example, a custom machine ready to go out of the box, like an animation workstation with all the necessary software for someone who works in animation. We'll also develop workflows for opting up, so that currently deployed hardware and bring-your-own-device hardware can meet the higher tiers regardless of who purchased it or how. This will include investigating the feasibility of virtual workstations or on-demand ready for anything clients.

Looking to the Future

Kriss: With these tiers defined, we have a lot of really interesting work in front of us. Probably the most obvious is that we'll be able to set minimum required tiers for given applications or actions. We're working right now to adapt our authentication and authorization infrastructure to take these concepts into account. One thing we really value is end-user recoverability. If somebody is blocked because their device doesn't meet the required tier, we'll be very clear about the reason and guide them towards a fix. Showing people an opaque error and telling them to contact the help desk was never really an acceptable path, and it's especially bad now that so many people are working from home. Really, we want to be in a situation where the highest trust tier is actually the easiest path, requiring zero effort. Our plan is for new machines provided to employees to meet the ready for anything tier by default. Our goal is to achieve this with Apple's Device Enrollment Program and other compatible OEM approaches, so that we can ship devices straight from our hardware providers and have them work with our tiers after the first login, without any additional end-user configuration.
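End-user recoverability can be pictured as an access decision that always carries a reason and self-service steps rather than an opaque error. In this hypothetical Python sketch, the check names in `REMEDIATION`, the message wording, and the `Decision` type are illustrative assumptions, not Stethoscope's actual output.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class TrustTier(IntEnum):
    """Hypothetical encoding of the four tiers; a higher value means more trust."""
    ZERO = 0
    CASUAL = 1
    DAILY_DRIVER = 2
    READY_FOR_ANYTHING = 3


# Hypothetical remediation text keyed by the name of a failed device check.
REMEDIATION = {
    "disk_encryption": "Turn on full-disk encryption (FileVault on macOS, BitLocker on Windows).",
    "os_version": "Install the latest operating system update.",
    "screen_lock": "Set your screen to lock automatically after a short idle period.",
}


@dataclass
class Decision:
    allowed: bool
    message: str
    steps: List[str] = field(default_factory=list)


def _label(tier: TrustTier) -> str:
    """Turn READY_FOR_ANYTHING into 'ready for anything' for user-facing text."""
    return tier.name.replace("_", " ").lower()


def decide(device_tier: TrustTier, required: TrustTier, failed_checks: List[str]) -> Decision:
    """Grant access, or explain the gap and list self-service fixes."""
    if device_tier >= required:
        return Decision(True, "Access granted.")
    # Unknown checks still get a pointer instead of "contact the help desk".
    steps = [
        REMEDIATION.get(check, f"Resolve the '{check}' check shown in Stethoscope.")
        for check in failed_checks
    ]
    message = (
        f"This app requires the {_label(required)} tier, "
        f"but this device is currently at {_label(device_tier)}."
    )
    return Decision(False, message, steps)
```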

For people who don't have a high trust tier device already, we think virtual machines are a really interesting possibility. With the right infrastructure and controls, we could have one click access to a high trust machine from nearly any hardware. We think this is likely the best user experience for somebody whose everyday work only requires a medium trust tier, but suddenly needs to perform a more sensitive action, or for partners or vendors whose hardware isn't managed by Netflix. Instead of requiring them to install our endpoint management tooling, or wait for a new managed device from Netflix, they can get a temporary virtual environment and be on their way.

Wrap-up

Camilleri: A lot of this is still in progress, and we're finding our approach as we go. I think a lot of us are still trying to figure out how to adapt to the weird constraints the universe is throwing at us, and are probably facing similar problems. We'd love to hear feedback on our approach, or how you or your company is thinking about user-focused endpoint security.


Recorded at:

Jul 02, 2021
