
Open Policy Agent (OPA) with the Project’s Co-Creators

The Open Policy Agent is used for policy decision-making across the stack. In the case of Kubernetes, it is often used as an admission controller to protect the API Server with dynamic rules that don't require recompilation to introduce. Today on the InfoQ Podcast, Wes Reisz speaks with Tim Hinrichs and Torin Sandall (two of the Open Policy Agent Project creators). The three talk about the project, including things like architecture, origin, community, the policy language (Rego), and, of course, performance. The podcast is an introduction to how OPA can be used across the stack for policy decision-making.

Key Takeaways

  • OPA's responsibility is to make a policy decision on its own and return that decision as a JSON object back to the caller. It's up to the caller to decide what to do with the OPA decision. Semantically, OPA only operates on the data passed to it (typically as JSON). So OPA doesn't require deep knowledge about the environment itself. This makes OPA flexible and portable to many different use cases. 
  • Rego is a high-level declarative language that's based on decades of research into policy systems. It embodies specific ideas that make it useful for these kinds of more modern cloud-native systems and is designed like an onion. There are core parts of the language that are extremely fast. As you need more expressiveness, you trade away some of that core performance.
  • OPA is most often used as an admission controller in Kubernetes. The admission controller is where all the semantic validation of Kubernetes resources occurs, before resources are persisted to etcd and before controllers go off and start doing work.


Wes Reisz: Open Policy Agent, OPA, is a policy engine. It allows you to decouple policy decision-making from policy enforcement across your entire application stack. Or, put more directly, it allows you to write logic (think rules) over data like pod definitions in, say, Kubernetes, and then pass those decisions back upstream to an app to actually act on. That's the decoupling of decisioning from enforcement.

Wes Reisz: While I'm personally most familiar with OPA in the context of Kubernetes, Open Policy Agent doesn't just stop there. OPA has agents available spanning the front end, the back end. From K8s to applications that might run in a cluster (things like Kafka, or maybe Spring-based apps). It can plug into the service mesh; it can plug into API gateways (it can even plug directly into CI/CD pipelines). Whether running in K8s or even securing SSH on a Linux box, OPA is a versatile policy engine with a large ecosystem that is designed for the cloud-native space, and we'll actually dive into that a bit in the podcast. It truly delivers on the promise of policy as code.

Wes Reisz: Today, we're speaking to two of the creators of OPA, Torin Sandall and Tim Hinrichs. Hello, and welcome to another edition of the InfoQ podcast. My name is Wes Reisz. I'm one of the co-hosts for the podcast. If you like what you hear on the podcast today, be sure to check out QCon Plus. QCon Plus is a software conference that we run that's focused on senior developers and architects. This is an online edition of QCon, given the environment that we live in today. Again, still focused on senior developers, still focused on architects. This edition of the conference will be held over two weeks in May, May 17th through the 28th, and it's really built for an online audience. So it'll be an extended format where you can attend in just a couple hours a day (across those two weeks), rather than committing a full part of your day like you might for an in-person conference.

Wes Reisz: It's going to feature 16 tracks with people like Tammy Bryant Butow, principal SRE from Gremlin, Randy Shoup, chief architect at eBay, Sergey Fedorov of Netflix, along with engineers from Red Hat, Google, and Financial Times in London, and many, many more. So check it out, if all this makes sense or it's of interest to you.

Wes Reisz: As I mentioned, today we're talking about Open Policy Agent with Torin Sandall and Tim Hinrichs. Torin is the VP of Open Source at Styra, and Tim is their CTO. Both are co-creators of the Open Policy Agent project. Today on the podcast, we're going to discuss OPA. We're going to talk about some of the problems that OPA has been solving, and why they came up with the idea to attack it like this. We're going to talk about Rego. That's the rules, the policy language, the thing that actually lets them control these decisions coming through, and how they attack this problem. We're going to talk about the ecosystem. We'll talk about K8s, we're going to talk about a lot of things, and we'll see where the whole conversation goes. We're going to dive into OPA over the next half hour.

Wes Reisz: As always, thank you for joining us on your walks, runs and commutes. Torin, Tim, welcome to the podcast.

Tim Hinrichs: Thanks Wes, it's great to be here.

Torin Sandall: Thanks Wes. Looking forward to the chat.

Wes Reisz: Prepping for this podcast I probably watched six, seven different presentations the two of you have done over the last couple of years. So my first question is, over the last two or three years, how many times have either one of you asked the question: "how many people in the audience have heard of OPA?"

Torin Sandall: I feel like there's a correct answer to this question.

Wes Reisz: Every single one I watched.

Torin Sandall: Seven?

Wes Reisz: I think I watched KubeCon from a couple of years ago, and you could see more hands going up as you get closer to the present. So it's going mainstream. It's good. Congratulations.

Tim Hinrichs: In the early days those questions were super important, because we would be on the spot at some of those talks, deciding how much content we were going to go into, how deep it was going to be. And so it's been fun to see that over the years: at the beginning we'd plan a deep dive session and then pivot to an intro session on the spot, because we could just see the audience. But obviously now we've got bona fide intro and deep-dive sessions. So it's certainly fun.

Wes Reisz: To kind of start this off, tell me: what does Styra do, and where are you all from?

Tim Hinrichs: We're all from Styra. We have really three things that we do day in and day out. The mission of the company is to provide a unified solution to authorization for the cloud native ecosystem. And there are three things we do. Obviously, we started OPA, donated it to the CNCF, and continue to maintain it and shepherd it. The second thing we do is that we offer support. So if anybody's using OPA and they need some help, we're happy to help there. And then the third thing we do is we have a commercial product designed to help operationalize OPA in an enterprise. So if you're trying to roll out OPA within an enterprise, across many teams, and you'd like some help with that, we're happy to help there too.

Do you get people surprised at how widely OPA can be used across the stack? [04:44]

Wes Reisz: When I set this podcast up and wanted to do this, I did it (and we'll talk about it here in a bit) because of the CNCF graduation, I wanted to hit on it. It was good timing for all that, but I was really only thinking about OPA from a Kubernetes context. I didn't realize that OPA could be used across the entire stack. Is that a common thing that you find when you're talking to people?

Torin Sandall: We definitely find that people pick up OPA for a particular use case. They have some concrete problem that they're trying to solve in their organization. And because Kubernetes is so hot and so many people are learning how to manage it right now, that's often one of the first places that they come into OPA from. But, once they've used it to solve admission control, putting guard rails into Kubernetes, they pretty quickly realize how broadly the project can be applied and they start to venture out. And so they might venture out into other places in Kubernetes. We see people using it to do just plain old authorization, in addition to admission control in Kubernetes. Or they might go elsewhere. They might start using it to scan their configuration files during CI/CD. They might use it to enforce policies in SSH, at the host level. Or they might start to inject it into their APIs to control access at runtime. So we typically see people start with one use case, maybe it's Kubernetes, but then they learn and they see how broadly it can be applied throughout their environment.

Wes Reisz: So double click into that a little bit. When we say it can be broadly applied across the application stack, I'm familiar with authentication, authorization, access, getting into things, but what does it mean to be able to apply OPA across the entire application stack? What does that actually mean?

Tim Hinrichs: Yeah. So we like to use that term, across the stack, because it's great for us as engineers, because that's how we think about building software. We think of a stack. You've got a CI/CD pipeline, that's part of the stack. But then you've got the cloud platform maybe you're using, maybe it's public cloud and maybe Kubernetes there on top of that. And maybe you're using a service mesh. Then you're running your applications and those applications are also using databases. And then the application itself has a gateway sitting at the front. So when we talk about across the stack, one of the things we've tried to do with OPA is make it possible and even easy to integrate OPA into really any kind of piece of software that people are using for building and running modern software systems, cloud native software. And so that's what we mean, the idea of applying OPA across the stack means that you can integrate it really wherever you want.

Tim Hinrichs: And that was one of the design criteria around OPA: integrating it with other pieces of software was incredibly important, and making it able to work with any piece of software was also super important. So what we never did with OPA was say, we're going to write a bunch of Go code that makes OPA know what a Kubernetes pod or an ingress is. We're not going to add a bunch of code that knows what an HTTP method or a path is. Instead we'll make it general purpose and agnostic. And so therefore we can apply it to all those different elements of the stack, things that we've never even seen or heard of before. That's kind of why it works. That's why we like to say it applies across the stack: because it is easy to integrate and, moreover, it has been designed to be domain agnostic.

Wes Reisz: So why is it easy to integrate? What are some of the design decisions that make it easy to integrate across the stack?

Torin Sandall: I think that goes back to what Tim was saying, which is that OPA itself is not tied to any domain specific data model. So it's not tied to HTTP. It's not tied to Kubernetes. It's not even really tied to JSON or anything like that. Basically you can feed it arbitrary, structured data, typically JSON, and then it can evaluate that data against rules that have been written for it. So OPA itself is not tied to any of those technologies. All it sees essentially is structured data, JSON coming in as part of the query, structured data being evaluated by rules and then structured data being generated by those rules, representing policy decisions. And that being sent back for enforcement. OPA itself is completely decoupled. The rules you write bring meaning to that data. They understand that that data represents a Kubernetes pod, or it represents an incoming HTTP API request, but on its own OPA is completely decoupled.

Torin Sandall: And so, because it's so decoupled, it's really easy to plug it into different systems. In order to ask OPA for a policy decision, or rather, in order to extend and enhance a system to use OPA for policy decision-making, all you have to do is execute a single API call that says, give me the policy decision for this input data. And so the job of the integrator is really just to gather the data that's relevant, or potentially relevant, to a policy decision and to supply that as input. That's something that anybody who's familiar with the system they're integrating into will be really comfortable doing.

What different types of integrations are there for OPA? [09:01]

Wes Reisz: Yeah, totally. It makes sense. The ecosystem's pretty wide, Torin. I know there's a whole bunch of different integrations that are there. What are some of the more popular ones or just give me an idea of the breadth of the integrations that are out there?

Torin Sandall: We have integrations, obviously, for Kubernetes. We have integrations for service mesh projects like Envoy. We have integrations with CI/CD systems. It really goes everywhere, even down to Linux PAM, for example. On the OPA website you can see tutorials for the top ones that we've seen, so think Kafka, Envoy, SSH, Terraform, Docker. And then we also have an ecosystem page on the website, and anybody can contribute to that. I don't know how many integrations are on there now, but basically anybody that builds an integration can add a little card to that page, and they can link off to talks or blogs or videos about that integration. And that's a great place for people to go and discover what's available. Whether they're interested in API gateways or service mesh or orchestrators or databases or programming languages, you name it, that's the place to go to find an integration with OPA.

What is the architecture of OPA? [10:01]

Wes Reisz: So let's use this, I call it the whiteboard of our mind, because it's a podcast and we don't have a whiteboard, but let's draw this picture out. So you've got this box that you feed JSON data, or some other data, to, and there's this engine that's going to be able to do something. What are the boxes in this flow from, say, a web server that's going to allow some kind of access or something? What are the major boxes in this piece that I need to understand, to understand what OPA does?

Tim Hinrichs: That's a great question. So, the way we draw it is pretty much what you've already described, which is you have a service. It doesn't matter whether it's a gateway, doesn't matter whether it's a Kubernetes API server, doesn't matter whether it's an SSH PAM module in Linux. That piece of software just decides at some point in time that it needs a policy decision, an authorization decision. The simplest version is then it sends an HTTP request to OPA, technically a POST, I think. And then it says, as my payload, here's the JSON object. Give me a decision back.

Tim Hinrichs: Now, OPA, its responsibility is to be able to make that decision all on its own and then return that decision as another JSON object back to the caller. So the interface is fairly simple: it's an HTTP POST to OPA to get an answer, and the answer comes back as another JSON object. Whether the caller is the Kubernetes API server, or it's the service mesh, or it's the PAM module, it needs to go ahead and look at that JSON and enforce that decision. But nevertheless, it is OPA that is making the decision.
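As a rough sketch of that interaction, the client side might look like the following Python. The OPA URL, the `httpapi/authz` policy path, and the input field names here are hypothetical placeholders, not something from the conversation; as Tim says, enforcing the decision stays with the caller.

```python
import json
import urllib.request

# Hypothetical OPA endpoint; the policy path ("httpapi/authz") is a placeholder.
OPA_URL = "http://localhost:8181/v1/data/httpapi/authz"

def build_input(method, path, user):
    """Wrap request attributes in the 'input' document that OPA evaluates."""
    return {"input": {"method": method, "path": path, "user": user}}

def is_allowed(decision):
    """Interpret OPA's JSON response; the policy's output sits under 'result'."""
    return decision.get("result", {}).get("allow", False)

def ask_opa(method, path, user):
    """POST the input document to OPA and return the parsed decision."""
    payload = json.dumps(build_input(method, path, user)).encode()
    req = urllib.request.Request(
        OPA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Enforcement is the caller's job, e.g.:
#   decision = ask_opa("GET", "/salary", "alice")
#   if not is_allowed(decision):
#       ...block the request...
```

OPA itself never learns what a "method" or "path" means; only the policy written against this input does.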

Wes Reisz: And then as an integrator, if I'm using Java, I might have an intercepting filter that then looks at that and either blocks or doesn't. I have the choice on how to actually integrate that into my application?

Tim Hinrichs: Yeah, exactly. So the response that comes back is generated by the policy, and that response might just be a simple yes or no. Saying yes, allow this, continue as normal. Or maybe a no, block this, send an error back to the client. Or it could be something more interesting than that. It could include an error message. It could include the reason why you're not being allowed to perform the operation. And so then it's up to that little enforcement module to do something with that. At the end of the day, that's the model.

How did the project start? [11:57]

Wes Reisz: Very nice. So, as I was telling you before the podcast started, I probably spent a good five years of my life writing different things for SSO that sat in this space. What's the genesis of the project? What made you all think, you know what, let's abstract this out into something that is universal and we can use across the stack?

Tim Hinrichs: That goes back to the origin of the company Styra, where we work. So in the early days when the company started, there were a couple of us at VMware. We'd come in through an acquisition. And so, anyway, we were talking to a number of financial clients at the time. And what they had told us was a story of, "Hey, we've got all these applications, all these pieces of software," and they needed to build a unified solution to policy and authorization. But they were like, "No, we don't want to do this. This is not what we do. We move money around, we're a bank." And so they said, "Well, you know what? How about you go off and build this for us?" And so at some point we went ahead and started Styra with this idea in mind that, as organizations embrace cloud native, they're going to see the same problem of policy and authorization. They're going to have to solve it across the stack.

Tim Hinrichs: Authorization's been around forever, but now everybody's embracing all this new technology. And so they're going to have to deal with authorization. And because of the cloud native world, it's so ephemeral, it's so dynamic, it's so oriented around microservice architectures, that we knew that the authorization problem is going to be far more compelling and challenging than it ever has been in the past. And so that was when OPA came into existence, just hearing this from end users and then executing on that.

Rego looks like part rules engine/part programming language. Where does it fit into that spectrum? [13:17]

Wes Reisz: Very nice. So let's talk about Rego. Actually, inside there, you're getting this data and you're making decisions with the rules that you're actually writing. When I saw this, it reminded me of other aspects of my programming career. It reminded me of Drools, of rules engines that I might've been writing to do something. It reminded me of maybe even writing a DSL on top of that. Is Rego a rules engine DSL? Was it designed to be like that? Or is that just a by-product? How does it fit in this space?

Torin Sandall: You can think of Rego as a rules engine. Rego is a high-level declarative language that's based on decades of research into policy systems. It embodies certain ideas that make it really useful for these kinds of more modern cloud native systems. So it has first-class support for operating on deeply nested structured data, JSON, basically. So it basically gives you the primitives that you need to write down the logic that controls who can do what in a system. And it's designed to allow you to take something that would normally be written down on a wiki or in a PDF, that says thou shalt do X, Y, Z in order to ensure compliance, and put that into code. So it embodies this idea of treating policy as code, without going all the way to the extreme of a standard programming language, which is not something that non-technical folks can write.

Torin Sandall: It's not something that technical folks want to write when they're dealing with policy. Policies are these organic things that change all the time. And they really just need to be able to focus on the data that's in the system, that the decisions have to be made based on, and the logic. You don't want to be thinking about sockets and errors and data structures when you're thinking about policy. You want to just think about what is the logic that I need to express to ensure that I'm compliant? And so that's what Rego is designed to help you do.

How is today’s Cloud Native architecture different from the architectures from a few years ago? [14:59]

Wes Reisz: You just said something that I want to hit on a little bit, and that's cloud native. As I was working in the past, I was always doing authentication/authorization at the ingress, the point where I was coming into an application. When you say this is designed for cloud native, how are applications designed in a different way today that we really need a tool like this that can plug in, for east-west traffic, for example? How is cloud native different from, say, the monolithic style of applications that was using authentication/authorization before?

Tim Hinrichs: I'll reference Netflix in this. It was funny: I think the first major public talk on OPA was by Netflix, way back in, I think it was Austin if I remember right. So that was numerous years ago. And the use case they had was exactly this: it was controlling east-west traffic. They wanted the security team to be able to put certain policies in place. And they wanted all the developer teams to be able to control, at a lower level, that east-west traffic for all those applications. And the story that they told was the same one that we were telling, but I'll tell it as if they told it, which was that, in this microservice world, one of the things that you realize is very different from the monolithic world is that, for any end user request, you've got many more hops over the network from one microservice to the next, in order to assemble the end user's payload.

Tim Hinrichs: To answer the question the customer asks. And when you move to that microservice architecture, because you have all those additional network hops, what you don't want to do, from an authorization point of view, is have a separate service that each and every one of those hops over the network has to go and ask for a decision. Because it'll double the latency, it'll double the number of network hops and, in so doing, it'll greatly reduce the overall availability of the service. And so what they wanted to do, which is what OPA was designed to do, is push all that authorization logic to the edge.

Tim Hinrichs: In this case, OPA running on each and every node in the service, so that however many microservices you have, you have just as many copies of OPA. Except all of those OPAs are running on the same server as each and every one of those microservices. And so by doing that, you simultaneously decouple policy from that underlying microservice. So you're not hard coding the policy into the service itself, but you're also getting roughly the same availability and performance as if you had left it in the code.

How do you handle performance concerns? [17:11]

Wes Reisz: Makes sense. So we started talking about Rego before I asked the cloud native question; let's come back to it for a second. When you talk about it being a declarative language, talk a little bit more about that. I have some experience where I had to write Lua, for example, in Nginx. And there were some constraints that I had, like no loops and things like that. What kind of things do I have to live by with Rego? Are there any characteristics of the language that I need to be aware of, like not making outside calls to services? What are some of the things that I need to think about when I'm writing Rego?

Torin Sandall: That's a good question. I mean, the language at its core is actually really simple. It basically just lets you write down a bunch of if-then statements, effectively, that are just simple Boolean conditions. It's like: if these conditions hold, then assign this value as the decision. So if the method is GET and the path is /salary and the user is from HR, then allow is true. So that's really what the language does at a high level. It just lets you write down these if-then statements over structured data. But, in order to do that, there are things you need. You obviously need to be able to refer into that structured data. So one of the things at the core of the language is this ability to reference data and dot into it, as you naturally would want to do. And because we're dealing with collections of data, in a Kubernetes pod, say, you might have a collection of containers that need to be analyzed in order to come up with a policy decision.

Torin Sandall: And so it gives you the ability to iterate and search across that data. And because you're dealing with all kinds of different data, you might need special functions to operate on it. So for example, if you're working with web APIs, you might be ingesting JSON Web Tokens in order to make the policy decisions. So we've got functions that help you verify signatures, decode the token, extract the claims and all that kind of stuff. So the language comes with a bunch of primitives that basically make it easier to write these kinds of policies. And OPA itself implements a bunch of performance optimizations to ensure that policies execute quickly and efficiently. And then it provides you with tooling so that you can really treat it as code. So it gives you a test framework and a REPL and all that kind of stuff.
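As a concrete sketch of the kind of if-then rule Torin describes (the GET /salary example), a Rego policy might look like this. The package name and the shape of the input fields are assumptions for illustration, written in the classic (pre-v1) Rego syntax:

```rego
package httpapi.authz

# Deny unless some rule below makes the decision true.
default allow = false

# "If the method is GET and the path is /salary and the user is from HR,
# then allow is true."
allow {
    input.method == "GET"
    input.path == "/salary"
    input.user.department == "HR"
}
```

The dot notation (`input.user.department`) is the "dot into it" referencing Torin mentions, and adding more `allow` blocks is how the policy grows new conditions without touching the enforcement code.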

Wes Reisz: I think I said loops when I was talking about Lua, but I meant if blocks and things like that. There's some constraints when you have ifs. Are there any limitations, I guess, is really where I was going with Rego? How do you maintain performance, I guess, when you're in the request path with some of these decisioning things that you have to deal with?

Tim Hinrichs: One thought here, at a high level, is that I sort of think about this along a spectrum of expressiveness. So at one end of the spectrum you'd have RBAC, role-based access control, and attribute-based access control. Maybe ACLs, maybe IAM. At the far other end of the spectrum, you have programming languages: fully Turing complete, do whatever you want, make network calls, open sockets, sit in infinite loops, that's fine. Rego is in between those two. It is more expressive than all of those RBACs and ACLs and IAM, in part because it does support looping and iteration natively, but it is less expressive than the Turing-complete programming languages. So it sits in between, and that was kind of the goal. That's also why it can be applied to so many of these different use cases.

Tim Hinrichs: Like Kubernetes requires you to dig down through deeply nested structures, but also to do N-level-deep looping as well. So to be able to express the policies that people seem to need, you do need something that's broader and richer than the standard access control languages. What I didn't answer yet, though, was how that helps with performance. And Torin's already articulated that there are a number of different performance optimizations that we do, some of which are possible only because the language is less than Turing complete. If you've got a full programming language, what kind of optimizations can you do? You're certainly limited there. And by dropping down that expressiveness, you enable a whole new class of performance optimizations. One of those is rule-based indexing: OPA will go ahead and statically analyze the rules that you write and build a data structure that minimizes how many of those if-then statements get evaluated on any particular request that comes into OPA. So we see that being used very heavily for the microservice case.

How does OPA integrate with Kubernetes? [21:03]

Wes Reisz: Makes sense. Let's talk specifically about Kubernetes. How does OPA integrate with Kubernetes?

Torin Sandall: Yeah. So OPA is most often used as an admission controller in Kubernetes. So admission controllers are this framework within Kubernetes. It's the last place where policies get enforced, essentially, in the cluster. That's where all the semantic validation of Kubernetes resources occurs, before the resource is persisted to etcd and before all the controllers go off and start doing stuff with it. So it's sort of like the last chance that you have to prevent really bad things from happening inside of your cluster. So it's really important to have a good handle on that. And so OPA plugs in as a webhook admission controller; you can run it on your cluster, basically. And so effectively every time a resource is created, updated or deleted in the cluster, the API server will make a call out to OPA and ask: should this be allowed or not?

Torin Sandall: And when that happens, OPA will look at the resource that's being mutated, the pod, the service, the ingress, whatever, and it'll decide whether or not to allow that, and possibly mutate it. And then it'll return that decision to the API server. So it's a really important piece of functionality because, when you're looking at Kubernetes and you're running Kubernetes in a large organization, you're typically handing it off to a bunch of teams, app teams, people that are developing and deploying applications on top. And while that's great, because it accelerates their development and their deployment, because it gives them the ability to say, run these workloads with this much CPU and this much memory, and expose them on these ports, and give them this storage and all that kind of stuff.

Torin Sandall: That's a tremendous amount of control and responsibility that's being delegated to development teams. And so it's vital that platform engineers have the ability to put safeguards in place that prevent applications from impacting each other, from doing things they shouldn't on the cluster, from opening up load balancers on the public internet that shouldn't be exposed, or opening up egress traffic to the public internet. There's just this long, long list of potential risks there that admission control and OPA help mitigate.

Wes Reisz: Talk about that a little bit more. You mentioned a few, but what are some of the common use cases that people use OPA for, with Kubernetes to be able to solve?

Torin Sandall: We just did an article recently about the top five and I don't remember them off the top of my head. I guess when it comes to pods and compute resources, people are often looking at ways of ensuring availability and just uptime. So they're making sure that developers are setting liveness probes and readiness probes and setting CPU and memory requirements properly on their compute resources. When it comes to networking resources, that starts to get into a little bit more of the security concern and domain, but it's still availability. So one of the really popular first use cases for OPA, around admission control in Kubernetes, was to prevent conflicting ingress resources from being instantiated in the cluster. And it just so happened that OPA was a really good fit for that problem and people picked it up and started to use it.

Torin Sandall: But that's more of an availability, uptime concern. There's also security. If you allow your developers to instantiate network policies, they can open up egress traffic anywhere they want. And you do not want an app running on a cluster to be able to talk to any IP address in the world. So you can write Rego policies that look at... it's kind of cool, it's like a meta-policy: you're writing Rego, and it's looking at these network policy configurations and deciding whether or not those firewall rules, effectively, are valid.
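To make the admission control use cases concrete, here is a hedged sketch of a Rego admission policy that flags pods whose containers set no resource limits, one of the availability guard rails Torin mentions. The package name is illustrative, the AdmissionReview input shape is abbreviated to the fields used, and this uses the classic (pre-v1) Rego syntax:

```rego
package kubernetes.admission

# Produce a denial message for any container in an incoming pod that has
# no resource limits set; the webhook integration turns a non-empty "deny"
# set into a rejected admission request.
deny[msg] {
    input.request.kind.kind == "Pod"
    container := input.request.object.spec.containers[_]
    not container.resources.limits
    msg := sprintf("container %q has no resource limits", [container.name])
}
```

The `containers[_]` expression is the iteration over collections discussed earlier: the rule fires once for every container that matches the conditions.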

What is the latency cost of adding OPA to the stack? [23:59]

Wes Reisz: Some people are going to think: we're adding one more thing into the request path of actually getting things done. Everything comes with a cost. What's the latency cost? I know we talked about how things are built and optimized for performance, but is there a cost to adding OPA into the stack like this?

Tim Hinrichs: As you say, there's a cost for everything. So the thing we often will talk to users about is cost benefit. Yes, there's going to be a cost, but if you're preventing a bunch of problems that you know are going to bring you down or open you up to security or compliance problems, then it's probably worth the cost. In terms of numbers, we usually think about the different use cases for OPA as having different requirements around performance. So for Kubernetes, 10 milliseconds or even a hundred milliseconds is not usually a problem. If I thought about microservice authorization and somebody was going to say, "Well, I'm going to put a policy in place that takes a hundred milliseconds to evaluate." I would say, "No, no, don't do that." Netflix always gave us a millisecond. If there's a millisecond of overhead extra, fine, no worries. If you think about the CI/CD pipeline, it's probably even longer. Is anybody going to care if it takes 30 seconds to evaluate a bunch of policies over a bunch of data files? Probably not.

Tim Hinrichs: That's typically how we think about it. And then the other side of that is that some of these different optimizations that we put in place take advantage of different fragments of the language. We talked about how this language sits between the RBACs and ABACs and programming languages. Well, Rego itself was designed as an onion. There's a core part of the OPA language that will run very, very, very quickly. If you write policies that go beyond that first level, they'll still run quickly and you'll get a little bit more expressiveness, but they won't be as fast as that core. And so then what we do is we work with people. We'll say, "What are your performance demands?" And then we'll help you pick the right fragment of the language to use.
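To make the "onion" idea concrete, here is an illustrative Rego sketch (not from the interview; rule names and data paths are invented). Rules built from simple equality tests over `input` fall in the fast core that OPA can index, while comprehensions and iteration buy expressiveness at some evaluation cost.

```rego
package example

# Core-fragment style: plain equality tests over input.
# OPA can index rules like this and evaluate them very quickly.
allow {
    input.method == "GET"
    input.path == "health"
}

# More expressive: a set comprehension that iterates over data.
# Still fast, but beyond the indexable core.
user_roles := {role |
    grant := data.grants[_]
    grant.user == input.user
    role := grant.role
}
```

The practical advice from the interview follows from this: state your latency budget first, then pick the smallest fragment of the language that expresses your policy.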

How are OPA policies loaded into Kubernetes? [25:42]

Wes Reisz: How do I get OPA policies into it? How do I get Rego into here to actually protect things? So how do I introduce policies into my Kubernetes environment?

Torin Sandall: That's a great question. For Kubernetes admission control, we have a sub-project called Gatekeeper, and that's sort of our Kubernetes-native integration for OPA. It allows you to load policies, essentially, as CRDs. So you can create some tooling that takes your Rego files and stuffs them into CRDs. Those get populated on the cluster, and Gatekeeper picks them up and starts enforcing them. If you're just running plain OPA, then you can just volume-mount the policies in, or you can use OPA's management APIs. OPA has these APIs that allow you to control and observe it. One of those is called the Bundle API, and that's basically a way of distributing policies out to arbitrary OPAs, not just Gatekeeper or Kubernetes-specific ones. You can use the Bundle API for that.
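For readers unfamiliar with the Bundle API, an OPA configuration for it looks roughly like the sketch below: OPA polls a remote service for a tarball of policies and data and activates what it downloads. The service name, URL, resource path, and polling intervals here are all placeholders.

```yaml
# Hypothetical OPA configuration sketch for bundle-based policy
# distribution. OPA periodically downloads the named bundle from
# the configured service and activates its policies.
services:
  - name: control-plane
    url: https://example.com/policies

bundles:
  authz:
    service: control-plane
    resource: bundles/authz.tar.gz
    polling:
      min_delay_seconds: 60
      max_delay_seconds: 120
```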

What does it mean for OPA to be a graduated project from the CNCF? [26:28]

Wes Reisz: Recently there was news that OPA graduated from the CNCF. What does it mean for OPA to be a graduated project from the CNCF?

Torin Sandall: The CNCF, the Cloud Native Computing Foundation that hosts OPA, has a laddered model of projects. They come in at sandbox, they go to incubating, and then they graduate eventually. That process is just about demonstrating that you've satisfied a bunch of criteria. The criteria are largely around adoption, so basically end-user companies running the software, as well as contribution back into the project, ensuring that there's a vibrant community of folks actively participating in the project. And then there's a bunch of other technical best practices that projects need to adhere to, especially around security and the release process and things like that. And so we went through the journey. We joined the CNCF in 2018, in 2019 the project was promoted to incubation, and then just recently it graduated because we'd demonstrated the adoption and contribution and whatnot.

What are some of the new features and planned ones on the roadmap? [27:26]

Wes Reisz: That's awesome. Congrats. One of the things I think I saw at the end of a talk, or read in a blog post, is that there's a WASM story around OPA. I wasn't expecting that. Tell me about that. Is it just for performance reasons? Why WASM? Where'd this come in?

Torin Sandall: There's a bunch of reasons why we're excited about WebAssembly. If people aren't familiar with it, WebAssembly, or WASM, is an instruction format for virtual machines. So it's similar to the JVM, but it's delivering on this goal of write once, run anywhere. It's been adopted by all the browsers as well as for these non-web embeddings: people are putting it into service proxies, into databases, into CDNs, even into blockchains, and they use it in IoT. WebAssembly is really attractive from a policy point of view because it's a new enforcement point that otherwise wouldn't exist. It allows us to enforce policies in places that otherwise we couldn't reach. And so what we're doing with OPA is taking Rego policies and compiling them from scratch into WebAssembly.

Torin Sandall: And so what that means is that you can take an OPA policy and run it inside of any WebAssembly-compatible runtime. This is great for a bunch of reasons. It allows us, for example, to have library embeddings for arbitrary languages. OPA itself is written in Go, so if you don't want to integrate with it over HTTP and you want to embed it as a library, you're limited: you can embed it as a library in Go, but you probably don't want to be embedding a Go program inside of a C++ program or a Java program. WebAssembly gets us around that, because we can just compile the Rego to WebAssembly and then run that WebAssembly inside of a WASM library for Java or .NET. I just saw this morning that somebody had created a Java SDK for OPA using WebAssembly. So now we've got Node, Go, Java and .NET. It's really cool to see that.

Torin Sandall: It's enabling all these library embeddings, and it's getting us into places that Rego otherwise couldn't run, like CDNs and so on. And then obviously it's great from a performance perspective, because it gets rid of the interpretive overhead that the standard OPA evaluator has. We've actually gone and integrated a WebAssembly runtime back into OPA itself, so we can now use WebAssembly as an optimization path within the normal evaluator, the normal interpreter.
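As a rough sketch of the workflow Torin describes, the `opa build` CLI can target WebAssembly; the policy file name and entrypoint rule below are placeholders, and the exact flags may vary by OPA version.

```shell
# Hypothetical example: compile a Rego policy to WebAssembly.
# -t wasm selects the WASM target; -e names the entrypoint rule
# (package path + rule) to expose for evaluation.
opa build -t wasm -e example/allow policy.rego
```

The resulting bundle contains a `policy.wasm` module that a host, via a WASM runtime, can load and evaluate without embedding the Go-based OPA engine.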

Wes Reisz: What's the future hold? What's the roadmap? Look, now that you've got a graduated project, you've integrated or you're integrating in WASM into it. What's next?

Tim Hinrichs: I'll give you some high-level thoughts. One of the things that's been great is the community. Working with all these end users, with contributors, with people who are doing integrations with OPA gives us very clear insight into what's needed and what to do next. Certainly at a high level, it's been fun doing that, and we're going to continue following the community and doing the things they need. But Torin, you've probably got some more specific things.

Torin Sandall: On the technical side, there's a bunch of interesting work happening right now around adding metadata and schema support, and that's being worked on by some folks at IBM Research. I'm really excited about that, and I'm looking forward to all the tooling it's going to enable around OPA. Feature-wise, we're looking at some syntactic improvements in Rego, adding some sugar to the language in the near future. That's exciting. The project's been around for a little while now, so we have a pretty mature user base, and we're continuing to focus on performance and reliability. There's a long list of optimizations that we're looking to implement in the core.

Tim Hinrichs: I'll add that the metadata feature, and the type checking as well that the IBM folks are doing, are both pretty exciting, because that metadata allows us to put documentation into the policies themselves. We're looking forward to a world where, if somebody wants to know why a decision was made the way it was, you could imagine getting a text string back that says these five rules were violated, and it's not just the internal Rego rules but English strings that a business-level user can make sense of.

Where can people find out more on OPA or possibly get involved with the project? [30:59]

Wes Reisz: When we were talking before you mentioned that there's some academy or something out there that people can get access to. Can you talk a little bit about if people are interested and want to get a little bit more, where should they go to learn a little more about OPA?

Tim Hinrichs: There are a couple of good resources, some of which are easier to find than others, so I'll call a couple of them out. The first one you'll find on the OPA website, which is the OPA Playground. It's linked right there, so you should be able to find it. That's a great way of trying out Rego: editing some simple policies and getting that workflow in your head of how do I write Rego policies and how do I debug them? Complementary to that is the Academy. We put an academy together, an online course that walks through an introduction to Rego, and it's got lectures. I've done the lectures, so you hear more from me, and then there are quizzes as well. I know I need these. Anytime I listen to something I think, great, I got it, and then you have to actually answer some questions. It makes you check: do I actually understand what's going on?

Tim Hinrichs: And we just started adding our first hands-on lab to that as well. That's really great, because then it's not just some multiple-choice questions; we ask you to go out and write Rego and actually run OPA and feel what that's like. So anyway, we're excited by that. If you want that, it's not linked from the OPA website, so you've got to go to

Wes Reisz: Torin, tell me a bit about the community and if people want to get involved, what are some good places that you'd love to see them reach out?

Torin Sandall: Like Tim said, we love the community, and it's been great working with them this whole time. One of the best ways to get started, if you have an idea for an integration or something like that that doesn't exist today, is to get in touch: there are plenty of places where OPA could be plugged in. You can file a GitHub issue, or you can join our Slack; there are something like 4,000 people on there and everybody's super friendly. You can talk to folks about whatever you want to build or contribute, and we'll be in touch. So GitHub issues is a great place to go, and Slack is another awesome resource. In terms of contributions, we're always looking for new integrations and new places where OPA can be plugged in. I highly recommend that.

Wes Reisz: Torin and Tim, thank you for joining us on the InfoQ podcast.

Torin Sandall: Very cool.

Tim Hinrichs: Thank you, Wes. This was a lot of fun. We should do it again.

More about our podcasts

You can keep up-to-date with the podcasts via our RSS Feed, and they are available via SoundCloud, Apple Podcasts, Spotify, Overcast and Google Podcasts. From this page you also have access to our recorded show notes. They all have clickable links that will take you directly to that part of the audio.

Previous podcasts

Rate this Article