
Application-Layer Encryption Basics for Developers


Summary

Isaac Potoczny-Jones covers the basics of encryption, what are application-layer and infrastructure-layer encryption, when to use asymmetric and symmetric keys, and how to do key management.

Bio

Isaac Potoczny-Jones is the founder and CEO of Tozny, LLC, a privacy and security company specializing in easy to use cryptographic toolkits for developers. Isaac’s work in cybersecurity spans open source, the public sector, and commercial companies. Isaac is an active open source developer in the areas of cryptography and programming languages.

About the conference

QCon Plus is a virtual conference for senior software engineers and architects that covers the trends, best practices, and solutions leveraged by the world's most innovative software organizations.

Transcript

Potoczny-Jones: I'm Isaac Potoczny-Jones. We'll explore how to choose between application layer and infrastructure layer encryption, and what we mean by those two things. When to choose asymmetric and symmetric encryption, at least at a very high level. What to do with your keys, which is always challenging. I'll go through a very simple code example, talk through what this leaves out. Then we'll do a quick teaser about quantum computing and how that's going to impact cryptography going down the line. Of course, the caveat is, you're going to learn enough to be dangerous in this conversation, but hopefully enough to get you excited. The PSA, of course, is don't roll your own crypto, at least not for production, at least not until you're ready.

Infrastructure Layer and Application Layer Encryption

We'll start out by talking about infrastructure layer and application layer encryption, and what the difference between them is. I like to start with an analogy. When we think about the safety of driving and driving in cars, we address both the road and the car. Think of this as the road is the infrastructure and the car is the application. When we secure the road, we think of things like stoplights, speed limits, gentle curves, lines on the road, no passing zones, and things like that. When we think about encryption or security on the infrastructure, we think about things like HTTPS and TLS, VPNs and IPsec, service mesh encryption, full disk encryption, and database encryption. These are the kinds of infrastructure layer encryption. Frankly, that's what most people are familiar with, and what most of the time we do when we're encrypting and securing systems using cryptography.

When we secure cars, we have seatbelts, crumple zones, airbags, horns, and hopefully better driving. When we secure the application, we often don't think about cryptography as one of the first things we go to. Of course, we need to protect against malware, buffer overflows, and side channel attacks, and things like that. Try to get people to be better programmers, so they don't make programming mistakes that lead to bugs and security bugs. Cryptography is not really one of the tools in the toolbox. One of my arguments here is that it should be, that we should be doing more application layer encryption for security. I gave a talk about this at a previous QCon, which you can look up or you can look up an article I wrote on it for QCon.

I would argue that you should use application layer encryption when your security should travel with the data. You may be working across multiple infrastructures, where, for instance, HTTPS only covers a small part of the data flow. You may need an extra layer of protection because the data is sensitive, or because it may go outside of a specific infrastructure. Most importantly, use it if you need to enforce access control with encryption. For example, if you think of something like end-to-end encryption in a chat app, the access control is that the sender and receiver are really the only people who can access that data. That's not enforced just with a bit on a server saying who's allowed to do what, it's enforced through control of cryptographic key material. It's very clear how to use that in chat, but it's actually a generalizable capability that you can use across lots of different types of use cases. As in that use case, application layer encryption improves privacy, in some cases substantially. It is, however, significantly harder for developers than just implementing something like HTTPS. For next steps, you may look us up; easy to use end-to-end encryption is what we do. I also have a blog series on encryption for developers. If you can get to those links and check them out, that might help you out a little bit.

Symmetric and Asymmetric Encryption

Let's talk a little bit about the difference between symmetric and asymmetric encryption. This is a 101 level here, but bear with me, we'll walk through some examples and at least we'll level set on when to use these different types of things. First, I'm going to introduce some basic terminology. Of course, encrypt is when we take a key and we take some plain text, and we put them together in a way to make it secret, and that outputs the ciphertext. Encryption manages secrecy. Decrypt is just the reverse of that, you use a key to undo the encryption and you make something readable. You input ciphertext and output plain text. Signing is the use of a key to prove integrity. You input something and you output a signature for that thing. Verification is that check to see whether the signature matches. You input a thing and its signature, and you output true, only if that signature matches. Signing and verifying enforce integrity in a sense. You can't necessarily stop a bad guy from changing something using encryption. What you can do is detect that they've changed something. I'll use sign and verify generally. I'm not going to go into details about hashing and tags, Diffie-Hellman key exchange, and things like that. This is the basic terminology we'll be working with.
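As a rough illustration of the sign and verify terms above, here is a minimal sketch in Go using only the standard library. The choice of Ed25519 and the helper name `signAndCheck` are assumptions made for this example; the talk does not name a signature algorithm at this point.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

// signAndCheck signs msg with a freshly generated Ed25519 private key, then
// verifies that signature against received using the matching public key.
// (Hypothetical helper for illustration only.)
func signAndCheck(msg, received []byte) (bool, error) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return false, err
	}
	// Signing: input a thing, output a signature for that thing.
	sig := ed25519.Sign(priv, msg)
	// Verification: true only if the signature matches what was received.
	return ed25519.Verify(pub, received, sig), nil
}

func main() {
	msg := []byte("dose: 10 mg")

	ok, err := signAndCheck(msg, msg)
	if err != nil {
		panic(err)
	}
	fmt.Println("unchanged message verifies:", ok)

	ok, err = signAndCheck(msg, []byte("dose: 90 mg"))
	if err != nil {
		panic(err)
	}
	fmt.Println("changed message verifies:", ok)
}
```

The signature does not stop anyone from changing the message; it only lets the receiver detect that a change happened, which is the integrity property described above.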

Symmetric "Bulk" Encryption: AES GCM

Let's talk about symmetric or bulk encryption. The example I like to use here is AES GCM. It's a very good cipher. It's a very good mode. It does a lot of things that we're usually looking for. There are other good ones out there, but we'll just use this one for now. Symmetric is called symmetric because it uses the same key for encryption, decryption, signing, and verification. That's very helpful. It's very fast and very efficient, which is why it's used for bulk encryption. It's really good encryption. The challenge here is that you need a way to share the key. Let's say we have a patient with plain text healthcare data on one side. They encrypt it with AES and send it to their doctor on the other side, who decrypts it with the same AES key. Then the doctor has the plain text health data. In between, the bad guy only sees encrypted health data, so they can't see anything else. The problem is, how do we make sure that this key gets from the patient to the doctor, or vice versa? This ends up being important, and really relatively tricky to do, because it has to be the same key.

Asymmetric Encryption (PKI): ECC

Let's talk about asymmetric encryption, or PKI, which stands for public key infrastructure. One of the algorithms used here is ECC, or elliptic curve cryptography. This is really a class of cryptography with a number of different algorithms in it. It's called asymmetric because there are different keys, the public part and the private part. This is used for key agreement, typically. We have, for instance, an AES key; that's the symmetric key. We want both parties to have that same key, but no one in the middle should be able to see that key or any of the sensitive data. One party can have a private key, one party can have the public key, and through key agreement, we can decide on what this AES key is. We use ECC to exchange the AES key. Then we use AES to bulk encrypt the health data, and to decrypt it on the other side. The bad guy is allowed to see the public key, and they're allowed to see all the encrypted stuff, but they must not see the private key, or the whole thing is broken. In summary, asymmetric encryption is used for key exchange, and symmetric encryption is used for bulk encryption.
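The key agreement step can be sketched in Go with the standard library's `crypto/ecdh` package (Go 1.20 or later). This is an illustration under stated assumptions, not the speaker's code: the X25519 curve, the helper names `newKeyPair` and `deriveAESKey`, and hashing the shared secret with SHA-256 are all choices made for this example. Real protocols typically use a dedicated key derivation function such as HKDF and also authenticate the public keys.

```go
package main

import (
	"bytes"
	"crypto/ecdh"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// newKeyPair generates an X25519 private key; the public part is derived from it.
func newKeyPair() (*ecdh.PrivateKey, error) {
	return ecdh.X25519().GenerateKey(rand.Reader)
}

// deriveAESKey combines our private key with the peer's public key and hashes
// the resulting shared secret down to 32 bytes, the size of an AES-256 key.
// Hashing with plain SHA-256 is a simplification for this sketch.
func deriveAESKey(priv *ecdh.PrivateKey, peerPub *ecdh.PublicKey) ([]byte, error) {
	secret, err := priv.ECDH(peerPub)
	if err != nil {
		return nil, err
	}
	key := sha256.Sum256(secret)
	return key[:], nil
}

func main() {
	patient, err := newKeyPair()
	if err != nil {
		panic(err)
	}
	doctor, err := newKeyPair()
	if err != nil {
		panic(err)
	}

	// Only public keys cross the network; private keys stay where they were made.
	patientKey, err := deriveAESKey(patient, doctor.PublicKey())
	if err != nil {
		panic(err)
	}
	doctorKey, err := deriveAESKey(doctor, patient.PublicKey())
	if err != nil {
		panic(err)
	}
	fmt.Println("both sides derived the same AES key:", bytes.Equal(patientKey, doctorKey))
}
```

An eavesdropper who sees both public keys still cannot compute the shared secret without one of the private keys, which is why the private part must never leak.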

What do you do with your keys? This ends up being relatively tricky. How do you generate them? How do you store them? How do you maintain them? We have symmetric keys, again, you encrypt or you exchange them using your asymmetric keys. Then for the public part of your asymmetric key, this is actually fairly tricky. Think about the way PKI works. We can give those public keys away freely, but people have to know that it comes from you. That's where integrity is very important. We don't need to keep those public keys secret, but we need to trust who they're coming from. That turns out to be fairly tricky, and so we have these things called certificate authorities, we generate certificate signing requests. Generally speaking, what we're trying to do is prove that a trusted party holds the private key.

The asymmetric private keys, again, are tricky. You can store them in a file, or you could store them encrypted with a password. Now in turn, that password usually generates a symmetric key. You could put them in a hardware security device like a rack-mounted HSM, a PIV card, which is a smart card used for cryptography, a YubiKey style or custom purpose hardware. You can put them into your operating system keychain, and that may be protected with a screen lock or a biometric, like on iOS and Android. Or you can use a secrets manager in some infrastructure like 1Password, Vault, CredStash, and things like that. In summary, every different type of infrastructure has a different way of managing secrets and a different way of managing your private keys. There's no one size fits all answer here, but it's useful to know for a variety of different types of key management solutions.

Hardware Key Management

Let's talk a little bit more specifically about how some of these things work. I'll just give an example of a hardware key management solution. There are really two approaches. The first is to use a wrapped secret: you perform the data encryption in software, but use the hardware to encrypt and decrypt the key. You start out with the software. Let's say you need to decrypt something, so you have the encrypted object and you have the encrypted key. You hand the encrypted key to the hardware device, which can decrypt that key and send it back. Then you do the decryption over here in the software. The downside there, as I've circled here in red, is that the decrypted key is at some level available in software and could be subject to attack. The upside is that it's really very scalable to do it this way.

You can also perform encryption operations in the hardware itself. In this instance, you have the whole encrypted object that you're sending to the hardware device. It's actually using its keys over here to decrypt the object itself and sending the decrypted object back to the system. This is sometimes more expensive, depending on what these devices are. They may charge per key or whatever. The upside here is that the keys are usually generated inside this device and never leave this device. What that really gives you is confidence that nobody's going to be able to attack and capture the keys. That ends up being very important, but it's a little harder to manage. It's a tradeoff here. This is again, just one example of how you may manage keys. I think it's instructive.
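The first approach, wrapping a key with the hardware and doing the bulk work in software, can be sketched as follows. Everything here is an assumption for illustration: `fakeHSM` is a plain struct standing in for a real device, and `sealGCM`, `openGCM`, and `newKey` are hypothetical helpers. A real HSM would expose its own API and would keep the wrapping key inside the hardware.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newKey returns 32 random bytes, usable as an AES-256 key.
func newKey() ([]byte, error) {
	key := make([]byte, 32)
	_, err := rand.Read(key)
	return key, err
}

// sealGCM encrypts plaintext with AES-GCM and returns nonce followed by ciphertext.
func sealGCM(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// openGCM reverses sealGCM and fails if the data was modified.
func openGCM(key, sealed []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("sealed data is too short")
	}
	nonce, ciphertext := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, nil)
}

// fakeHSM is a software stand-in for a hardware device. Its key-encryption
// key (kek) is never returned to the caller.
type fakeHSM struct{ kek []byte }

func newFakeHSM() (*fakeHSM, error) {
	kek, err := newKey()
	if err != nil {
		return nil, err
	}
	return &fakeHSM{kek: kek}, nil
}

func (h *fakeHSM) wrapKey(dek []byte) ([]byte, error)       { return sealGCM(h.kek, dek) }
func (h *fakeHSM) unwrapKey(wrapped []byte) ([]byte, error) { return openGCM(h.kek, wrapped) }

func main() {
	hsm, err := newFakeHSM()
	if err != nil {
		panic(err)
	}
	dek, err := newKey() // data-encryption key, used in software
	if err != nil {
		panic(err)
	}
	record, err := sealGCM(dek, []byte("patient health data"))
	if err != nil {
		panic(err)
	}
	wrappedDEK, err := hsm.wrapKey(dek) // store this next to the record
	if err != nil {
		panic(err)
	}

	// Later: ask the device to unwrap the key, then decrypt in software.
	unwrapped, err := hsm.unwrapKey(wrappedDEK)
	if err != nil {
		panic(err)
	}
	plaintext, err := openGCM(unwrapped, record)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", plaintext)
}
```

The `unwrapped` variable is the exposure discussed above: for a moment the data key sits in application memory. In the second approach the whole record would be sent to the device instead, and no data key would ever be returned.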

Example

Let's start off by showing a little bit of an example here, this utility that we wrote. I'll run init. This doesn't output anything; all it does is generate an AES key and store it in the Apple Keychain if it's not already there. That key will be used for encryption and decryption. Then we say encrypt, and we give it a message. The Apple operating system will then ask, do you want to unlock the keychain to allow access to this key that was stored in the previous step? We'll go ahead and allow it, and then we'll get output of the Ciphertext and the Nonce. If I run it again, it asks again. You'll notice that it actually gives a different output. These outputs for the Ciphertext and Nonce are quite different. That's actually a security feature of this way of AES working, where basically, you can't learn too much from the ciphertext itself about what the message might be. Then it's easy enough to decrypt it. We just say decrypt. We give it the ciphertext, we give it the Nonce, and run it. Then of course, it asks again. Then we get our message back out. If we run that, and we change anything about the message, like the Nonce, we still get that prompt, and we get this error message that authentication failed. That's really a useful error actually. What it tells us is that maybe the key was right, but in the end, it refuses to do decryption because it really can't demonstrate that the message hasn't been tampered with, because indeed, of course, we did tamper with it.

Code Example: Init

Now that we've seen the application in action, let's take a look at a little bit of a code example for symmetric encryption using AES, that's bulk encryption, and a little bit of key management using the Apple Keychain. In a sense, this is a complete solution, because it does both encryption and key management. It's for a very limited type of use case. It could really be only used for one user on a single device, not sharing with another party and not uploading data to a server. Again, in itself, it is a complete solution. The init function, basically all it does is it generates a random key. In particular, this random number generator here is a cryptographically strong random number generator. You always want to make sure that's what you're doing. Then it stores it inside the Apple Keychain. I just found some utility that uses the Apple Keychain for storing simple secrets. We generate a random AES key. We take that plain text key and we stick it inside the OS keychain, and now it's locked up inside that keychain, access-controlled by the operating system. Not a huge amount of code, and it's just doing some generation and storage of the key.
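The slide code itself is not part of this transcript, so the following is only a sketch of what an init step of this kind might look like in Go. The `secretStore` interface and the in-memory `memoryStore` are hypothetical stand-ins for the OS keychain, and `initKey` is a name invented for this example. The part that matters is that the key comes from `crypto/rand`, a cryptographically strong generator.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// secretStore is a hypothetical stand-in for an OS keychain.
type secretStore interface {
	Get(name string) ([]byte, bool)
	Set(name string, secret []byte)
}

// memoryStore keeps secrets in a map. A real keychain would persist them and
// guard access with the operating system's own controls.
type memoryStore map[string][]byte

func (m memoryStore) Get(name string) ([]byte, bool) { v, ok := m[name]; return v, ok }
func (m memoryStore) Set(name string, secret []byte) { m[name] = secret }

// initKey stores a new random 256-bit AES key under name unless one already
// exists. It reports whether a new key was generated.
func initKey(store secretStore, name string) (bool, error) {
	if _, ok := store.Get(name); ok {
		return false, nil // leave the existing key alone
	}
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil { // cryptographically strong randomness
		return false, err
	}
	store.Set(name, key)
	return true, nil
}

func main() {
	store := memoryStore{}

	created, err := initKey(store, "demo-aes-key")
	if err != nil {
		panic(err)
	}
	fmt.Println("first init generated a key:", created)

	created, err = initKey(store, "demo-aes-key")
	if err != nil {
		panic(err)
	}
	fmt.Println("second init generated a key:", created)
}
```

Like the utility in the demo, this prints nothing about the key itself; the secret only ever lives inside the store.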

Code Example: Encrypt/Decrypt

Let's talk about encrypt and decrypt. In the code base here, I've just put the encrypt and fetch key functions. Basically what we're going to do is input from the command line what we want to encrypt, fetch the plain text key from the keychain, and then encrypt or decrypt as it were. Again, we're just showing encrypt over here and fetch key. Fetch key just says, give me back that key I gave you previously. It might pop up that message from the operating system saying, do you really want to give this user back that key? Then we'll run the encrypt function here. This is not a long function, a lot of it's really just error handling. Again, we have to do some cryptographically secure random number generation. Pick the correct cipher. Generate the Nonce correctly. Use GCM as the mode of the cipher. Then basically seal it up and sign it, and then we'll output it. That's really all there is for encrypt. Decrypt is relatively similar. Here's basically an end-to-end example of doing encryption and decryption and key management in a really small and simple environment.
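The steps listed above (pick the cipher, generate a random nonce, use GCM, seal) can be sketched in Go with the standard library. The function names `encrypt` and `decrypt` and the choice to return the nonce separately are assumptions for this example, modeled on the Ciphertext and Nonce output described in the demo; this is not the code from the slides.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// encrypt seals plaintext with AES-GCM under key and returns the ciphertext
// (which includes the authentication tag) and the random nonce that was used.
func encrypt(key, plaintext []byte) ([]byte, []byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, nil, err
	}
	return gcm.Seal(nil, nonce, plaintext, nil), nonce, nil
}

// decrypt opens the ciphertext and returns an error if the key, nonce, or
// ciphertext do not match what was sealed.
func decrypt(key, ciphertext, nonce []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(nonce) != gcm.NonceSize() {
		return nil, errors.New("nonce has the wrong length")
	}
	return gcm.Open(nil, nonce, ciphertext, nil)
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}

	ct1, nonce1, err := encrypt(key, []byte("hello"))
	if err != nil {
		panic(err)
	}
	ct2, _, err := encrypt(key, []byte("hello"))
	if err != nil {
		panic(err)
	}
	fmt.Println("same message gave the same ciphertext:", bytes.Equal(ct1, ct2))

	plaintext, err := decrypt(key, ct1, nonce1)
	if err != nil {
		panic(err)
	}
	fmt.Printf("decrypted: %s\n", plaintext)

	nonce1[0] ^= 0x01 // tamper with the nonce
	_, err = decrypt(key, ct1, nonce1)
	fmt.Println("after tampering:", err)
}
```

Because a fresh random nonce is drawn on every call, encrypting the same message twice yields different ciphertexts, and changing the nonce or ciphertext makes `decrypt` return an authentication error instead of plaintext.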

Summary: What It Leaves Out

In summary, what this leaves out. You officially know enough to be dangerous. We've not talked a lot about keys, not in detail. What do you do with your public key? How do you manage a certificate authority, whether it's run inside by your business, or whether it's signed by something like Let's Encrypt? We haven't talked a lot about secure key generation and how random numbers work. We haven't talked about choosing a key length, generating keys from passwords. The whole field of encryption for privacy is something we haven't talked about. FIPS compliance is really very important in some instances. Libsodium is a great library that is not FIPS compliant, but it's really effective and it's really nice to use for developers. Definitely look that up if you don't have a FIPS compliance requirement. We haven't talked about hashing or password storage, when to sign plain text versus ciphertext. All the different versions of AES, Nonce, initialization vectors, and how crypto systems actually get attacked. I'm just pointing this out, as a short talk that goes over some of the basics and hopefully gets you interested and excited in learning how to put crypto into your system, and learning more about all these other areas so that you can really cover all of your bases.

A Teaser about Quantum Computing

Now for a quick teaser about quantum computing. Quantum computers eventually might be able to break asymmetric encryption, and it might get there soon. We don't really know how soon that will be, but this will make key exchange very hard. Again, it doesn't break symmetric encryption, but we do believe it's going to break asymmetric encryption. What does soon mean? That really depends on how long you want your data to be secret. Does it need to be secret for decades? Does it need to be secret only for the 10 minutes it takes to run your credit card? It really depends. It also depends on how long it will take to standardize and adopt a solution. We learned a lot about the rollout of crypto when we moved from an old asymmetric system called RSA to a newer asymmetric system called ECC. That rollout has taken a very long time. What we can assume now, especially since there's more crypto now than ever, is that it's going to take a long time to basically develop quantum strong algorithms and roll out those solutions.

What are we going to do about this? NIST is running a competition to find new algorithms. In the meantime, using AES 256 and pre-shared keys is a way around this, meaning that instead of using asymmetric crypto to share AES keys, you just make sure that both parties have a set of AES keys that they can use for bulk encryption, and they just have a way of agreeing on which key they're using for what message. It's not crazy at all, but it's definitely nowhere near as flexible as a public key infrastructure. Or, of course, if you're not part of this attack model, you don't have to worry about it. It's just good to know coming up, that things will be changing a little bit in the way cryptography works, especially with asymmetric key exchange.
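To make the pre-shared key idea concrete, here is a minimal sketch in Go. It assumes both parties were given an identical set of AES-256 keys in advance and that each message carries the ID of the key it used. The names `keyring`, `message`, `newKeyring`, `send`, and `receive` are hypothetical and chosen only for this example.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// keyring maps a key ID to a pre-shared AES-256 key.
type keyring map[uint32][]byte

// message carries the key ID in the clear so the receiver knows which key to use.
type message struct {
	KeyID      uint32
	Nonce      []byte
	Ciphertext []byte
}

// newKeyring generates n random keys with IDs 1..n. In practice this would
// happen once, over a trusted channel, before the parties separate.
func newKeyring(n int) (keyring, error) {
	ring := make(keyring, n)
	for id := uint32(1); id <= uint32(n); id++ {
		key := make([]byte, 32)
		if _, err := rand.Read(key); err != nil {
			return nil, err
		}
		ring[id] = key
	}
	return ring, nil
}

// gcmFor looks up a key by ID and prepares AES-GCM with it.
func gcmFor(ring keyring, id uint32) (cipher.AEAD, error) {
	key, ok := ring[id]
	if !ok {
		return nil, fmt.Errorf("no pre-shared key with id %d", id)
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

func send(ring keyring, id uint32, plaintext []byte) (message, error) {
	gcm, err := gcmFor(ring, id)
	if err != nil {
		return message{}, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return message{}, err
	}
	return message{KeyID: id, Nonce: nonce, Ciphertext: gcm.Seal(nil, nonce, plaintext, nil)}, nil
}

func receive(ring keyring, m message) ([]byte, error) {
	gcm, err := gcmFor(ring, m.KeyID)
	if err != nil {
		return nil, err
	}
	if len(m.Nonce) != gcm.NonceSize() {
		return nil, fmt.Errorf("nonce must be %d bytes", gcm.NonceSize())
	}
	return gcm.Open(nil, m.Nonce, m.Ciphertext, nil)
}

func main() {
	ring, err := newKeyring(3) // both parties hold a copy of this
	if err != nil {
		panic(err)
	}
	m, err := send(ring, 2, []byte("meet at noon"))
	if err != nil {
		panic(err)
	}
	plaintext, err := receive(ring, m)
	if err != nil {
		panic(err)
	}
	fmt.Printf("key %d decrypted: %s\n", m.KeyID, plaintext)
}
```

No asymmetric step appears anywhere in this flow, which is the point of the mitigation, and also why it is less flexible: every pair of parties must have met, or shared a trusted channel, before they can talk.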

Questions and Answers

Schuster: Any opinions on HashiCorp Vault? Do you have any opinions maybe on similar types of storage systems? Are there ways to use them correctly, or how do people use them incorrectly? What can I do to shoot myself in the foot by using one of those tools?

Potoczny-Jones: I think these tools are extremely important, especially in an environment where you're very dynamic, turning on and turning off virtual machines for instance, so you don't have a lot of state to work with. You can't just store your keys in a file on a file system anymore. You need a place to go and get them. This could be cryptographic key material. It could be API keys. It could be database passwords and things like that, anything that's important to your infrastructure operating securely.

I think everybody probably should be looking into these types of tools, no matter what your infrastructure is. This style of storing keys or passwords or things like that in configuration files on a local file system, local to your server, besides not really being compatible with modern pipelines, is also definitely subject to attack in a different way. If somebody breaks your application, they get your keys, because they're on a file system.

I don't speak to any particular product. HashiCorp is certainly one of the market leaders here. There's a number of other ones, there's stuff built into Kubernetes, there's stuff built into virtually every cloud service provider. Google has their own thing. Amazon has their own thing. Microsoft has their own thing. Start with what your environment is, look and understand your needs, as far as how secrets get managed, injected, and access controlled, and pick the solution that works best for you. Definitely take a step back and say, how am I doing secrets management? How is that part of my overall security posture?

Schuster: Can you go into more detail on your suggestion of pre-shared AES keys? Do I have to deliver something like one-time pads to my partners, or are there other ways?

Potoczny-Jones: Pre-shared AES keys is something I mentioned in terms of mitigating the risk of quantum computers being able to attack asymmetric crypto. The idea here is that, because we're mainly using asymmetric crypto, public and private keys, for exchanging AES keys or symmetric keys, we could skip that step, in a sense. We can pre-share these keys. What this assumes is that you have a secure channel of communication to start out with. The lack of one is why we normally need public key infrastructure. A secure channel of communication might be that the two phones that want to talk to each other are right next to each other, and you can plug them into each other or something along those lines. Then they can hand each other a bunch of keys, or you can put something on a YubiKey or on a secure thumb drive or what have you, to put a set of keys on each party. Then the two parties go far apart from each other, and they can still communicate because they have access to that key material. It's very difficult to use in that sense. If you're trying to talk to your mom who is in Ohio, and you don't want to read an AES key off to her over the phone, it's not really a practical solution.

If you're in an environment where the IT staff can hand a bunch of people a bunch of keys, and they can go off and do their work, then it may be more tractable. It's definitely not something I recommend even trying or starting with in an environment that's not really high security. Where you have access to a CA or to PKI, asymmetric key exchange is just the best way to go when you're not trying to mitigate quantum computers. Nobody is attacking those kinds of conversations with quantum computers. I wouldn't worry about it unless you understand that attack model. I mainly bring it up to point out there are going to be changes in this area. Those algorithms that we know very well, ECC and RSA and so on, are probably going to be replaced, over a long period of time, with new algorithms with different properties and different key sizes.

Schuster: Can you compare the Bouncy Castle library with Libsodium, or maybe others?

Potoczny-Jones: I'm quite experienced with both. I love both of them for their own purposes. I have a lot of experience in Java. I think my first QCon talk was probably about Java and the way the cryptographic primitives are set up, and set up for failure in a way. Maybe they've fixed this. At least a few years ago, when I looked at it, Java had an insecure mode for AES as the default. If you just said, I want to use AES so give me AES, it would hand you the ECB version, which is an insecure mode of AES. That cryptographic library has been around for ages, and there are reasons for that. It really was easy to shoot yourself in the foot. I actually discovered this because I sat down with an engineer and said, "Let's review your crypto code, I see some problems. I can't remember exactly how to do this, let's try and find a solution." I looked on Stack Overflow, and everyone had it wrong. I was just amazed. We ended up releasing a library for Java that uses either Bouncy Castle or whatever the crypto provider is on that platform to try to make it really simple, and that was the open source library.
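Why ECB is considered insecure can be shown in a few lines. The sketch below is in Go rather than Java, and `encryptECB` is a hypothetical helper written only to demonstrate the weakness: it encrypts each 16-byte block independently, which is what ECB mode does. It should never be used to protect real data.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/rand"
	"errors"
	"fmt"
)

// encryptECB encrypts every 16-byte block on its own with the raw AES block
// cipher. For demonstration of the weakness only.
func encryptECB(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	if len(plaintext)%aes.BlockSize != 0 {
		return nil, errors.New("plaintext must be a multiple of the block size")
	}
	out := make([]byte, len(plaintext))
	for i := 0; i < len(plaintext); i += aes.BlockSize {
		block.Encrypt(out[i:i+aes.BlockSize], plaintext[i:i+aes.BlockSize])
	}
	return out, nil
}

func main() {
	key := make([]byte, 16)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}

	// Two identical 16-byte plaintext blocks.
	plaintext := bytes.Repeat([]byte("ATTACK AT DAWN!!"), 2)
	ciphertext, err := encryptECB(key, plaintext)
	if err != nil {
		panic(err)
	}

	first, second := ciphertext[:aes.BlockSize], ciphertext[aes.BlockSize:]
	fmt.Println("repeated plaintext block gives repeated ciphertext block:", bytes.Equal(first, second))
}
```

Identical plaintext blocks always produce identical ciphertext blocks under the same key, so an observer can see patterns in the data without knowing the key. A mode with a random nonce, such as GCM, does not have this property.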

When I compare this with something like Libsodium, I love Libsodium. What it really does is, I think it creates a much more usable programming interface for packaging up cryptographic primitives, so that you don't really have to understand the low level primitives. It just is a lot harder to shoot yourself in the foot, in my opinion. What's happened with Libsodium is they have a very core C library, and people have either reimplemented it in all these different programming languages, or they've wrapped it from other languages using things like foreign function interfaces. Thus, you have a pretty commonly used library that's compatible across a lot of different programming languages and platforms. Because believe it or not, if you're in Java Native, and you're trying to use Bouncy Castle, and then you want to send something to Python and use something on a server, or use OpenSSL to encrypt and decrypt, it might be pretty hard to line those things up. Libsodium really helps with that.

It's an upside in another sense, but one downside for Libsodium is that it doesn't center on the NIST algorithms. It has its own algorithm suite rather than defaulting to AES. It's not FIPS certified or anything like that. If those are important to you because of compliance with government regulations or something, then you probably want to think about which one you're using in that case. Bouncy Castle has a NIST compliant version, for instance, but Libsodium does not.

Schuster: Any preference about ECC versus PGP?

Potoczny-Jones: Elliptic Curve Cryptography is a type of asymmetric cryptography. PGP is a tool, and GPG is a compatible tool, for basically encrypting stuff at the application layer. It does a lot of stuff for you. PGP can use ECC, or it can use RSA. I wouldn't necessarily compare and contrast them in that sense to say use one or the other. PGP is a great general purpose tool for people or systems to manage a web of trust for shared keys, and to exchange the public keys and do verifiable encryption and signature verification. I definitely recommend getting to know PGP and GPG, as great tools in your toolkit, for doing different types of crypto. Basically, you can think of it as a command line utility. Libsodium is like a library, but PGP probably also has library bindings in a lot of languages too. You could use them for different things.

Schuster: On authenticated encryption with associated data (AEAD) versus normal symmetric encryption: should the first be preferred over the latter in terms of security?

Potoczny-Jones: When you use symmetric encryption, a tool like AES, it doesn't out of the box necessarily mean that you can't tamper with the message. That's a misunderstanding a lot of people have. They say, I encrypted it. It's secret. Nobody can see it. Then over here I'm decrypting it, and I'm checking it. Without further help, you can't necessarily demonstrate that the message hasn't been tampered with, just because it decrypts doesn't mean it hasn't been tampered with. GCM is a mode of using AES that also enforces that tamper resistance. Definitely, nowadays, I would be looking for a mode that does tamper resistance if you're doing symmetric encryption. The other way to do it is with hashing and signatures. If you have access to something like GCM, I will use that.

AES CBC for instance, which is the Cipher Block Chaining mode of AES, can be tampered with without detection, whereas tampering with AES GCM is detected: if you do tamper with it, then it won't decrypt. That's I think the example we saw in my code example. For preference, I would definitely say you want a mode that cannot be tampered with. You almost always need that. Confidentiality isn't usually everything. If you're doing an API call, for instance, you encrypt something and send it, then you run some function based on the message. If the message has been tampered with, and you authenticated the user and then you run functions, then you're doing something outside of the security control of the system.
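The difference can be made concrete with a small Go sketch. In CBC, the first plaintext block is the decrypted first ciphertext block XORed with the IV, so anyone who can alter the IV in transit can flip chosen bits of that block without knowing the key. The helpers `encryptCBC` and `decryptCBC` and the sample message are assumptions for this example; padding is omitted by using a message of exactly one block.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// checkCBCInput rejects inputs that would make the CBC functions panic.
func checkCBCInput(iv, data []byte) error {
	if len(iv) != aes.BlockSize {
		return errors.New("IV must be one block long")
	}
	if len(data) == 0 || len(data)%aes.BlockSize != 0 {
		return errors.New("data must be a non-empty multiple of the block size")
	}
	return nil
}

// encryptCBC encrypts with AES-CBC and no authentication.
func encryptCBC(key, iv, plaintext []byte) ([]byte, error) {
	if err := checkCBCInput(iv, plaintext); err != nil {
		return nil, err
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	out := make([]byte, len(plaintext))
	cipher.NewCBCEncrypter(block, iv).CryptBlocks(out, plaintext)
	return out, nil
}

// decryptCBC decrypts with AES-CBC; it has no way to notice tampering.
func decryptCBC(key, iv, ciphertext []byte) ([]byte, error) {
	if err := checkCBCInput(iv, ciphertext); err != nil {
		return nil, err
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	out := make([]byte, len(ciphertext))
	cipher.NewCBCDecrypter(block, iv).CryptBlocks(out, ciphertext)
	return out, nil
}

func main() {
	key := make([]byte, 16)
	iv := make([]byte, aes.BlockSize)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	if _, err := rand.Read(iv); err != nil {
		panic(err)
	}

	ciphertext, err := encryptCBC(key, iv, []byte("PAY ALICE $0100."))
	if err != nil {
		panic(err)
	}

	// The attacker does not know the key, only that byte 11 is the digit '0'.
	// XORing the IV with '0'^'9' turns that digit into '9' after decryption.
	iv[11] ^= '0' ^ '9'

	plaintext, err := decryptCBC(key, iv, ciphertext)
	if err != nil {
		panic(err)
	}
	fmt.Printf("receiver decrypts: %s\n", plaintext)
}
```

Decryption succeeds without any error and yields an altered amount. With AES-GCM the same modification would cause the open step to fail, which is the tamper detection described above.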

Schuster: Any idea of how to get rid of passwords? I think this is in relation to HSM, or key stores? If someone gets access to that one password they have the keys to the castle, basically.

Potoczny-Jones: I'm a big fan of hardware security. There are nice, relatively new protocols for authentication. I think the thing about passwords is that they're just so darn useful. They're very well understood. I say this as somebody who spent a few years of my career trying to push a product that tried to get rid of passwords. They're very natural for a lot of people. They cover ground around, you forgot your password, how you reset it, and things like that. It depends on your use case for why you're trying to get rid of it. If it's about, in a corporate environment, having multiple people having access to something, you can make a long key and split it, and have parts that are held by both people. I'm a big fan of hardware key storage on your phone, hardware key storage on your computer, or YubiKeys, and things like that, where you can either store cryptographic keys themselves, or you can store passwords. You can put something directly in the YubiKey, and have that be the baseline for decrypting a vault of some kind.
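One way to sketch the idea of a key whose parts are held by two people is an XOR split, shown below in Go. This is a swapped-in variant rather than literally cutting a key in half: each share is as long as the key, and either share alone reveals nothing about it. The names `splitKey` and `combineShares` are hypothetical.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"errors"
	"fmt"
)

// splitKey returns two shares whose XOR is the key. The first share is pure
// randomness; the second is the key masked by the first.
func splitKey(key []byte) ([]byte, []byte, error) {
	shareA := make([]byte, len(key))
	if _, err := rand.Read(shareA); err != nil {
		return nil, nil, err
	}
	shareB := make([]byte, len(key))
	for i := range key {
		shareB[i] = key[i] ^ shareA[i]
	}
	return shareA, shareB, nil
}

// combineShares XORs the two shares to recover the original key.
func combineShares(shareA, shareB []byte) ([]byte, error) {
	if len(shareA) != len(shareB) {
		return nil, errors.New("shares must have the same length")
	}
	key := make([]byte, len(shareA))
	for i := range shareA {
		key[i] = shareA[i] ^ shareB[i]
	}
	return key, nil
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}

	shareA, shareB, err := splitKey(key) // give one share to each person
	if err != nil {
		panic(err)
	}
	restored, err := combineShares(shareA, shareB)
	if err != nil {
		panic(err)
	}
	fmt.Println("shares recombine to the original key:", bytes.Equal(restored, key))
}
```

Both people must be present to rebuild the key, which gives a simple form of two-party control. Schemes that tolerate a lost share, such as threshold secret sharing, need more machinery than this sketch.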

Schuster: Using an HSM for all the crypto work might become expensive, so the idea is you only have special keys in the HSM. How do you avoid someone in a company, for instance, fiddling with those keys that have basically more access?

Potoczny-Jones: It sounds like you're trying to find a middle ground between you're generating and using the keys inside the HSM versus maybe just using it for wrapping keys, and then having them potentially be available outside to privileged users, or something along those lines?

Schuster: It seems like.

Potoczny-Jones: There are tools out there around this field called privileged access management. My favorite answer is basically multi-party control, which is that you have a layer in your platform for access to the one critical thing you just can't put in the secrets manager, or can't turn into an API call, or what have you. For those things you require multiple parties, and they can be multiple parties on the same team at the same level. Or you could have a user and their manager, so you have to have some level of approval before accessing that thing. You need a platform to set this up. Hopefully it also takes care of secrets management for you. It can potentially take care of things like logging in remotely via single sign-on, or SSH, and things like that, so again you have multi-party control, and you have some layer of approval process for accessing something that sensitive.

Schuster: For developers, what do you recommend most when encrypting data, files, databases, and other offline content? Any examples of things developers usually do?

Potoczny-Jones: I like pointing out again the difference between application layer and infrastructure layer crypto. If you have the ability to go into your infrastructure and tell your database, please encrypt all the data, and I'm going to go manage the key over here in your key management solution, I recommend doing that. I think these are relatively straightforward flags you can turn on. They only do so much. If somebody breaks into a certain part of the infrastructure, and that infrastructure needs access to those database keys, they're still going to be able to get to it. I definitely think that all that's really good. I think there's a push in areas like Kubernetes to use mutual encryption at that layer using sidecars running alongside your application, which I think is a pretty darn good approach. A lot of times those have to be signed, and tweaked, and configured in their own way. There's a tool, Istio, I think, that's pretty good with that.

I definitely would recommend starting to move into that area. Get your basics down first, though. HTTPS, put that everywhere, wherever you can. Your edge devices, if you have users, should have encrypted hard drives and passwords on their computers. Get your databases encrypted, get your files on S3 or wherever encrypted, and then start to move to the application layer. Even if you do stuff at the application layer, that alone doesn't give you some of the guarantees that you need from the infrastructure layer.

Schuster: Any resources on building crypto that balances theory and practice? These are resources that don't fall into that Stack Overflow type of giving you bad examples that are trivial.

Potoczny-Jones: Check out Cryptopals. A lot of online courses now are being offered in the area. There's probably a lot of resources. We have a bunch of material on our website with links out to other learning material.

Schuster: If everything is encrypted and verified, will there be no more need for antivirus products?

Potoczny-Jones: I think each of these solutions solves a problem in a particular space. I definitely say that in the future, someday when every single transaction, every single piece of code can be verified by a piece of hardware that you can guarantee is in the possession of somebody who is trusted to be acting in that mode, then maybe we can get rid of some of the backstop mitigations, like server scanning and antivirus. I think probably that's a ways out. I do appreciate the sentiment. I think cryptography can help in a lot of areas. I love to see all of us doing more of it at the application layer, because I think as computer programmers, we have the ability to say, let's control this data. Let's improve people's privacy. Let's improve people's daily lives. It's worth protecting our users and protecting their data.


Recorded at:

Apr 01, 2022

Isaac Potoczny-Jones
