Application Level Encryption for Software Architects

Key Takeaways

  • Application-level encryption (ALE) means encrypting data within the application, and not depending on the underlying transport and/or at-rest encryption.
  • Encryption is easy, key management is hard—any encryption process requires key management infrastructure and several additional supporting processes, which should be aligned with your systems architecture, FRs, and NFRs.
  • ALE can be implemented in various ways to address different security requirements—from end-to-end encryption and zero trust architectures to partial field-level database encryption.
  • The encryption subsystem works better when integrated with others to form defense-in-depth: with access control, logging, intrusion detection, request authentication, and data leakage prevention.
  • ALE protects from more risks than transport and at-rest encryption, but at the cost of tradeoffs. Some of them (for example, searching encrypted data) have been addressed with understandable tradeoffs, some are unique and need to be considered separately.

Why application-level encryption?

The first step to journey into data security is to become aware that software and architectures are changing, and so are data protection requirements in modern applications. New technologies, architectural patterns, advances in cryptography, and regulatory constraints all create new sets of requirements.

One of these sets of requirements is application-level encryption, which is surfacing more and more lately in finance, healthcare (think about application-level end-to-end encryption for sensitive patient data), and other industries.

Where does it come from? Changes in the threat landscape, attack techniques, and necessary security posture require us to think harder about preventing data leaks with encryption. Compliance requirements that don’t just mandate certain techniques to be implemented and demonstrated to auditors, but actually impose fines on organizations that leak sensitive data, drive increased requirements to protect the data properly.

Previous-generation measures are just the starting point: encrypting the filesystem (data-at-rest encryption) protects against someone stealing disks from your data center, and setting up TLS (data-in-motion encryption) prevents wiretapping and simple impersonation, but that’s it.

So, as architects explore options of doing more, they stumble upon the ever-ambiguous application-level encryption, field-level encryption, client-side encryption, end-to-end encryption requirement.

As CTO of data security company Cossack Labs, I deal with these concerns a lot, both for our products and while helping customers choose their encryption strategy wisely. In this article, I will walk you through the basics of different application-level encryption approaches, their pros and cons, typical mistakes, and threat models. Sometimes I will use existing software as an example and sometimes you will have to bear with me as we imagine completely new software designs!

Encryption starts from the design

Unless well-defined, the task for application-level encryption is frequently underestimated, poorly implemented, and results in haphazard architectural compromises when developers find out that integrating a cryptographic library or service is just the tip of the iceberg.

Whoever is formally assigned the job of implementing encryption-based data protection faces thousands of pages of documentation on how to implement things better, but very little on how to design things correctly.

Design turns into a bumpy ride whenever you don’t expect to need it and instead make a sequence of ad-hoc decisions because you anticipated getting things done quickly:

  • First, you face key model and cryptosystem choice challenges, which hide under “which library/tool should I use for this?” Hopefully, you chose a tool that fits your use-case security-wise, not the one with the most stars on GitHub. Hopefully, it contains only secure and modern cryptographic decisions. Hopefully, it will be compatible with other team’s choices when the encryption has to span several applications/platforms.
  • Then you face key storage and access challenges: where to store the encryption keys, how to separate them from data, what are integration points where the components and data meet for encryption/decryption, what is the trust/risk level toward these components?
  • After that you face key management challenges: how to roll, rotate, revoke, and escrow keys, which decisions to make, and how they will impact various non-functional requirements of your system.
  • Then, already questioning how deep the rabbit hole is and what is an acceptable level of “done,” you get to problems of mixing and matching cryptographic primitives and key management techniques to match your data flow.
  • While doing all of this, you keep on seeing how important NFRs get impacted one after another.  

It was a part of my job for a while to help both greenfield and brownfield development teams navigate through data security minefields and encryption implementation details, and I’d like to take you on a journey—what do you, as an architect, need to look at? How do you ensure that application-level encryption is useful, and affects the system’s design and attributes in an acceptable and predictable way? Which decisions will you need to make and which decisions are better made by security professionals?

What is application-level encryption?

Is it related to end-to-end encryption? Client-side encryption? Field-level encryption?

Each of these terms points to a combination of data flow choices (how the data will move between components, where the encryption will happen, how the data will be used) and security guarantees (what will encryption protect against and under which set of assumptions).

The name implies that application-level encryption is implemented within your application so that sensitive data security in your application doesn’t depend on the security of transport/at rest encryption of underlying layers. ALE can be as many things as you make it:

  • It can happen on clients, making it client-side encryption.
  • It can happen on clients in a way that no secrets or keys are available to servers, thus making it end-to-end encrypted.
  • It can be context-aware and protect certain fields, thus becoming field-level encryption.
  • Its end-to-end encryption can operate under full Zero Trust assumptions, making the application compliant with zero trust architecture principles.

In short, application-level encryption only points to an architectural choice of where encryption happens. But if we look closer, that means many things for your distributed application.

When and why of application-level encryption

Every security requirement should be driven by a risk model and a threat model that justifies the choice of security control, the scope of its application, and details.

Application-level encryption addresses several main goals:

  • Trust your infrastructure less. Application-level encryption provides data protection on all underlying layers, including all layers of storage and sometimes transit. This drastically decreases the number of attack vectors on sensitive data. Outdated TLS settings or expired TLS certificates won’t lead to data leaks when the data is application-level encrypted.
     
  • Higher level of security against insider and advanced adversary risks. In processing financial transactions and storing transaction data, the risks of insiders or privileged adversaries gaining access to the database are more significant. Think malicious DBA, cloud employee, an adversary with elevated privileges including developer/DBA access.
     
  • Defense-in-depth. Add another layer of security if other data-related controls like underlying (disk, transit) encryption or access control fail somewhere.
     
  • Greater agility and more control on performance and capacity impact. You can encrypt only what needs protection when you choose data to encrypt inside business logic.
     
  • Compliance. Encryption requirements in regulations are rarely precise, and none of them says outright “you need to implement encryption at the application level.” Still, using ALE simplifies compliance and lets the work done for regulatory requirements serve other practical goals as well.

The longer sensitive data stays encrypted in its lifecycle, the closer application-level encryption gets to end-to-end encryption and zero trust architecture. The shorter data stays encrypted, the closer it gets to single point-to-point transport encryption or encryption at rest.

Aside from security pragmatism, there are a lot of recommendations and best-practices that suggest application-level encryption as well. For example, while GDPR article 32 “Security of Processing” says that data should be encrypted, it doesn’t indicate cryptographic details:

Yet, as we will see later, it is typically easier to selectively protect personal data with ALE.

Google’s Building Secure and Reliable Systems says right on page 10:

US Department of Defense in “Defense Innovation Board: Ten Commandments of Software” suggests:

Application-level encryption is becoming a good practice for systems with increased security requirements, with a general drift toward perimeter-less and more exposed cloud systems. So, we can expect to see requirements for deeper integration of encryption arrive in general and industry-specific regulation. Lately, I see increased demand for ALE software from fintech companies, neobanks, and bank-as-a-service providers. Facing a blend of old and new regulations, they use ALE to encrypt transaction data, PII, and data that is sensitive in the context of payments and accounts.

Why not just turn on TLS and database encryption?

Using TLS between various components of your infrastructure is a necessary measure. But it only protects you against leakage and tampering of your network traffic between your nodes and adds authentication for node-to-node links, if you set it up correctly.

Figure: data at rest + data in motion encryption vs. application-level encryption. Application-level encryption keeps data encrypted as long as you choose, up to a fully end-to-end encrypted lifecycle, which significantly decreases the attack surface on sensitive data, whereas TLS only protects you from eavesdropping between servers.

Encrypting data on a filesystem or database level will protect you against data leakage if the disks have been physically stolen from the server, which is an unlikely attack vector (physical access to server provides more efficient attack vectors than running with a bunch of disks under the moonlight). Unfortunately, in many cases with modern databases, database encryption schemes are self-defeating as data and keys end up in the same database (thus, privileged users with access to the database can access encrypted data).

Application-level encryption (as well as data tokenisation/pseudonymisation) prevents most data-related risks in one shot:

  • Physical disk access risks
  • Adversarial system administrator (OS) risks
  • Adversarial DBA/database-level leakage risks
  • Data in transit between application components (depends on the chosen scheme and enforced controls)
  • Leakage through logs, snapshots, and automated backups

Encryption methods vs. threats and risks

Let’s compare how well encryption controls placed at different layers protect against different classes of threat events:

 

Events \ Encryption controls | Transit (TLS) | Disk/FS | TDE/DB encryption | Application encryption | E2EE/Zero trust
Physical access to servers   | -             | +       | +                 | +                      | +
MitM                         | +             | -       | -                 | +                      | +
Privileged system access     | -             | -       | -                 | Most                   | +
Privileged DB access         | -             | -       | -                 | +                      | +
Backups, logs, snapshots     | -             | -       | Few               | +                      | +

Additionally, application-level encryption enables integrating and orchestrating sensitive data access events with other security controls:

  • Access control: mapping key access to presented authorization tokens and identifying permissions during encryption/decryption.
  • Audit logging: associating access (encryption/decryption) with user sessions and granularly recording access logs and other data-related security events.

Different encryption approaches address different threat models. TLS protects data in transit, but won’t help against insiders with access to the database. TDE might protect from misconfigured database access and privileged system access on a database server, but not from exposing data in backups. When insiders and APTs are realistic threat vectors, ALE becomes more relevant. But what’s essential—ALE provides many security guarantees in one shot.

Encryption is part of the cryptosystem

Sophisticated encryption schemes are extremely beneficial yet quite fragile. They require getting many things right in a counter-intuitive fashion while imposing hard limitations on your application design.

To understand design goals better, let’s look at what production-ready application-level encryption implementation should include:

  • Sound cryptosystem with sound implementation and a strong key model that matches your security requirements
  • Key storage and management system that supports the key model of your choosing
  • Integration into your application and data model
  • Integration into your other security components—SIEMs, log analyzers, access control

Cryptosystems are typically shipped as a library or a software product, either integrated within a service that is tasked with invoking encryption/decryption or as a separate API service/proxy. To ensure that you’re using sound cryptography, there are a few simple principles:

  • Cryptography should be boring. You aren’t supposed to make a lot of cryptographic decisions (and have a chance to get them wrong). To learn more about boring crypto, read Boring crypto design principles and Bernstein’s original Boring Crypto presentation.
  • Cryptographic choices should be made by cryptographers. Rolling your own cryptosystems or implementations of cryptographic primitives isn’t totally illegal yet but is already a cardinal sin.

Since encrypting/decrypting requires access to cryptographic keys, the choice where to encrypt/decrypt has to be based on security assumptions:

  1. Encryption library inside the application: If an end-user is a single trusted entity within a system (as in end-to-end encrypted, zero-trust systems), keys should be stored near the user, and encryption/decryption should happen in a client application, preferably within the same component. This way you can protect long-lived credentials (cryptographic keys) with passwords, PINs, and biometrics, which are provided by the user only to decrypt cryptographic keys. Examples of such libraries are libsodium, themis, gocrypto, and Google Tink.

For complex data flows, end-to-end encryption is quite hard, as a lot of parties access sensitive data differently, while integrating encryption with other tooling. Obvious choices would be to package encryption into:

  2. API service: adding a component that has access to the keys and can perform encryption, decryption, and other security functions.
  3. Proxy service: adding a proxy between application and datastore, which will detect and encrypt/decrypt the data. It can be a straight reverse proxy, or a DAO-like service, which owns and simplifies access while performing security operations.

In cases 2 and 3, sensitive computations and keys are separate from the application. Segregating them from the main codebase has several benefits: it’s easier to monitor, update, and maintain the encryption subsystem. There is a vast set of choices of tools for application-level encryption. Among the open-source tools available, you can use HashiCorp’s Vault in Encryption API mode or Cossack Labs’ Acra in API and Proxy modes.

Crypto is easy, key management is hard

Regardless of which encryption scheme you’re using, its strength solely depends on cryptographic keys, secrets, and subtle relationships between them. Some relationships are fairly easy to comprehend, like signature chains of your TLS certificate. Some relationships are much trickier and depend on understanding underlying cryptographic protocols—like schemes used in end-to-end encrypted chat with elevated privacy requirements.

Regardless of a key management scheme, key management processes will inevitably impact your application and solution architecture.

Storing keys

Keys will typically be stored in secure storage, protected from the rest of the system. Your native cloud KMS, HashiCorp Vault, or a dedicated HSM could do the job.

However, depending on the key management model, your encryption component might need to request hundreds, if not thousands of cryptographic keys a minute. Key storage is just another key-value storage in a certain sense—so scaling it faces typical issues.

Querying keys for every datastore read/write request is a performance penalty that gets harder to bear with the complexity of key management models. In general, it’s a typical blocking problem with a few new limitations.

Simple solutions to scaling key storage challenges typically rely either on:

  • Increasing general throughput by using dedicated hardware.
  • Using a multi-layered key model while encrypting data blobs, encrypting high-level keys with a master secret that is stored in HSM or provided by the user, while keeping these encrypted high-level keys on a speedy storage. Think key wrapping techniques, key-encryption keys.
  • Caching intermediary/derivative keys (encrypted or decrypted). Caching keys for too long increases security risk, and the implications of caching look vastly different from a security perspective than from an engineering one.

For example, database encryption suite Acra uses a multi-level key scheme with a key hierarchy that can be mapped/stored in different places to be able to balance tradeoffs. Master key can be stored in secure storage like KMS (slow and rare) or HSM or even fed from the keyboard only! Key-encryption keys can be stored in a filesystem, KV store like Redis (fast and often), and cached in memory (faster, but not for long), while data encryption keys are part of data envelopes. Having enough flexibility enables you to choose which penalties you’re willing to pay. Think about hundreds of keys with different TTL and cache policies, stored in different places.
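As a rough illustration of the caching tradeoff described above, the sketch below keeps keys in memory for a bounded time. It is not Acra's implementation: `TTLKeyCache`, `fetch_key`, and the TTL value are hypothetical names chosen for this example, and `fetch_key` stands in for whatever call reaches your KMS, HSM, or key store.

```python
import time
from typing import Callable, Dict, Tuple


class TTLKeyCache:
    """Caches keys fetched from a slower key store for a bounded time."""

    def __init__(self, fetch_key: Callable[[str], bytes], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_key = fetch_key   # hypothetical call to a KMS/HSM/key store
        self._ttl = ttl_seconds       # shorter TTL: less exposure, more round trips
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, key_id: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(key_id)
        if entry is not None and now < entry[0]:
            return entry[1]           # still fresh, no round trip needed
        key = self._fetch_key(key_id) # missing or expired, ask the key store
        self._entries[key_id] = (now + self._ttl, key)
        return key

    def evict(self, key_id: str) -> None:
        """Drop a cached key immediately, for example when it is revoked."""
        self._entries.pop(key_id, None)
```

The TTL is the knob where the security and engineering views meet: a long TTL saves requests to the key store, a short one limits how long a key sits in application memory.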

Tying keys with user identity

Driven by various requirements, engineers who plan the encryption components of their tools often tie encryption keys to users, user IDs, or profiles.

To do that well, you might need:

  • Separate user registration and processes of key generation/registration. You should be able to re-trigger them during key rotation.
  • Separate processes of key invalidation and removal.
  • Means to associate the key with the user between the application and encryption subsystem. You need to let the encryption subsystem somehow identify the user and match the user to the key.
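The three points above could look roughly like the following in-memory sketch. All names here (`UserKey`, `UserKeyRegistry`, the `user:vN` key ID format) are assumptions made for illustration; a real system would keep key material in a protected key store rather than in a Python dictionary.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UserKey:
    key_id: str
    material: bytes
    active: bool = True


class UserKeyRegistry:
    """Maps user IDs to key versions, independently of user registration."""

    def __init__(self) -> None:
        self._keys: Dict[str, List[UserKey]] = {}
        self._next_version: Dict[str, int] = {}

    def generate_key(self, user_id: str) -> UserKey:
        """Create a new current key; call again to re-trigger generation on rotation."""
        version = self._next_version.get(user_id, 1)
        self._next_version[user_id] = version + 1
        key = UserKey(key_id=f"{user_id}:v{version}", material=secrets.token_bytes(32))
        self._keys.setdefault(user_id, []).append(key)
        return key

    def current_key(self, user_id: str) -> UserKey:
        """Return the newest key that has not been invalidated."""
        for key in reversed(self._keys.get(user_id, [])):
            if key.active:
                return key
        raise KeyError(f"no active key for user {user_id!r}")

    def invalidate(self, key_id: str) -> None:
        """Mark a key unusable while keeping its material."""
        user_id = key_id.rsplit(":", 1)[0]
        for key in self._keys.get(user_id, []):
            if key.key_id == key_id:
                key.active = False

    def remove(self, key_id: str) -> None:
        """Delete key material; data encrypted under it becomes unreadable."""
        user_id = key_id.rsplit(":", 1)[0]
        self._keys[user_id] = [k for k in self._keys.get(user_id, [])
                               if k.key_id != key_id]
```

Keeping invalidation and removal as separate calls mirrors the second point: a key can stop being used long before it is safe to destroy it.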

Keys don’t live forever

Every key management standard (and many regulations that point to these standards) mandates a certain approach to key rotation. Regardless of the key rotation/rolling policy, three procedures will impact general application architecture:

  • Re-encrypting the data. Some of the key layouts and key rolling/rotation schemes will require an encryption subsystem to re-encrypt the actual data occasionally, replacing one set of encrypted blobs with another. It is possible to do this without a service shutdown or even disruption in per-record availability with the right set of tools and correct key layout, and aside from security concerns, the choice is typically between:
    • Re-encrypting gradually in the background
    • Choosing a “key rotation period” and expecting lower SLA compliance or provisioning more resources for these periods
    • Triggering re-encryption with some of the queries to the data to distribute it randomly with LRU priority
  • Rotating the keys. It’s possible to rotate encryption keys without data re-encryption sometimes (think about a keyring). It requires redundant storage and has performance penalties associated with every key operation, as well as strong consistency requirements. It also requires a separate process to do background housekeeping all the time, and on-demand when keys are revoked.
  • Key revocation. Some of the key revocation procedures disrupt service availability; some of them (especially in a distributed system) are quite hard to implement synchronously. So various “key revocation lists” or “key verification services” come to the rescue. Depending on revocation/verification design, the tradeoff is between another blocking call to a remote service and the need to propagate a list of keys closer to the service and set up a local “source of truth.”
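To make the keyring idea more concrete, here is a minimal sketch in which every stored record is assumed to carry the ID of the key that encrypted it. `Keyring`, `KeyState`, and the method names are hypothetical; the encryption itself is left to a vetted library and is not shown.

```python
import secrets
from enum import Enum
from typing import Dict, Optional, Tuple


class KeyState(Enum):
    READ_WRITE = "valid for reading and writing"
    READ_ONLY = "valid for reading"
    REVOKED = "revoked"


class Keyring:
    """Holds several key versions so rotation does not force re-encryption."""

    def __init__(self) -> None:
        self._material: Dict[int, bytes] = {}
        self._state: Dict[int, KeyState] = {}
        self._current: Optional[int] = None

    def rotate(self) -> int:
        """Add a new write key; the previous one stays readable."""
        if self._current is not None and self._state[self._current] is KeyState.READ_WRITE:
            self._state[self._current] = KeyState.READ_ONLY
        key_id = len(self._state) + 1          # states are never deleted, so IDs stay unique
        self._material[key_id] = secrets.token_bytes(32)
        self._state[key_id] = KeyState.READ_WRITE
        self._current = key_id
        return key_id

    def key_for_write(self) -> Tuple[int, bytes]:
        if self._current is None or self._state[self._current] is not KeyState.READ_WRITE:
            raise LookupError("no key is valid for writing; rotate first")
        return self._current, self._material[self._current]

    def key_for_read(self, key_id: int) -> bytes:
        state = self._state.get(key_id)
        if state is None or state is KeyState.REVOKED:
            raise LookupError(f"key {key_id} is unknown or revoked")
        return self._material[key_id]

    def revoke(self, key_id: int) -> None:
        """Local revocation list: later reads with this key are refused."""
        if key_id in self._state:
            self._state[key_id] = KeyState.REVOKED

    def needs_reencryption(self, key_id: int) -> bool:
        """Background housekeeping can use this to find records on old keys."""
        return key_id != self._current
```

In a distributed setup the `revoke` state would have to be propagated to every node holding a keyring copy, which is exactly the consistency cost mentioned above.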

Dealing with encryption tradeoffs

Encryption essentially sits on a choke point between protected data and usable data. So, no matter how you arrange your encryption technically, you want it to have sufficient resources and sufficient resource pools.

While application-level encryption requires many independent and interdependent processes, they can be gathered in a few groups from an application architecture perspective:

  • Direct impact: they impact your SLOs in read/write penalties.
  • Maintenance: they impact ops teams with maintenance complexity and new procedures/processes/limitations on existing processes.
  • Housekeeping: they impact system design, developer experience, and SLOs with background and on-demand key lifecycle processes.

The more fine-grained integration with the application is, the more manageable all three classes of issues become. Yet, as with any tight coupling, changes will become much harder.

When you understand performance requirements and SLO impact of your encryption subsystem, there are various ways to ensure that trade-offs are balanced in the right way for your product:

  • Read/write decoupling. Some cryptosystems (and security products) enable decoupling of read and write keys, as well as reads and writes. This always comes with security implications and performance penalties (because additional math gets involved), but it’s a way to go for “write-once, read-often” systems.
  • Consistency. Encryption adds a few more steps between wanting to write the data and actually writing it, so a clear understanding of the trajectory and safe failure modes is required.
  • Multi-key setups. If the system is expected to change arbitrary keys frequently, use multiple key-encryption keys (KEK) with access settings “valid for reading” or “valid for writing.” Key-encryption keys protect data encryption keys (DEK). Such a multi-layered scheme makes it possible to run key rolling without service disruption.
  • Batching. Batching the encryption process (using an encryption API service to encrypt several records at the same time instead of per-record encryption) allows multiple objects to be encrypted/decrypted together to cut initiation/network request/context switch/data transfer costs.
  • Typical performance improvements: Many performance improvements you would use on high-load systems are valid here—provisioning enough processor cores, setting up efficient multi-threading, providing enough RAM, and ensuring that there are no invisible bottlenecks in data flow are first-level problems that make encryption’s performance hit especially costly.
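The batching point can be sketched as follows. `encrypt_batch` is a hypothetical stand-in for one request to an encryption API service that accepts several records at once; the batch size is an assumption to be tuned against your own latency and payload limits.

```python
from typing import Callable, Iterator, List, Sequence


def batched(records: Sequence[bytes], size: int) -> Iterator[Sequence[bytes]]:
    """Yield consecutive slices of at most `size` records."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(records), size):
        yield records[start:start + size]


def encrypt_all(records: Sequence[bytes],
                encrypt_batch: Callable[[Sequence[bytes]], List[bytes]],
                batch_size: int = 100) -> List[bytes]:
    """Encrypt records with one service call per batch instead of one per record."""
    encrypted: List[bytes] = []
    for chunk in batched(records, batch_size):
        result = encrypt_batch(chunk)   # hypothetical call to the encryption service
        if len(result) != len(chunk):
            raise RuntimeError("encryption service returned an unexpected record count")
        encrypted.extend(result)
    return encrypted
```

With 250 records and a batch size of 100, this makes three service calls instead of 250.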

Building software around encryption

Encryption + key management is just the bare minimum the application needs to actually encrypt something meaningfully. But with encryption comes a significant limitation of usability, and to be efficient, encryption needs to be tied to other security controls.

Searching the data

The goal of application-level encryption is to hide plaintext values from the datastore. But when you put the data in a database, you expect to be able to search over it.

Searching over encrypted text is possible in a limited fashion, but it will put even more pressure on performance, storage capacity, shared state, and synchronous locking between different components.

What is problematic—each of the production-acceptable ways to implement searchable encryption imposes new counterintuitive security implications, so implementing these internally might lead to delayed security disasters.

Several available schemes include (in the historical order of appearance):

Each of the methods above has its issues—some are already known to have security weaknesses, some are expected to have them by design, some impose unreasonable penalties on your system, and some are academic developments that are years, if not decades, away from being ready for industrial usage.
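To show what “limited” search can mean in practice, here is a sketch of one widely used technique, a keyed-hash “blind index” for exact-match lookups. It is offered as an illustration only, not necessarily one of the schemes referred to above; `blind_index` and the normalization rules are assumptions of this example.

```python
import hashlib
import hmac
import secrets
import unicodedata


def blind_index(index_key: bytes, value: str, length: int = 16) -> str:
    """Keyed hash of a normalized value, stored next to the ciphertext.

    Equal plaintexts give equal index values, so the datastore can match
    them without seeing the plaintext. That equality is also what leaks:
    value frequencies remain visible to anyone who can read the index column.
    """
    normalized = unicodedata.normalize("NFKC", value).strip().casefold()
    digest = hmac.new(index_key, normalized.encode("utf-8"), hashlib.sha256).digest()
    return digest[:length].hex()


# The index key must differ from the data encryption key and stay out of the datastore.
index_key = secrets.token_bytes(32)
stored_index = blind_index(index_key, "Alice@Example.com")
query_index = blind_index(index_key, "alice@example.com ")
```

Here `stored_index == query_index`, so a query such as `WHERE email_idx = ?` finds the row. Prefix, range, and substring searches are not supported by this approach, which is one example of the tradeoffs the section describes.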

Managing technical access

A cryptographic system is as good as the security of the keys to all the data. Various key management techniques lower the chances of “one key to rule them all” and provide meaningful ways to manage attack surface on both keys and data. However, it is still a good idea to look into ways of ensuring that keys are safe.

Out-of-system secrets: if your architecture and product model permits end-to-end coverage of data flow with encryption, you might need to store the keys on the client. In this case, it makes sense to properly encrypt them (with password-derived or random key), which keeps the riskiest asset out of the system and significantly decreases the attack surface in general.
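A password-derived key for protecting client-side keys could be produced along these lines. This is a minimal sketch using PBKDF2 from the Python standard library; the iteration count and function names are assumptions, and the actual wrapping of the stored key with the derived key should be done with an authenticated cipher from a vetted library, which is not shown here.

```python
import hashlib
import secrets

# Assumption: pick the iteration count from current guidance and your hardware budget.
PBKDF2_ITERATIONS = 600_000


def new_salt() -> bytes:
    """Random per-user salt, stored alongside the wrapped key (it is not secret)."""
    return secrets.token_bytes(16)


def derive_kek(password: str, salt: bytes, iterations: int = PBKDF2_ITERATIONS) -> bytes:
    """Derive a 256-bit key-encryption key from a user password."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt,
                               iterations, dklen=32)
```

Because the derived key exists only while the user supplies the password, the long-lived key on the device is never stored in usable form.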

KMS: storing the master keys or most of the keys in a separate dedicated computational unit—HSM, key vault, or cloud KMS enables you to control purpose-limited devices for access to most risky assets.

Key stores: if you store the keys in a regular KV store, managing access is important—keys can be tampered with in tricky ways both in storage and when delivered back to your application layer; key access patterns can be observed along with database access patterns to infer some knowledge about the data content.

Segregating encrypting/decrypting nodes: in terms of continuous trust, end-to-end encryption is most desirable, but it isn’t always feasible given the limitations it imposes. When numerous components need plaintext access or need to update the datastore, providing each of them with an SDK that can fetch keys from the datastore is risky, because each of these components, when compromised, will enable attackers to seize the keys. In this case, having single-purpose containers/VMs/appliances that are responsible for encrypting/decrypting is beneficial, and also enables better load management. Think about the separation of duties, separation of trust.

Two-person rule: an old military approach implemented in HashiCorp Vault (Sealed/Unsealed state) making several system administrators provide their secrets for the system to compute the master key and then unlock the key store. This method requires actual human beings to be present on-call to reload the service. See Shamir’s Secret Sharing schemes.
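For readers unfamiliar with Shamir’s Secret Sharing, the toy sketch below shows the idea: a secret is split into shares so that any `threshold` of them reconstruct it. This is an educational illustration, not the code any particular product uses; the prime, function names, and integer encoding of the secret are assumptions, and production systems should rely on a vetted implementation.

```python
import secrets
from typing import List, Tuple

PRIME = 2**521 - 1  # a Mersenne prime, comfortably larger than a 256-bit secret

Share = Tuple[int, int]  # (x, y) point on the secret polynomial


def split_secret(secret: int, threshold: int, shares: int) -> List[Share]:
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):      # Horner's rule, mod PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(shares: List[Share]) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        inverse = pow(den, PRIME - 2, PRIME)   # modular inverse via Fermat's little theorem
        secret = (secret + yi * num * inverse) % PRIME
    return secret
```

With `split_secret(master, threshold=3, shares=5)`, five administrators each hold one share and any three of them can unlock the key store, while two colluding administrators cannot.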

Managing utilities/libraries

Many of the under-the-hood processes (like periodic key rotation) have to be implemented as stand-alone blocks of functionality either ingrained in the application or triggered by system administrators.

When planning the implementation, correctly identifying when processes have to happen and under which circumstances ensures that they:

  • will happen and the system will remain sound from a security perspective.
  • will happen acceptably and the system will not degrade other neighboring systems’ NFRs.
  • won’t ruin existing data and key backups (think about the key rotation procedure that re-encrypts the data, generates new keys, and deletes old keys; then at some point, something goes wrong, you have old data backups, but don’t have old keys).
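The backup pitfall in the last point can be guarded against with a simple check before any key is destroyed. The sketch below assumes a fixed backup retention period; `RetiredKey`, `safe_to_delete`, and the 90-day figure in the usage note are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class RetiredKey:
    key_id: str
    retired_at: datetime  # when the last live record was re-encrypted away from this key


def safe_to_delete(key: RetiredKey, backup_retention: timedelta, now: datetime) -> bool:
    """A retired key may be destroyed only after every backup that could still
    contain data encrypted under it has aged out of the retention window."""
    return now >= key.retired_at + backup_retention
```

For a key retired on 1 January 2024 with 90-day backup retention, `safe_to_delete` stays false until 31 March 2024, so a restore from any surviving backup can still be decrypted.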

Monitoring and logging

Encryption is, effectively, the ultimate access control and limitation method. But to ensure that safeguards are secure themselves, it is a good idea to monitor nodes (which are part of the encryption subsystem) for anomalies and security-related events.

Aside from feeding audit logs and accumulating technical logs, encryption subsystems can provide forensic evidence during incident response—sophisticated encryption systems are hard to attack without a trace if the cryptography is sound.

Defense in depth

Remember the difference between continuous application-level encryption and TLS between hosts we’ve discussed above? Let’s look at it from a defense-in-depth perspective.

The best security designs and architectures layer defenses one on top of another in a way that allows several security controls to address one risk—even if one of them is neutralized by attackers, the remaining will still block them.

In this case, the crypto component becomes a choke point for sensitive data access. This allows us to build access control policy enforcement points there. Or embed audit logging in a way that is harder to bypass:

  • Access, audit and security logging, and monitoring: keeping logs that are usable for security monitoring and event management, having more clarity on their origin and coverage.
  • Access control integration: encryption is a choke point on access to data. Making an encryption proxy, library, or API server verify permissions with your identity management or access control system is a drastic improvement in coordinated security and eliminating policy gaps. However, it comes at a cost of even tighter coupling between different systems, more blocking, being a single point of failure, and even more requests.
  • Request authentication and filtering: another approach would be to replicate access control policies from the access control system to the encryption choke point—and, being close to data, not only enforce access policies but reduce the number of obviously risky queries like ‘SELECT * FROM application.allusers()’. This idea lies behind SQL firewalls (which work on a different level than WAF).
  • Data leakage prevention: since data is usable only when decrypted, failure to seize the keys would lead attackers to try and funnel all the data through the encryption/decryption component. There are many ways to detect leakage, from crazy resource consumption to planting special encrypted objects that trigger security alarms, but in the context of this article, understanding that your encryption system can be a circuit-breaker is an assumption you might want to keep in mind.
  • Enforcing sanitization and validation: model/domain validations, input sanitization, and other ways to ensure correct input are best made right before the encryption component. This makes encryption a good breakpoint to verify data validity before burning cycles on encrypting it.
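Several of the controls listed above can hang off the same choke point. The following sketch combines an access check, audit logging, and a crude volume-based circuit breaker in front of a decryption call. Everything here is an assumption for illustration: `DecryptionGateway`, the `decrypt` and `is_allowed` callables, and the thresholds are hypothetical, and real leak detection would use richer signals than a request counter.

```python
import logging
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Set

audit_log = logging.getLogger("ale.audit")


class DecryptionGateway:
    """Single entry point for decryption that also enforces policy."""

    def __init__(self, decrypt: Callable[[bytes], bytes],
                 is_allowed: Callable[[str, str], bool],
                 max_per_window: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._decrypt = decrypt          # hypothetical call into the crypto library/service
        self._is_allowed = is_allowed    # hypothetical lookup in the access control system
        self._max = max_per_window
        self._window = window_seconds
        self._clock = clock
        self._recent: Dict[str, Deque[float]] = defaultdict(deque)
        self._tripped: Set[str] = set()

    def read(self, principal: str, record_id: str, ciphertext: bytes) -> bytes:
        if principal in self._tripped:
            audit_log.warning("blocked principal=%s record=%s", principal, record_id)
            raise PermissionError("principal is blocked pending review")
        if not self._is_allowed(principal, record_id):
            audit_log.warning("denied principal=%s record=%s", principal, record_id)
            raise PermissionError("access denied")
        now = self._clock()
        recent = self._recent[principal]
        while recent and now - recent[0] > self._window:
            recent.popleft()             # forget requests outside the sliding window
        if len(recent) >= self._max:
            self._tripped.add(principal) # circuit breaker: stop bulk extraction
            audit_log.error("tripped principal=%s record=%s", principal, record_id)
            raise PermissionError("unusual decryption volume")
        recent.append(now)
        audit_log.info("decrypt principal=%s record=%s", principal, record_id)
        return self._decrypt(ciphertext)
```

The cost noted above is visible in the code as well: every read now depends on the access control lookup and on this component being available.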

Closing thoughts

Application-level encryption isn’t an easy win: while it’s fairly easy to get started, it’s quite a challenge to do it right. The map of potential pitfalls and problematic spots can’t be exhaustive within a short article like this. However, if you are dealing with off-the-shelf data security products, the list will likely cover most of the issues you will face.

Although providing more granular, more encompassing, and better-controlled security (when done right), application-level encryption is no easy undertaking.

But, at least, now you know all the basic parts!

So, how might a good system with app-level encryption look?

  • Sound cryptography.
  • The key management model that reflects reality.
  • Implementation of key management tooling that fits your performance requirements and architecture.
  • Tradeoffs between security and other NFRs have been considered and balanced appropriately to the system’s high-level goals.
  • Integration between the encryption layer and other security tools.
  • Use of encryption/decryption points (“choke points” of data flow) to enforce additional security controls.
  • Careful monitoring and logging.

Conclusion

The world is changing, and in this changing world data security turns from an obscure compliance requirement to a practical, well-studied set of measures to lower the risks related to sensitive data and put these risks in balance with other risks the system is facing.

It’s up to security professionals and software architects to establish a quality dialogue between industries to work out acceptable solutions for typical use-cases, because mindmaps like these are (hopefully) just precursors to a wider discussion about typical patterns we should use. If you would like to get more information on data security, check out Cossack Labs’ blog.

About the Author

Eugene Pilyankevich is CTO at Cossack Labs, a data security engineering company, where his job includes: defining product strategy, designing internal products and customer solutions, driving R&D efforts, ensuring the steady cycle of forming–storming–norming–performing of a core engineering team. Eugene started as a software developer and ISP infrastructure engineer nearly two decades ago. Being always keen to chase causes for failures he had to deal with daily led to a chain of positions – through security engineer and software/security architect to CTO in telco, banking, and computer security industries.
