Google's BigQuery Introduces Column-Level Encryption Functions and Dynamic Masking of Information

Google recently released new features for BigQuery, its SaaS data warehouse, including column-level encryption functions and dynamic masking of information. These features add a second layer of defense on top of access control to help secure and manage sensitive data.

Specifically, dynamic masking of information can be used for real-time transactions, whereas column-level encryption provides additional security for data at rest or in motion where real-time usability is not required.

These new features could be useful for companies that store personally identifiable information (PII) and other sensitive data such as credit card data and biometric information. Companies that store and analyze data in countries where data regulations and privacy mandates are evolving may also benefit: they face ongoing risks from data breaches and data leakage and need to control data access.

Column-level encryption encrypts and decrypts information at the column level, so the administrator can choose which columns are encrypted and which are not. It supports the AES-GCM (non-deterministic) and AES-SIV (deterministic) encryption algorithms; the AES-SIV functions allow grouping, aggregation, and joins on encrypted data. The feature enables two use cases: data that is natively encrypted in BigQuery and must be decrypted when accessed, and data that is encrypted externally, stored in BigQuery, and then decrypted when accessed.

Column-level encryption is integrated with Cloud Key Management Service (Cloud KMS), which gives the administrator more control: encryption keys are managed in Cloud KMS, retrieved securely on access, and logged in detail. Cloud KMS can be used to generate the KEK (key encryption key) that encrypts the DEK (data encryption key), which in turn encrypts the data in BigQuery columns. Cloud KMS uses IAM (Identity and Access Management) to define roles and permissions. The KEK is a symmetric encryption keyset that is stored in Cloud KMS, and referencing an encrypted keyset in BigQuery reduces the risk of key exposure.

The BigQuery documentation explains:

At query execution time, you provide the Cloud KMS resource path of the KEK and the ciphertext from the wrapped DEK. BigQuery calls Cloud KMS to unwrap the DEK, and then uses that key to decrypt the data in your query. The unwrapped version of the DEK is only stored in memory for the duration of the query, and then destroyed.
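
The envelope flow quoted above can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration: `FakeKms`, `toy_cipher`, and `run_query` are hypothetical names, and the XOR keystream is a toy stand-in for real authenticated encryption, not what BigQuery or Cloud KMS actually run. It only shows the shape of the idea: the KEK stays inside the key service, the DEK is stored wrapped, and the unwrapped DEK lives only for the duration of one query.

```python
import hashlib
import os


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream. Toy stand-in for AES; NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class FakeKms:
    """Hypothetical in-memory stand-in for a key service: the KEK never leaves it."""

    def __init__(self) -> None:
        self._kek = os.urandom(32)

    def wrap(self, dek: bytes) -> bytes:
        nonce = os.urandom(16)
        return nonce + toy_cipher(self._kek, nonce, dek)

    def unwrap(self, wrapped_dek: bytes) -> bytes:
        return toy_cipher(self._kek, wrapped_dek[:16], wrapped_dek[16:])


def encrypt_value(dek: bytes, plaintext: str) -> bytes:
    nonce = os.urandom(16)
    return nonce + toy_cipher(dek, nonce, plaintext.encode())


def run_query(kms: FakeKms, wrapped_dek: bytes, ciphertexts: list) -> list:
    """Unwrap the DEK, decrypt the column, then drop the DEK."""
    dek = kms.unwrap(wrapped_dek)  # plaintext DEK exists only inside this call
    try:
        return [toy_cipher(dek, c[:16], c[16:]).decode() for c in ciphertexts]
    finally:
        del dek


kms = FakeKms()
dek = os.urandom(32)
wrapped_dek = kms.wrap(dek)  # this is what would be stored or passed around
column = [encrypt_value(dek, z) for z in ["94043", "10001"]]
del dek  # only the wrapped form remains outside the key service

print(run_query(kms, wrapped_dek, column))  # ['94043', '10001']
```

The caller of `run_query` only ever handles the wrapped DEK, which is useless without access to the key service.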

In one example use case, the ZIP code is the data to be encrypted, and a non-deterministic function decrypts it on access when the function is called in the query that is run on the table.

From BigQuery documentation
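
As a rough sketch of what such a query could look like, the Python helper below only builds the SQL text; it does not call BigQuery. The SQL functions `AEAD.DECRYPT_STRING` and `KEYS.KEYSET_CHAIN` are BigQuery functions, but the table name, the columns `customer_id` and `encrypted_zip`, the key path, and the helper names `bytes_literal` and `build_decrypt_query` are hypothetical. The sketch also assumes that `customer_id` is a STRING column that was used as additional data at encryption time.

```python
def bytes_literal(data: bytes) -> str:
    """Render bytes as a SQL bytes literal using \\xhh escapes."""
    return "b'" + "".join(f"\\x{byte:02x}" for byte in data) + "'"


def build_decrypt_query(table: str, kms_resource: str, wrapped_dek: bytes) -> str:
    """Build query text that passes the KEK path and the wrapped DEK at query time.

    Values are interpolated without escaping, so this is for illustration only.
    """
    return f"""
DECLARE kms_resource_name STRING DEFAULT '{kms_resource}';
DECLARE first_level_keyset BYTES DEFAULT {bytes_literal(wrapped_dek)};

SELECT
  customer_id,
  AEAD.DECRYPT_STRING(
    KEYS.KEYSET_CHAIN(kms_resource_name, first_level_keyset),
    encrypted_zip,  -- hypothetical BYTES column holding the ciphertext
    customer_id     -- hypothetical additional data used at encryption time
  ) AS zip_code
FROM `{table}`;
""".strip()


example_sql = build_decrypt_query(
    table="my-project.my_dataset.customers",
    kms_resource=(
        "gcp-kms://projects/my-project/locations/us/"
        "keyRings/my-key-ring/cryptoKeys/my-kek"
    ),
    wrapped_dek=b"\x0a\x24\x00",  # placeholder bytes, not a real wrapped keyset
)
print(example_sql)
```

A user who lacks permission on the KEK in Cloud KMS cannot unwrap the keyset, so the same query text would fail for that user.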

In a second example, the deterministic AEAD function decrypts data on access when it is called in the query that is run on the table, and it also supports aggregation and joins on the encrypted data.

From BigQuery documentation

In this way even a user who is not allowed to access the encrypted data can perform a join.
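
Why determinism matters for joins can be shown with a toy model. The sketch below is not AES-SIV or AES-GCM; it is a hypothetical keystream cipher built from the Python standard library, in which the only difference between the two modes is how the nonce is chosen. In deterministic mode the nonce is derived from the plaintext (the idea behind SIV-style schemes), so equal plaintexts give equal ciphertexts and can be matched without the key.

```python
import hashlib
import hmac
import os


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream. Toy cipher; NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt(key: bytes, plaintext: str, deterministic: bool) -> bytes:
    data = plaintext.encode()
    if deterministic:
        # Nonce derived from the plaintext: same input, same ciphertext.
        nonce = hmac.new(key, data, hashlib.sha256).digest()[:16]
    else:
        # Fresh random nonce: same input, different ciphertext each time.
        nonce = os.urandom(16)
    return nonce + keystream_xor(key, nonce, data)


key = os.urandom(32)

# Two "tables" that share an encrypted email column (deterministic mode).
orders = [
    ("order-1", encrypt(key, "alice@example.com", True)),
    ("order-2", encrypt(key, "bob@example.com", True)),
]
customers = [
    (encrypt(key, "alice@example.com", True), "gold"),
    (encrypt(key, "bob@example.com", True), "silver"),
]

# The join compares ciphertexts only; the key is never used here.
tier_by_email = dict(customers)
joined = [(oid, tier_by_email[ct]) for oid, ct in orders if ct in tier_by_email]
print(joined)  # [('order-1', 'gold'), ('order-2', 'silver')]

# In randomized mode the same plaintext no longer matches itself.
a = encrypt(key, "alice@example.com", False)
b = encrypt(key, "alice@example.com", False)
print(a == b)  # False
```

The trade-off is that deterministic ciphertexts reveal which rows hold equal values, which is exactly what makes grouping and joins possible.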

Before the release of column-level encryption, administrators needed to make copies of datasets with obfuscated data in order to grant the right access to each group. This created an inconsistent approach to protecting data, which could be expensive to manage. Column-level encryption increases the security level because each column can have its own encryption key instead of a single key for the entire database. It can also allow faster data access because less data is encrypted.

Dynamic masking of information, released in preview, gives administrators more control: combined with column-level access control, they can grant full access, masked access, or no access to data, extending column-level security. This capability selectively masks column-level data at query time based on the defined masking rules, user roles, and privileges. It lets administrators obfuscate sensitive data and control user access while mitigating the risk of data leakage.

Thanks to this new feature, sharing data is easier, because administrators can hide information selectively and tables can be shared with large groups of users. At the application level, developers don’t need to modify queries to hide sensitive data: once data masking is configured in BigQuery, existing queries automatically hide the data based on the roles the user is granted. Last but not least, applying security is easier, because the administrator can write a security rule once and then apply it to any number of columns with tags.
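
The behavior described above (one rule per tag, applied at read time according to the caller's roles) can be modeled in a few lines of Python. This is a conceptual sketch, not BigQuery's implementation: the tag names, role names, and the `read_row` helper are hypothetical, and the hex-encoded SHA-256 output is an assumption about formatting.

```python
import hashlib


def sha256_hash(value):
    """Masking rule: replace the value with its SHA-256 hash (hex is an assumption)."""
    return hashlib.sha256(str(value).encode()).hexdigest()


def nullify(value):
    """Masking rule: replace the value with NULL (None here)."""
    return None


# Each rule is written once per tag...
TAG_POLICY = {
    "pii_contact": {
        "rule": sha256_hash,
        "full_access": {"auditor"},
        "masked_access": {"analyst", "intern"},
    },
    "pii_government_id": {
        "rule": nullify,
        "full_access": {"auditor"},
        "masked_access": {"analyst"},
    },
}

# ...and then attached to any number of columns.
COLUMN_TAGS = {"email": "pii_contact", "ssn": "pii_government_id"}


def read_row(row: dict, user_roles: set) -> dict:
    """Return the row as this user would see it; the query itself is unchanged."""
    visible = {}
    for column, value in row.items():
        tag = COLUMN_TAGS.get(column)
        if tag is None:
            visible[column] = value  # untagged columns are returned as-is
            continue
        policy = TAG_POLICY[tag]
        if user_roles & policy["full_access"]:
            visible[column] = value
        elif user_roles & policy["masked_access"]:
            visible[column] = policy["rule"](value)
        else:
            raise PermissionError(f"no access to column {column!r}")
    return visible


row = {"name": "Ada", "email": "ada@example.com", "ssn": "123-45-6789"}
print(read_row(row, {"auditor"}))  # full values
print(read_row(row, {"analyst"}))  # email hashed, ssn set to None
```

A caller holding only the hypothetical `intern` role would get a `PermissionError` on `ssn`, which models the "no access" case.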

Any masking policies or encryption applied on the base tables are carried over to authorized views and materialized views, and masking or encryption is compatible with other security features such as row-level security.

Both new features can be used to increase security, manage access control, comply with privacy law, and create safe test environments. They also allow a more consistent way to manage tables with sensitive data: administrators don’t need to create multiple datasets with encrypted (or unencrypted) data and share these copies with the right users.
