Our team is currently exploring ways to encrypt PII at the field level within BigQuery, and we found the following way to encrypt and decrypt using Crypto-JS:
#standardSQL
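-- Every JS UDF that uses CryptoJS needs its own OPTIONS (library=...)
CREATE TEMPORARY FUNCTION encrypt(_text STRING) RETURNS STRING LANGUAGE js AS
"""
let key = CryptoJS.enc.Utf8.parse("<key>");
// CBC mode; Crypto-JS pads with PKCS#7 by default
let options = { iv: CryptoJS.enc.Utf8.parse("<iv>"), mode: CryptoJS.mode.CBC };
// toString() serializes the result as a Base64 ciphertext string
let _encrypt = CryptoJS.AES.encrypt(_text, key, options).toString();
return _encrypt;
""" OPTIONS (library="gs://path/to/Crypto-JS/crypto-js.js");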
CREATE TEMPORARY FUNCTION decrypt(_text STRING) RETURNS STRING LANGUAGE js AS
"""
let key = CryptoJS.enc.Utf8.parse("<key>");
let options = { iv: CryptoJS.enc.Utf8.parse("<iv>"), mode: CryptoJS.mode.CBC };
let _decrypt = CryptoJS.AES.decrypt(_text, key, options).toString(CryptoJS.enc.Utf8);
return _decrypt;
""" OPTIONS (library="gs://path/to/Crypto-JS/crypto-js.js");
-- query to encrypt fields
SELECT
<fields>, encrypt(<pii-fields>)
FROM
`<project>.<dataset>.<table>`;
-- query to decrypt fields
SELECT
<fields>, decrypt(<pii-fields>)
FROM
`<project>.<dataset>.<table>`
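To check what the UDFs produce outside BigQuery, here is a minimal sketch in Node.js using only the built-in `crypto` module (an assumption; it is not part of the setup above). It follows the same steps: the key and IV are taken as UTF-8 bytes, AES runs in CBC mode with PKCS#7 padding, and the output is Base64. Assuming the UDF returns `.toString()` of the Crypto-JS result, the Base64 strings should match. The function names and the key/IV values are hypothetical.

```javascript
const crypto = require("crypto");

// Picks the AES variant from the key length (16/24/32 bytes -> AES-128/192/256),
// as Crypto-JS does when given a raw key.
function cipherName(key) {
  if (![16, 24, 32].includes(key.length)) {
    throw new Error(`AES key must be 16, 24 or 32 bytes, got ${key.length}`);
  }
  return `aes-${key.length * 8}-cbc`;
}

// Hypothetical equivalent of the encrypt() UDF: UTF-8 key and IV, CBC, PKCS#7, Base64 out.
function encryptLikeUdf(text, keyStr, ivStr) {
  const key = Buffer.from(keyStr, "utf8");
  const iv = Buffer.from(ivStr, "utf8"); // must be exactly 16 bytes
  const cipher = crypto.createCipheriv(cipherName(key), key, iv); // PKCS#7 padding is on by default
  return Buffer.concat([cipher.update(text, "utf8"), cipher.final()]).toString("base64");
}

// Hypothetical equivalent of the decrypt() UDF: Base64 in, UTF-8 plaintext out.
function decryptLikeUdf(b64, keyStr, ivStr) {
  const key = Buffer.from(keyStr, "utf8");
  const iv = Buffer.from(ivStr, "utf8");
  const decipher = crypto.createDecipheriv(cipherName(key), key, iv);
  return Buffer.concat([decipher.update(b64, "base64"), decipher.final()]).toString("utf8");
}

// Hypothetical 16-byte key and IV, for illustration only.
const KEY = "0123456789abcdef";
const IV = "fedcba9876543210";

const ct = encryptLikeUdf("jane@example.com", KEY, IV);
console.log(ct);
console.log(decryptLikeUdf(ct, KEY, IV)); // jane@example.com
// With a fixed IV, the same plaintext always yields the same ciphertext.
console.log(ct === encryptLikeUdf("jane@example.com", KEY, IV)); // true
```

The last line highlights a property of the fixed-IV design. Equal plaintexts produce equal ciphertexts, so joins and GROUP BY on the encrypted column still work, but the data also reveals which rows share a value.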
I am trying to benchmark the performance of AES-CBC encryption and decryption with the Crypto-JS library in BigQuery before deploying it to production. Compared with the same query without encryption, we found that the time to encrypt and decrypt grows exponentially with the number of records. However, as the volume of data grows, the processing time per record improves.
As there is no documentation available on this, could someone from the community suggest better approaches, query optimizations, or best practices for field-level encryption and decryption within BigQuery?
BigQuery now supports encryption functions. From the documentation, here is a self-contained example that creates some keysets and uses them to encrypt data. In practice, you would want to store the keysets in a real table so that you can later use them to decrypt the ciphertext.
WITH CustomerKeysets AS (
SELECT 1 AS customer_id, KEYS.NEW_KEYSET('AEAD_AES_GCM_256') AS keyset UNION ALL
SELECT 2, KEYS.NEW_KEYSET('AEAD_AES_GCM_256') UNION ALL
SELECT 3, KEYS.NEW_KEYSET('AEAD_AES_GCM_256')
), PlaintextCustomerData AS (
SELECT 1 AS customer_id, 'elephant' AS favorite_animal UNION ALL
SELECT 2, 'walrus' UNION ALL
SELECT 3, 'leopard'
)
SELECT
pcd.customer_id,
AEAD.ENCRYPT(
(SELECT keyset
FROM CustomerKeysets AS ck
WHERE ck.customer_id = pcd.customer_id),
pcd.favorite_animal,
CAST(pcd.customer_id AS STRING)
) AS encrypted_animal
FROM PlaintextCustomerData AS pcd;
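The third argument to AEAD.ENCRYPT is additional data. It is authenticated but not encrypted, and the same value has to be supplied again when decrypting. To illustrate the idea, here is a minimal Node.js sketch using AES-256-GCM from the built-in `crypto` module. The function names, the per-customer key map and the nonce || ciphertext || tag layout are assumptions made for this example. BigQuery produces its own ciphertext format, so these bytes cannot be passed to AEAD.DECRYPT_STRING.

```javascript
const crypto = require("crypto");

// Concept sketch: AES-256-GCM with associated data (not BigQuery's wire format).
function aeadEncrypt(key, plaintext, aad) {
  const iv = crypto.randomBytes(12); // fresh 96-bit nonce for every message
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  cipher.setAAD(Buffer.from(aad, "utf8")); // bind the ciphertext to the associated data
  const ct = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  // Layout: nonce (12 bytes) || ciphertext || auth tag (16 bytes)
  return Buffer.concat([iv, ct, cipher.getAuthTag()]);
}

function aeadDecrypt(key, blob, aad) {
  const iv = blob.subarray(0, 12);
  const tag = blob.subarray(blob.length - 16);
  const ct = blob.subarray(12, blob.length - 16);
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAAD(Buffer.from(aad, "utf8"));
  decipher.setAuthTag(tag);
  // final() throws if the tag does not verify (wrong key, wrong AAD, or tampered data)
  return Buffer.concat([decipher.update(ct), decipher.final()]).toString("utf8");
}

// Hypothetical: one random 256-bit key per customer, mirroring the per-customer keysets.
const customerKeys = new Map([1, 2, 3].map((id) => [id, crypto.randomBytes(32)]));

const blob = aeadEncrypt(customerKeys.get(1), "elephant", "1");
console.log(aeadDecrypt(customerKeys.get(1), blob, "1")); // elephant

try {
  aeadDecrypt(customerKeys.get(1), blob, "2"); // wrong associated data
} catch (e) {
  console.log("decryption failed:", e.message);
}
```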
Edit: if you want to decrypt using AES-CBC with PKCS padding (it's not clear what kind of padding you are using in your example), you can use the KEYS.ADD_KEY_FROM_RAW_BYTES function to create a keyset, then call AEAD.DECRYPT_STRING or AEAD.DECRYPT_BYTES. For example:
SELECT
AEAD.DECRYPT_STRING(
KEYS.ADD_KEY_FROM_RAW_BYTES(b'', 'AES_CBC_PKCS', b'1234567890123456'),
FROM_HEX('deed2a88e73dccaa30a9e6e296f62be27db30db16f76d3f42c85d31db3f46376'),
'')
This returns abcdef. The IV is expected to be the first 16 bytes of the ciphertext.
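Here is a Node.js sketch that produces ciphertext in this layout, using the built-in `crypto` module (the function names are hypothetical). It prepends a random 16-byte IV to the AES-CBC/PKCS#7 ciphertext and hex-encodes the result for FROM_HEX. The Crypto-JS UDFs in the question use a fixed IV and return only the Base64 ciphertext, so the last helper re-packs that output by putting the IV bytes in front.

```javascript
const crypto = require("crypto");

// IV (16 random bytes) || AES-CBC/PKCS#7 ciphertext, hex-encoded for FROM_HEX(...).
function encryptCbcIvPrefixed(keyBytes, plaintext) {
  const iv = crypto.randomBytes(16);
  const cipher = crypto.createCipheriv(`aes-${keyBytes.length * 8}-cbc`, keyBytes, iv);
  const ct = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, ct]).toString("hex");
}

// Inverse: the first 16 bytes are the IV, the rest is the ciphertext.
function decryptCbcIvPrefixed(keyBytes, hex) {
  const data = Buffer.from(hex, "hex");
  const iv = data.subarray(0, 16);
  const decipher = crypto.createDecipheriv(`aes-${keyBytes.length * 8}-cbc`, keyBytes, iv);
  return Buffer.concat([decipher.update(data.subarray(16)), decipher.final()]).toString("utf8");
}

// Re-packs the Base64 output of the Crypto-JS UDF (fixed IV, not included in its output)
// into the IV-prefixed layout.
function udfOutputToIvPrefixedHex(udfBase64, ivStr) {
  return Buffer.concat([Buffer.from(ivStr, "utf8"), Buffer.from(udfBase64, "base64")]).toString("hex");
}

const key = Buffer.from("1234567890123456", "utf8"); // same raw key as the SQL example
const hex = encryptCbcIvPrefixed(key, "abcdef");
console.log(hex); // 64 hex characters: 16-byte IV + one 16-byte block
console.log(decryptCbcIvPrefixed(key, hex)); // abcdef
```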