Tags: python, google-cloud-platform, encryption, google-bigquery, google-cloud-functions

Updating BigQuery Dataset Encryption Configuration in google-cloud-bigquery Python Client


I am attempting to update the Key Management Service (KMS) key associated with the encryption configuration of an existing Google BigQuery dataset using the google-cloud-bigquery Python client library, inside a Google Cloud Function.

Below is the configuration of the Cloud Function. For some reason, I can't read the environment variable inside the function with os.getenv or os.environ.get. Is there a way to get the project ID as a string inside the code without hardcoding it?

Generation: 1st gen
Memory: 256 MB
Environment variables:
    PROJECT_ID: "project_id"
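A minimal sketch of one way to avoid hardcoding the project, assuming the function should prefer an explicitly configured `PROJECT_ID` variable and otherwise fall back to the project reported by Application Default Credentials (the same source `bigquery.Client()` consults when no project is passed). `resolve_project_id` is a hypothetical helper name; `google.auth.default()` is the google-auth call that returns a `(credentials, project_id)` tuple.

```python
import os
from typing import Optional


def resolve_project_id(env_var: str = "PROJECT_ID") -> Optional[str]:
    """Return the project ID without hardcoding it (hypothetical helper)."""
    # 1. Prefer an explicitly configured environment variable, if present.
    project_id = os.environ.get(env_var)
    if project_id:
        return project_id

    # 2. Otherwise ask Application Default Credentials. The import is done
    #    lazily so the environment-variable path needs no extra packages.
    import google.auth
    from google.auth.exceptions import DefaultCredentialsError

    try:
        _, project_id = google.auth.default()
    except DefaultCredentialsError:
        # No credentials available (e.g. running locally without gcloud auth).
        return None
    return project_id
```

Inside the function, `bigquery.Client(project=resolve_project_id())` would then use the configured value. Leaving `project` unset lets the client infer it the same way, which is what the solution below relies on via `client.project`.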

I have tried the following approach based on the official documentation, various examples and ChatGPT suggestions:

from google.cloud import bigquery
import os

def update_kms_key(request):
    try:
        project_id = os.environ.get("PROJECT_ID", "your-project-id")
        key_ring_id = "your-key-ring-id"
        key_id = "your-key-id"
        location = "your-location"

        kms_key_name = f"projects/{project_id}/locations/{location}/keyRings/{key_ring_id}/cryptoKeys/{key_id}"

        client = bigquery.Client(project=project_id)
        datasets = list(client.list_datasets())

        if datasets:
            dataset_id = datasets[0].dataset_id
            dataset_ref = client.dataset(dataset_id)

            dataset = client.get_dataset(dataset_ref)

            if not hasattr(dataset, 'encryption_configuration'):
                dataset.encryption_configuration = bigquery.EncryptionConfiguration()

            dataset.encryption_configuration.kms_key_name = kms_key_name

            dataset = client.update_dataset(dataset, ['encryption_configuration'])

            return {"Success": [dataset.dataset_id]}
        else:
            return {"Error": "No datasets found."}

    except Exception as e:
        return {"ERROR": str(e)}

However, I am encountering an error: "No property 'encryption_configuration'". It appears that the encryption_configuration property is not present in the Dataset object.

Is there a correct way to update the KMS key associated with the encryption configuration of an existing BigQuery dataset using the google-cloud-bigquery Python client library? If not, what alternative approaches or workarounds can be used to achieve this?
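The error is consistent with a field-name mismatch: in google-cloud-bigquery, a `Dataset` exposes its customer-managed key as `default_encryption_configuration` (the default applied to tables created in it later), while `encryption_configuration` is a `Table` property, so `update_dataset` does not recognize the name passed in the question's code. A minimal sketch, assuming the caller supplies the dataset ID and key details (`build_kms_key_name` and `set_dataset_default_kms_key` are hypothetical helper names). The key must also be in a location compatible with the dataset's location. Changing the dataset default does not re-encrypt existing tables; those have to be updated table by table, as the solution below does.

```python
def build_kms_key_name(project_id: str, location: str, key_ring: str, key: str) -> str:
    """Assemble a Cloud KMS CryptoKey resource name."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/keyRings/{key_ring}/cryptoKeys/{key}"
    )


def set_dataset_default_kms_key(client, dataset_id: str, kms_key_name: str):
    """Set the default CMEK key of an existing dataset (hypothetical helper).

    `client` is a google.cloud.bigquery.Client; `dataset_id` may be
    "dataset" (resolved against the client's project) or "project.dataset".
    """
    # Imported here so build_kms_key_name() can be used on its own.
    from google.cloud import bigquery

    dataset = client.get_dataset(dataset_id)
    # Datasets use default_encryption_configuration, not encryption_configuration.
    dataset.default_encryption_configuration = bigquery.EncryptionConfiguration(
        kms_key_name=kms_key_name
    )
    # Only the listed field is sent in the PATCH request.
    return client.update_dataset(dataset, ["default_encryption_configuration"])
```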


Solution

  • I managed to make it work. Here is the final code for reference:

    import logging
    from os import getenv
    
    from google.cloud.bigquery import (
        Client,
        EncryptionConfiguration
    )
    
    
    logging.getLogger().setLevel(logging.INFO)
    
    client = Client()
    
    PROJECT_ID = client.project
    TEMP_KMS_KEY = getenv('TEMP_KMS_KEY')
    ORIGINAL_KMS_KEY = getenv('ORIGINAL_KMS_KEY')
    
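    def rotate_kms(request) -> str:
        # Both keys must be configured as environment variables of the function
        if not TEMP_KMS_KEY or not ORIGINAL_KMS_KEY:
            logging.error('TEMP_KMS_KEY and ORIGINAL_KMS_KEY must both be set.')
            return 'Cloud Function failed!'
    
        try:
            datasets = client.list_datasets()
    
            for dataset in datasets:
                tables = client.list_tables(dataset=dataset)
    
                for table_item in tables:
                    # list_tables() returns lightweight items; fetch the full metadata
                    table_id = f'{PROJECT_ID}.{dataset.dataset_id}.{table_item.table_id}'
                    table = client.get_table(table_id)
    
                    # Views, materialized views, external tables, snapshots, etc. are skipped
                    if table.table_type != 'TABLE':
                        logging.warning(f'{table_id} is of type {table.table_type}, not a standard table. Skipping...')
                        continue
    
                    # Tables without a CMEK key use Google-managed (default) encryption
                    if table.encryption_configuration is None or not table.encryption_configuration.kms_key_name:
                        logging.warning(f'{table_id} has default encryption and requires a copy job to change from default encryption. Skipping...')
                        continue
    
                    logging.info(f'Updating KMS key of: {table_id}')
    
                    # Step 1: switch the table to a different (temporary) key
                    table.encryption_configuration = EncryptionConfiguration(
                        kms_key_name=TEMP_KMS_KEY
                    )
                    table = client.update_table(table, ['encryption_configuration'])
                    logging.info(f'Temp KMS key set to table: {table_id}')
    
                    # Step 2: switch back to the original key (its current primary version)
                    table.encryption_configuration = EncryptionConfiguration(
                        kms_key_name=ORIGINAL_KMS_KEY
                    )
                    table = client.update_table(table, ['encryption_configuration'])
                    logging.info(f'Original KMS key set to table: {table_id}')
    
            logging.info('Cloud Function executed successfully!')
            return 'Execution Successful!'
    
        except Exception as e:
            # logging.exception also records the traceback
            logging.exception(e)
            return 'Cloud Function failed!'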
    
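The skip rules in the loop can also be pulled out into a small pure function, so they can be tested without BigQuery access. This sketch additionally assumes you only want to re-key tables that currently use `ORIGINAL_KMS_KEY` (the loop above moves every CMEK table to that key), and it tolerates a trailing `/cryptoKeyVersions/N` segment in case a reported key name includes one. `classify_table` and `strip_key_version` are hypothetical names.

```python
from typing import Optional

VERSION_MARKER = "/cryptoKeyVersions/"


def strip_key_version(kms_key_name: str) -> str:
    """Drop a trailing /cryptoKeyVersions/N segment, if present."""
    return kms_key_name.split(VERSION_MARKER, 1)[0]


def classify_table(table_type: str, kms_key_name: Optional[str], original_key: str) -> str:
    """Return 'rotate', or a short reason for skipping the table."""
    if table_type != "TABLE":
        return "skip: not a standard table"
    if not kms_key_name:
        return "skip: default (Google-managed) encryption"
    # Compare key names without any version suffix.
    if strip_key_version(kms_key_name) != strip_key_version(original_key):
        return "skip: encrypted with a different key"
    return "rotate"
```

In the loop, the inputs would come from `table.table_type` and `table.encryption_configuration.kms_key_name` (or `None` when `encryption_configuration` is `None`), with `ORIGINAL_KMS_KEY` as the third argument.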

    The function switches each CMEK-encrypted table from the original KMS key to a temporary one and then back again, so that after the original key has been rotated, the table is re-encrypted with the key's new version, as described here: https://cloud.google.com/bigquery/docs/customer-managed-encryption

    Impact of Cloud KMS key rotation

    BigQuery doesn't automatically rotate a table encryption key when the Cloud KMS key associated with the table is rotated. All data in the existing tables continue to be protected by the key version with which they were created.

    Any newly-created tables use the primary key version at the time of their creation.

    To update a table to use the most recent key version, change the table to a different Cloud KMS key and then back to the original.