Tags: terraform, databricks, aws-databricks, terraform-provider-databricks

How to dynamically change variables in a Databricks notebook based on which environment it was deployed to?


I want to move data from an S3 bucket to Databricks. On both platforms I have separate environments for DEV, QA, and PROD.

I use a Databricks notebook, which I deploy to Databricks using Terraform.

Within the notebook there are some hardcoded variables pointing at a specific AWS account and bucket.

I want to change those variables dynamically based on which Databricks environment the notebook is deployed to.

This could probably be achieved with Databricks secrets, but I'd rather not use the Databricks CLI. Are there other options?

Does Terraform provide control over specific code cells within a notebook?


Solution

  • I ended up using the cluster's environment variables.

    resource "databricks_job" "my_job" {
      # (...)
      new_cluster {
        # (...)
        spark_env_vars = {
          # Expose the target environment to the notebook as an environment variable
          ENVIRONMENT = var.environment
        }
      }
    
      notebook_task {
        notebook_path = databricks_notebook.my_notebook.path
      }
    }
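
    The environment name itself can be supplied as an ordinary Terraform input variable, set per deployment (for example through separate .tfvars files or workspaces). A minimal sketch of such a declaration, assuming the allowed values are exactly dev, qa, and prod:

    variable "environment" {
      description = "Target Databricks environment for this deployment"
      type        = string

      # Reject unexpected values early, before the notebook's dictionary lookup runs
      validation {
        condition     = contains(["dev", "qa", "prod"], var.environment)
        error_message = "The environment must be one of: dev, qa, prod."
      }
    }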
    

    Then, in the notebook, I hardcoded the constants in a dictionary and selected the right one based on the cluster's environment variable:

    from os import environ

    # Read the environment name injected via the cluster's spark_env_vars
    db_env = environ["ENVIRONMENT"]

    # Per-environment constants (placeholder account IDs)
    aws_account_ids = {
        "dev": 123,
        "qa": 456,
        "prod": 789,
    }

    aws_account_id = aws_account_ids[db_env]
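
    If the cluster is ever created without the ENVIRONMENT variable, or with a value that isn't in the dictionary, the plain lookup above fails with a bare KeyError. A slightly more defensive variant of the same idea is sketched below; it reuses the aws_account_ids dictionary from above, and the bucket-name convention at the end is purely hypothetical:

    import os

    db_env = os.environ.get("ENVIRONMENT")
    if db_env not in aws_account_ids:
        raise ValueError(
            f"Unknown or missing ENVIRONMENT {db_env!r}; "
            f"expected one of {sorted(aws_account_ids)}"
        )

    aws_account_id = aws_account_ids[db_env]

    # Hypothetical bucket naming convention, shown only for illustration
    source_path = f"s3://my-data-{db_env}/raw/"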