Related question: Terraform Databricks AWS instance profile - "authentication is not configured for provider"
After resolving the error in that question and proceeding, I have started encountering the following error on several different operations (creating a Databricks instance profile, querying Terraform Databricks data sources such as databricks_current_user or databricks_spark_version, etc.):
Error: cannot create instance profile: Databricks API (/api/2.0/instance-profiles/add) requires you to set `host` property (or DATABRICKS_HOST env variable) to result of `databricks_mws_workspaces.this.workspace_url`. This error may happen if you're using provider in both normal and multiworkspace mode. Please refactor your code into different modules. Runnable example that we use for integration testing can be found in this repository at https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs/guides/aws-workspace
I am able to create an instance profile manually in the Databricks workspace admin console, and I can create clusters and run notebooks in that workspace.
Relevant code:
main.tf:
module "create-workspace" {
source = "./modules/create-workspace"
env = var.env
region = var.region
databricks_host = var.databricks_host
databricks_account_username = var.databricks_account_username
databricks_account_password = var.databricks_account_password
databricks_account_id = var.databricks_account_id
}
providers-main.tf:
terraform {
  required_version = ">= 1.1.0"

  required_providers {
    databricks = {
      source  = "databrickslabs/databricks"
      version = "0.4.4"
    }
    aws = {
      source  = "hashicorp/aws"
      version = ">= 3.49.0"
    }
  }
}

provider "aws" {
  region  = var.region
  profile = var.aws_profile
}

provider "databricks" {
  host  = var.databricks_host
  token = var.databricks_manually_created_workspace_token
}
modules/create-workspace/providers.tf:
terraform {
  required_version = ">= 1.1.0"

  required_providers {
    databricks = {
      source  = "databrickslabs/databricks"
      version = "0.4.4"
    }
    aws = {
      source  = "hashicorp/aws"
      version = ">= 3.49.0"
    }
  }
}

provider "aws" {
  region  = var.region
  profile = var.aws_profile
}

provider "databricks" {
  host = var.databricks_host
  # token = var.databricks_manually_created_workspace_token - doesn't make a difference switching from username/password to token
  username   = var.databricks_account_username
  password   = var.databricks_account_password
  account_id = var.databricks_account_id
}

provider "databricks" {
  alias = "mws"
  # host =
  username   = var.databricks_account_username
  password   = var.databricks_account_password
  account_id = var.databricks_account_id
}
modules/create-workspace/databricks-workspace.tf:
resource "databricks_mws_credentials" "this" {
provider = databricks.mws
account_id = var.databricks_account_id
role_arn = aws_iam_role.cross_account_role.arn
credentials_name = "${local.prefix}-creds"
depends_on = [aws_iam_role_policy.this]
}
resource "databricks_mws_workspaces" "this" {
provider = databricks.mws
account_id = var.databricks_account_id
aws_region = var.region
workspace_name = local.prefix
deployment_name = local.prefix
credentials_id = databricks_mws_credentials.this.credentials_id
storage_configuration_id = databricks_mws_storage_configurations.this.storage_configuration_id
network_id = databricks_mws_networks.this.network_id
}
modules/create-workspace/IAM.tf:
data "databricks_aws_assume_role_policy" "this" {
external_id = var.databricks_account_id
}
resource "aws_iam_role" "cross_account_role" {
name = "${local.prefix}-crossaccount"
assume_role_policy = data.databricks_aws_assume_role_policy.this.json
}
resource "time_sleep" "wait" {
depends_on = [
aws_iam_role.cross_account_role]
create_duration = "10s"
}
data "databricks_aws_crossaccount_policy" "this" {}
resource "aws_iam_role_policy" "this" {
name = "${local.prefix}-policy"
role = aws_iam_role.cross_account_role.id
policy = data.databricks_aws_crossaccount_policy.this.json
}
data "aws_iam_policy_document" "pass_role_for_s3_access" {
statement {
effect = "Allow"
actions = ["iam:PassRole"]
resources = [aws_iam_role.cross_account_role.arn]
}
}
resource "aws_iam_policy" "pass_role_for_s3_access" {
name = "databricks-shared-pass-role-for-s3-access"
path = "/"
policy = data.aws_iam_policy_document.pass_role_for_s3_access.json
}
resource "aws_iam_role_policy_attachment" "cross_account" {
policy_arn = aws_iam_policy.pass_role_for_s3_access.arn
role = aws_iam_role.cross_account_role.name
}
resource "aws_iam_instance_profile" "shared" {
name = "databricks-shared-instance-profile"
role = aws_iam_role.cross_account_role.name
}
resource "databricks_instance_profile" "shared" {
instance_profile_arn = aws_iam_instance_profile.shared.arn
depends_on = [databricks_mws_workspaces.this]
}
In this case, the problem is that you need to have two Databricks providers: one configured for account-level (multi-workspace) operations against https://accounts.cloud.databricks.com, and one configured for workspace-level operations against the URL of the workspace itself.
One of these providers needs to be declared with an alias so Terraform can distinguish one from the other; the documentation for the Databricks provider shows how to do that. The problem is that Terraform applies changes in parallel as much as possible, because it doesn't know about dependencies between resources until you explicitly use depends_on, so it tries to create workspace-level Databricks resources before it knows the host value of the Databricks workspace (even if the workspace has already been created).
Unfortunately, it's not possible to put depends_on into a provider block, so the current recommendation to avoid this problem is to split the code into several modules: one that creates the workspace using the account-level provider, and another that provisions workspace-level resources (clusters, instance profiles, etc.) using the workspace-level provider.
Also, the Terraform documentation recommends that providers not be configured inside modules: it's better to declare all providers with aliases in the top-level template and pass them to the modules explicitly (see the example below). In that case a module should contain only the declaration of its required providers, not their configuration.
For example, the top-level template could look like this:
terraform {
  required_version = ">= 1.1.0"

  required_providers {
    databricks = {
      source  = "databrickslabs/databricks"
      version = "0.4.5"
    }
  }
}

provider "databricks" {
  host  = var.databricks_host
  token = var.token
}

provider "databricks" {
  alias      = "mws"
  host       = "https://accounts.cloud.databricks.com"
  username   = var.databricks_account_username
  password   = var.databricks_account_password
  account_id = var.databricks_account_id
}
module "workspace" {
source = "./workspace"
providers = {
databricks = databricks.workspace
}}
module "databricks" {
depends_on = [ module.workspace ]
source = "./databricks"
# No provider block required as we're using default provider
}
and the module itself could look like this:
terraform {
  required_version = ">= 1.1.0"

  required_providers {
    databricks = {
      source  = "databrickslabs/databricks"
      version = ">= 0.4.4"
    }
  }
}

resource "databricks_cluster" "this" {
  ...
}
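The error message points at the same root cause: the host of the workspace-level provider needs to resolve to databricks_mws_workspaces.this.workspace_url. As an alternative to passing the host in as a variable, the default (workspace-level) provider in the top-level template can read it from an output of the workspace module. This is only a sketch, and it assumes the workspace module exposes such an output (the output name "workspace_url" is illustrative, not taken from the code above):

# inside the workspace module: expose the URL of the created workspace
# (hypothetical output; the name is illustrative)
output "workspace_url" {
  value = databricks_mws_workspaces.this.workspace_url
}

# in the top-level template: point the default workspace-level provider
# at that module output instead of a separately supplied variable
provider "databricks" {
  host  = module.workspace.workspace_url
  token = var.token
}

With this wiring, the host for workspace-level operations only becomes known once the workspace exists, which is what the error message asks for.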