Tags: terraform, databricks, azure-databricks, terraform-provider-azure, terraform-provider-databricks

Terraform Databricks plan failed to install provider databrickslabs/databricks - checksum list has no SHA-256 hash for provider


I am using Terraform to create a Databricks job. The job is created successfully when I use provider databrickslabs/databricks version 0.6.2. Because I need to use the databricks_job data source, I decided to upgrade the provider to version 1.5.0, the latest version listed in the databricks_job data source documentation. As soon as I upgrade from 0.6.2 to 1.5.0, I get the following error. I tried 1.4.0 as well, but the error stays the same:

Error: Failed to install provider

Error while installing databrickslabs/databricks v1.5.0: checksum list has no SHA-256 hash for "https://github.com/databricks/terraform-provider-databricks/releases/download/v1.5.0/terraform-provider-databricks_1.5.0_linux_amd64.zip"

Setup:

  • I am using Azure DevOps pipelines to run my terraform code.
  • I am using an Azure virtual machine scale set with Ubuntu Linux as a self-hosted agent to provision the infrastructure. (This may be irrelevant, as the same agent pool works properly with databrickslabs version 0.6.2.)
  • The DevOps pipeline executes the Terraform code in a Docker container with the base image python:3.10-slim, which has Terraform version 1.3.9 installed.
  • Terraform init, plan, and apply run from within the DevOps pipeline via scripts; the script source is shown below.
  • The terraform code structure is as follows:

[Screenshot: Terraform code structure in the repository]

  • I am saving the Terraform state into a container in an Azure Storage account. This is configured in init.tfvars under the dev folder.

The init.tfvars is as follows:

use_msi              = false
subscription_id      = "<azure subscription id>"
resource_group_name  = "<name of azure resource group>" 
storage_account_name = "<name of my azure storage account>"
container_name       = "tfstates"
key                  = "databricks-job.tfstate"
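For reference, these keys map to the settings of Terraform's azurerm backend. A minimal sketch of the matching backend block, assuming the partial-configuration pattern where all values are supplied at init time via -backend-config:

terraform {
  backend "azurerm" {
    # Intentionally left empty: all settings come from init.tfvars,
    # passed with: terraform init --backend-config="./tfvars/<env>/init.tfvars"
  }
}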

The code for versions.tf is as follows:

terraform {
  required_version = ">= 1.3.0"
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "=3.44.1"
    }
    databricks = {
      source  = "databrickslabs/databricks"
      version = "1.5.0" ## works for 0.6.2
    }
  }
}
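For context, the databricks_job data source that motivated the upgrade is used roughly like this (the job name below is hypothetical):

# Look up an existing Databricks job by its name
data "databricks_job" "existing" {
  job_name = "sample-tf-job" # hypothetical name
}

# Expose the numeric job id of the matched job
output "existing_job_id" {
  value = data.databricks_job.existing.id
}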

The DevOps stage to terraform init and plan is as follows:

stages:
- stage: ${{parameters.stageName}}
  displayName: 'Plan Terraform Create Databricks Job'
  dependsOn: ${{parameters.dependsOn}}
  variables:
  - group: ${{parameters.variableGroup}}
  - name: artifactPath
    value: $(Build.BuildId)-prod-terraform
  - name: artifactName
    value: $(Build.BuildId)_prod_plan.tfplan
  jobs:
  - job: create_databricks_job_by_tf
    container: ${{parameters.containerName}}
    steps:
    - checkout: self
    - script: |
        pwd
      displayName: Working Directory
    - script: |
        ls
      displayName: folder structure
    - script: |
        export AZDO_PERSONAL_ACCESS_TOKEN=$(System.AccessToken)
        export AZDO_ORG_SERVICE_URL="https://dev.azure.com/MYORGANIZATION/"
        echo https://token:$(System.AccessToken)@dev.azure.com/MYORGANIZATION/MYPROJECT/ > ~/.git-credentials
        git config --global credential.helper 'store --file ~/.git-credentials'
        chmod 600 ~/.git-credentials
        terraform --version
        echo "Present working directory " 
        pwd
        echo "backend-config path:"
        echo "./tfvars/${{parameters.environment}}/init.tfvars"
        echo "################"
        cd deployment-pipelines/tf && terraform init --backend-config="./tfvars/${{parameters.environment}}/init.tfvars"
        echo "terraform providers"
        # terraform providers lock -platform=windows_amd64 -platform=darwin_amd64 -platform=linux_amd64 
        terraform providers
        echo "terraform validate"
        terraform validate
        mkdir $(artifactPath)
        terraform plan -input=false -out="$(artifactPath)/$(artifactName)" -var-file="./tfvars/${{parameters.environment}}/terraform.tfvars"
      displayName: 'Terraform Init, Validate, Plan'
      env:
        ARM_CLIENT_ID: <Service principal application id>
        ARM_CLIENT_SECRET: <Service principal secret>
        ARM_SUBSCRIPTION_ID: <Azure subscription id> 
        ARM_TENANT_ID: <Azure tenant id>
    - script: cat deployment-pipelines/tf/$(artifactPath)/$(artifactName) 
      displayName: Read tfplan
    - publish: deployment-pipelines/tf/$(artifactPath)/$(artifactName)
      displayName: 'Publish Terraform plan file'
      artifact: $(artifactName)

Just in case you want to know what is in main.tf, it is as follows:

resource "databricks_job" "sample-tf-job" {
  name = var.job_name
  task{
    task_key = "a"
    existing_cluster_id = "<databricks-cluster>"
    python_wheel_task {
      package_name = "myWheelOackage"
      entry_point = "__init__.py"
    }
    library {
      whl = "dbfs:/tmp/myWheel-1.0.0-py3-none-any.whl"
    } 
  }
}
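The job name is taken from a variable; its declaration is not shown in the question, but presumably looks something like this:

variable "job_name" {
  type        = string
  description = "Name of the Databricks job to create"
}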

As I mentioned, this code works well when 1.5.0 is replaced with 0.6.2.

Here you can see the successful run:

[Screenshot: successful plan]

Here is the error in the pipeline when the databricks provider version is changed to 1.5.0: [Screenshot: Terraform init fails]

Solutions I tried that didn't work:

  1. I downgraded the databrickslabs/databricks version to 1.4.0; still the same error.

  2. I deleted the Terraform state file in the Azure Storage container; still the same error.

  3. I used the script mentioned in this link and added the following line to the Terraform script above, but it raised the same error:

    terraform providers lock -platform=windows_amd64 -platform=darwin_amd64 -platform=linux_amd64
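For context, terraform providers lock records provider checksums in .terraform.lock.hcl; an entry looks roughly like the following (the hashes here are placeholders, not real values):

provider "registry.terraform.io/databrickslabs/databricks" {
  version     = "1.5.0"
  constraints = "1.5.0"
  hashes = [
    "h1:<placeholder>",
    "zh:<placeholder>",
  ]
}

The error message indicates that the registry's checksum list for databrickslabs/databricks v1.5.0 has no SHA-256 entry for the linux_amd64 archive at all, so re-locking on the consumer side cannot fix it.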


Solution

  • The Databricks Terraform provider switched from the databrickslabs namespace to databricks last year, when it reached GA (announcement blog post). You need to update your code to use the new source - see the instructions in the troubleshooting guide on how to do it; a minimal example is shown below.

    It's also better to upgrade to the latest version - it's already at 1.11.x.

    P.S. While the Terraform Registry provided a redirect from databrickslabs to databricks, it broke in some of the intermediate versions, so through the old namespace you can only pull versions up to roughly 1.4.x.
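A minimal sketch of the updated versions.tf (per the troubleshooting guide, only the provider source namespace needs to change; the version pins otherwise match the question):

terraform {
  required_version = ">= 1.3.0"
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "=3.44.1"
    }
    databricks = {
      source  = "databricks/databricks" # new official namespace
      version = ">= 1.5.0"
    }
  }
}

If an existing state file still references the old namespace, migrate it with terraform state replace-provider databrickslabs/databricks databricks/databricks and then re-run terraform init.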