Tags: azure-devops, yaml, databricks, azure-databricks, cicd

CICD for Azure Databricks


I am trying to push Databricks notebooks from Azure DevOps to a Databricks workspace using an Azure DevOps pipeline. Below is the code I am using.

variables:
  
  databricksWorkspaceUrl: ''
  databricksPAT: 456t7788


trigger:
- development

pool:
  vmImage: 'ubuntu-latest'
  
steps:
- task: UsePythonVersion@0
  inputs:
    versionSpec: '3.x'
    addToPath: true

- checkout: self

- script: |
    echo "Starting Databricks notebook upload..."
    # Install Databricks CLI
    pip install databricks-cli

    # Authenticate with Databricks using the PAT
    echo "Authenticating with Databricks..."
    databricks configure --token
    echo "$(databricksPAT)" | databricks tokens create --comment "Databricks PAT" --output json

    # Upload notebooks to Databricks workspace
    echo "Uploading notebooks to Databricks..."
    databricks workspace import $(Build.SourcesDirectory)/notebookpathindevops
    databricks workspace import $(Build.SourcesDirectory)/notebookpathindevops
    echo "Notebooks uploaded successfully."

    # Trigger Databricks pipeline
    echo "Triggering Databricks pipeline..."
    databricks runs submit --json '{
      "run_name": "My Notebook Run",
      "new_cluster": {
        "spark_version": "7.3.x"
      },
      "notebook_task": {
        "notebook_path": "/Folderpathinworkspace"
      }
    }' --url $(databricksWorkspaceUrl)
    echo "Databricks pipeline triggered."
  displayName: 'Upload Notebooks to Databricks and Trigger Databricks Pipeline'
  env:
    databricksPAT: $(databricksPAT)  # Use the variable defined in Azure DevOps

The pipeline gets stuck at the "Authenticate with Databricks using the PAT" step. Is there anything more I need to add to my code?

Thank you


Solution

  • Normally, when you run the databricks configure command to create a configuration profile, it prompts you to enter your Azure Databricks PAT. See "Azure Databricks personal access token authentication".

    A pipeline run is non-interactive, so nobody can answer that prompt. The session hangs waiting for the PAT to be typed in until the step times out.

    To resolve this issue, you can try feeding the PAT to the command on standard input with a here-document, using the lines below in your script.

    databricks configure --token <<EOF
    $databricksPAT
    EOF
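
To illustrate why the here-document stops the hang, here is a minimal sketch that runs without the Databricks CLI installed. `fake_configure` is a hypothetical stand-in for a command that reads its answers from standard input, and the URL and token are placeholder values. It reads two lines on the assumption that the legacy databricks-cli asks for the workspace host before the token; please check the prompts your CLI version actually shows and supply one line per prompt, in that order.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for an interactive "configure" command.
# It is NOT part of the Databricks CLI; it only mimics a tool that
# reads its answers from standard input, one line per prompt.
fake_configure() {
  local host token
  read -r host    # first prompt: workspace URL (assumed order)
  read -r token   # second prompt: personal access token
  # Report the token length only, so the secret is never echoed to the log.
  printf 'host=%s token_length=%s\n' "$host" "${#token}"
}

# Placeholder values for illustration only.
workspace_url="https://example.azuredatabricks.net"
pat="dapi-placeholder"

# The here-document supplies both answers on stdin, so the command
# never waits for a terminal that the build agent does not have.
result=$(fake_configure <<EOF
$workspace_url
$pat
EOF
)

echo "$result"   # prints: host=https://example.azuredatabricks.net token_length=16
```

In the real pipeline the same shape would apply to `databricks configure --token`, with the here-document lines taken from your pipeline variables. The legacy CLI can also read the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables, which may let you skip the configure step entirely.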