Tags: azure-devops, azure-databricks, azure-devops-rest-api, azure-pipelines-yaml, databricks-workflows

How to trigger a Databricks job using a YAML pipeline


How can I trigger a Databricks job using YAML? I have a job1.json file saved in the repo which contains the job ID like this:

{
  "Job_ID": "12345678910"
}

So far I have this, but I'm getting an error when running it:

  
  variables:
  - name: configu
    value: 'rel'
  - name: Bld1
    value: 'test'
  - name: BHP
    value: '1.1'
  trigger: none
  stages:
  - stage: Bld
    jobs:
    - job: Bld
      displayName: AG
      pool:
        vmImage: windows-****
      steps:
      - checkout: self
      - task: CopyFiles@2
        displayName: 'Copy Files to: $(build.artifactstagingdirectory)'
        inputs:
          SourceFolder: $(build.sourcesdirectory)
          Contents: '*\'
          TargetFolder: $(build.artifactstagingdirectory)
      - task: PublishPipelineArtifact@1
        displayName: 'Publish : $(buildParameters.artifactName)'
        inputs:
          path: $(build.artifactstagingdirectory)
          artifactName: $(buildParameters.artifactName)
  - stage: Dev
    variables:
      - group: abc
      - group: def
    jobs:
    - job: RunDatabricksJob
      displayName: 'Run Databricks Job'
      pool:
        vmImage: 'windows-****'
      steps:
      - task: AzureCLI@2
        inputs:
          azureSubscription: 'test1'
          scriptType: 'bash'
          scriptLocation: 'inlineScript'
          inlineScript: |
              token=$(az account get-access-token \
              --resource akbfksjfwfkjvjkrbfj \
              --query "accessToken" \
              --output tsv)
              echo "##vso[task.setvariable variable=azureadtoken;isoutput=true]$token"
        name: tokengen
      - task: PythonScript@0
      - script: |
          job_id=$(python -c "import json; print(json.load(open('$(System.DefaultWorkingDirectory)/../$(BuildParameters.ArtifactName)/job1.json'))['job_id'])")
          databricks jobs run-now --job-id $job_id
        env:
          DATABRICKS_HOST: $(hostname)
          DATABRICKS_TOKEN: $(tokengen.azureadtoken)

Could you please suggest the correct method to trigger the job, or point out what's wrong with this code?

I appreciate your help in advance.


Solution

  • So far I've this but I'm getting an error running this code:

    Could you share the error message?

    From looking at your sample code, could you please check the following?

    1. Please check for an uppercase/lowercase mismatch on Job_ID: your job1.json uses the key Job_ID, but your script reads job_id, and JSON keys are case-sensitive.

    json.load(open('xxx/xxx/job1.json'))['job_id']
    

    2. Please check whether you defined the variable job_id in Python with a $. A Python variable name can only contain alphanumeric characters and underscores (A-Z, a-z, 0-9, and _). If so, please remove the $ character.

    3. Please check the - in front of script: with the dash, - script: starts a new step instead of being the script input of the preceding - task: PythonScript@0 step, which is left with no inputs at all.

    sample:

    - task: PythonScript@0
      inputs:
        scriptSource: 'inline'
        script: |
          import json
          job_id = json.load(open('$(System.DefaultWorkingDirectory)/../$(BuildParameters.ArtifactName)/job1.json'))['Job_ID']
          # print job_id and expose it as a pipeline variable
          print(job_id)
          print(f'##vso[task.setvariable variable=job_id]{job_id}')
    # the databricks CLI call is a shell command, not Python, so run it in its own step
    - script: databricks jobs run-now --job-id $(job_id)
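    To see why point 1 matters: JSON keys are case-sensitive, so Job_ID and job_id are two different keys. A minimal standalone check, with the file content inlined for illustration:

```python
import json

# Same shape as the asker's job1.json, whose key is "Job_ID"
data = json.loads('{"Job_ID": "12345678910"}')

print(data['Job_ID'])      # the exact key from the file works
print(data.get('job_id'))  # a differently-cased key is absent: prints None
# data['job_id'] would raise KeyError: 'job_id'
```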
    

    Hope it can help.

    UPDATE

    You can use both the Databricks CLI and the REST API to trigger a job. Here is an example of running a Databricks job with the Databricks CLI in a YAML pipeline.

    1. Install the Databricks CLI

    - task: UsePythonVersion@0
      inputs:
        versionSpec: '3.7'
        addToPath: true
        architecture: 'x64'
    - task: Bash@3
      inputs:
        targetType: 'inline'
        script: 'pip install databricks-cli'
    

    2. Set up authentication using a Microsoft Entra ID (formerly Azure Active Directory) token

    - task: Bash@3
      inputs:
        targetType: 'inline'
        script: |
          echo $token > token-file
          databricks configure --host $url --token-file token-file
    

    Note: You can refer to this doc to get tokens.

    3. Use the PythonScript@0 task to get job_id

    - task: PythonScript@0
      inputs:
        scriptSource: 'inline'
        script: |
          import json
          job_id = json.load(open('$(System.DefaultWorkingDirectory)/job1.json'))['job_id']
          # the name after variable= must be the literal string job_id, not the value
          print(f'##vso[task.setvariable variable=job_id]{job_id}')
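    One detail to watch in this step: in the ##vso logging command, the text after variable= must be the literal variable name, and only the value is interpolated; writing variable={job_id} would instead create a pipeline variable named after the ID itself. A small sketch of the line the step should emit (the ID value is illustrative):

```python
import io
import json

# Simulate reading job1.json (content inlined for illustration)
job_id = json.load(io.StringIO('{"job_id": "12345678910"}'))['job_id']

# Keep the literal name job_id after variable=; interpolate only the value
command = f'##vso[task.setvariable variable=job_id]{job_id}'
print(command)  # → ##vso[task.setvariable variable=job_id]12345678910
```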
    

    4. Use the Databricks CLI to run the job

    - task: Bash@3
      inputs:
        targetType: 'inline'
        script: |
          databricks jobs run-now --job-id $(job_id)
    

    Doc referred: https://learn.microsoft.com/en-us/azure/databricks/archive/dev-tools/cli/
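
    And since the REST API was mentioned as an alternative to the CLI, here is a minimal sketch using only the Python standard library; host, token, and job_id are placeholders you would supply from the pipeline, and the endpoint shown is the Jobs API 2.1 run-now call:

```python
import json
import urllib.request

def build_run_now_request(host: str, token: str, job_id: str) -> urllib.request.Request:
    """Build a POST to the Jobs API run-now endpoint for the given job."""
    return urllib.request.Request(
        url=f'{host}/api/2.1/jobs/run-now',
        data=json.dumps({'job_id': job_id}).encode(),
        headers={'Authorization': f'Bearer {token}',
                 'Content-Type': 'application/json'},
        method='POST',
    )

# Sending the request returns a JSON body containing the new run's run_id:
# with urllib.request.urlopen(build_run_now_request(host, token, job_id)) as resp:
#     run_id = json.load(resp)['run_id']
```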