Tags: azure-devops, azure-databricks, azure-devops-rest-api, azure-pipelines-yaml, databricks-workflows

How to trigger a Databricks job using a YAML pipeline


How can I trigger a Databricks job using YAML? I have a job1.json file saved in the repo which contains the job ID like this:

{
  "Job_ID": "12345678910"
}

So far I have this, but I'm getting an error when running it:

  
  variables:
  - name: configu
    value: 'rel'
  - name: Bld1
    value: 'test'
  - name: BHP
    value: '1.1'
  trigger: none
  stages:
  - stage: Bld
    jobs:
    - job: Bld
      displayName: AG
      pool:
        vmImage: windows-****
      steps:
      - checkout: self
      - task: CopyFiles@2
        displayName: 'Copy Files to: $(build.artifactstagingdirectory)'
        inputs:
          SourceFolder: $(build.sourcesdirectory)
          Contents: '*\'
          TargetFolder: $(build.artifactstagingdirectory)
      - task: PublishPipelineArtifact@1
        displayName: 'Publish : $(buildParameters.artifactName)'
        inputs:
          path: $(build.artifactstagingdirectory)
          artifactName: $(buildParameters.artifactName)
  - stage: Dev
    variables:
      - group: abc
      - group: def
    jobs:
    - job: RunDatabricksJob
      displayName: 'Run Databricks Job'
      pool:
        vmImage: 'windows-****'
      steps:
      - task: AzureCLI@2
        inputs:
          azureSubscription: 'test1'
          scriptType: 'bash'
          scriptLocation: 'inlineScript'
          inlineScript: |
              token=$(az account get-access-token \
              --resource akbfksjfwfkjvjkrbfj \
              --query "accessToken" \
              --output tsv)
              echo "##vso[task.setvariable variable=azureadtoken;isoutput=true]$token"
        name: tokengen
      - task: PythonScript@0
      - script: |
          job_id=$(python -c "import json; print(json.load(open('$(System.DefaultWorkingDirectory)/../$(BuildParameters.ArtifactName)/job1.json'))['job_id'])")
          databricks jobs run-now --job-id $job_id
        env:
          DATABRICKS_HOST: $(hostname)
          DATABRICKS_TOKEN: $(tokengen.azureadtoken)

Could you please suggest the correct method to trigger the job, or point out what's wrong with this code?

I appreciate your help in advance.


Solution

  • So far I've this but I'm getting an error running this code:

    Could you share the error message?

    From looking at your sample code, could you please check the following?

    1. Please check for an uppercase/lowercase mismatch on Job_ID: your job1.json uses the key Job_ID, but your script reads job_id, and JSON keys are case-sensitive.

    json.load(open('xxx/xxx/job1.json'))['job_id']
    

    2. Please check whether you defined the variable job_id in Python with a $. A Python variable name can only contain alphanumeric characters and underscores (A-Z, a-z, 0-9, and _). If so, please remove the $ character.

    3. Please check the - in front of script: with the dash, - script: starts a new step instead of being the script input of the preceding - task: PythonScript@0 step, which is left with no inputs at all.

    sample:

    - task: PythonScript@0
      inputs:
        scriptSource: 'inline'
        script: |
          import json
          job_id = json.load(open('$(System.DefaultWorkingDirectory)/../$(BuildParameters.ArtifactName)/job1.json'))['Job_ID']
          # print job_id and expose it as a pipeline variable
          print(job_id)
          print(f'##vso[task.setvariable variable=job_id]{job_id}')
    # the databricks CLI call is a shell command, not Python, so run it in its own step
    - script: databricks jobs run-now --job-id $(job_id)
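    To see why point 1 matters: JSON keys are case-sensitive, so Job_ID and job_id are two different keys. A minimal standalone check, with the file content inlined for illustration:

```python
import json

# Same shape as the asker's job1.json, whose key is "Job_ID"
data = json.loads('{"Job_ID": "12345678910"}')

print(data['Job_ID'])      # the exact key from the file works
print(data.get('job_id'))  # a differently-cased key is absent: prints None
# data['job_id'] would raise KeyError: 'job_id'
```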
    

    Hope it can help.

    UPDATE

    You can use both the Databricks CLI and the REST API to trigger a job. Here is an example of running a Databricks job with the Databricks CLI in a YAML pipeline.

    1. Install the Databricks CLI

    - task: UsePythonVersion@0
      inputs:
        versionSpec: '3.7'
        addToPath: true
        architecture: 'x64'
    - task: Bash@3
      inputs:
        targetType: 'inline'
        script: 'pip install databricks-cli'
    

    2. Set up authentication using a Microsoft Entra ID (formerly Azure Active Directory) token

    - task: Bash@3
      inputs:
        targetType: 'inline'
        script: |
          echo $token > token-file
          databricks configure --host $url --token-file token-file
    

    Note: You can refer to this doc to get tokens.

    3. Use the PythonScript@0 task to get job_id

    - task: PythonScript@0
      inputs:
        scriptSource: 'inline'
        script: |
          import json
          job_id = json.load(open('$(System.DefaultWorkingDirectory)/job1.json'))['job_id']
          # the name after variable= must be the literal string job_id, not the value
          print(f'##vso[task.setvariable variable=job_id]{job_id}')
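    One detail to watch in this step: in the ##vso logging command, the text after variable= must be the literal variable name, and only the value is interpolated; writing variable={job_id} would instead create a pipeline variable named after the ID itself. A small sketch of the line the step should emit (the ID value is illustrative):

```python
import io
import json

# Simulate reading job1.json (content inlined for illustration)
job_id = json.load(io.StringIO('{"job_id": "12345678910"}'))['job_id']

# Keep the literal name job_id after variable=; interpolate only the value
command = f'##vso[task.setvariable variable=job_id]{job_id}'
print(command)  # → ##vso[task.setvariable variable=job_id]12345678910
```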
    

    4. Use the Databricks CLI to run the job

    - task: Bash@3
      inputs:
        targetType: 'inline'
        script: |
          databricks jobs run-now --job-id $(job_id)
    

    Doc referred: https://learn.microsoft.com/en-us/azure/databricks/archive/dev-tools/cli/
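
    And since the REST API was mentioned as an alternative to the CLI, here is a minimal sketch using only the Python standard library; host, token, and job_id are placeholders you would supply from the pipeline, and the endpoint shown is the Jobs API 2.1 run-now call:

```python
import json
import urllib.request

def build_run_now_request(host: str, token: str, job_id: str) -> urllib.request.Request:
    """Build a POST to the Jobs API run-now endpoint for the given job."""
    return urllib.request.Request(
        url=f'{host}/api/2.1/jobs/run-now',
        data=json.dumps({'job_id': job_id}).encode(),
        headers={'Authorization': f'Bearer {token}',
                 'Content-Type': 'application/json'},
        method='POST',
    )

# Sending the request returns a JSON body containing the new run's run_id:
# with urllib.request.urlopen(build_run_now_request(host, token, job_id)) as resp:
#     run_id = json.load(resp)['run_id']
```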