
Clone/Copy repo from Azure DevOps to Databricks


I need to create or copy a repo from Azure DevOps into Azure Databricks using a service principal (service connection), without having to ask for the secret value, which I'm not allowed to do for security reasons.

I'm currently not using a Python wheel or Databricks bundles, so the documentation provided by Microsoft doesn't help much: https://learn.microsoft.com/en-us/azure/databricks/dev-tools/ci-cd/ci-cd-azure-devops

My CI packages all the files into an artifact (.zip). All I need is to clone or copy that into /Repos in Databricks under the service principal's name, which I haven't managed to do.

Any idea?

I tried the documentation from Microsoft and Databricks, but I don't need a copy in the workspace; I need the copy under Repos only.


Solution

  • You can use the Databricks CLI in an Azure DevOps pipeline to clone the repo into Databricks.

    Steps:

    1. Create a personal access token in Azure DevOps with Code (Read) permission. It supplies the Git credentials in the pipeline.

    2. Create a pipeline with the following YAML file. The first task (Azure CLI) exposes the service principal's credentials from the service connection as pipeline variables, so the second task can use them.

    trigger: none
    
    pool:
      vmImage: ubuntu-latest
    
    steps:
    - task: AzureCLI@2
      inputs:
        azureSubscription: 'service connection name'
        scriptType: 'bash'
        scriptLocation: 'inlineScript'
        inlineScript: |
          # Flag the key as secret so Azure DevOps masks it in logs.
          echo "##vso[task.setvariable variable=ARM_CLIENT_SECRET;issecret=true]$servicePrincipalKey"
          echo "##vso[task.setvariable variable=ARM_CLIENT_ID;]$servicePrincipalId" 
          echo "##vso[task.setvariable variable=ARM_TENANT_ID;]$tenantId"
        addSpnToEnvironment: true
    - script: |
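          # Install the Databricks CLI.
          curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
          # Register the Azure DevOps PAT as the service principal's Git credential.
          databricks git-credentials create AzureDevOpsServices --git-username yourname --personal-access-token $(pat)
          # Clone the repo into Databricks; replace the placeholder URL with your own.
          databricks repos create "https://orgname@dev.azure.com/orgname/projectname/_git/reponame"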
      env:
          DATABRICKS_HOST: $(DATABRICKS_HOST)
          ARM_CLIENT_SECRET: $(ARM_CLIENT_SECRET)
          ARM_CLIENT_ID: $(ARM_CLIENT_ID)
          ARM_TENANT_ID: $(ARM_TENANT_ID)
      displayName: 'Install Databricks CLI and run databricks repos create'
    
    
    3. Create the pipeline variables DATABRICKS_HOST and pat. DATABRICKS_HOST is your workspace URL, in the form https://xxxxxxxxxxxx.xx.azuredatabricks.net/. pat is the personal access token you created in step 1.

    4. Copy the repo URL from Azure DevOps and use it in place of the one in the YAML.

    5. Add the service principal to your Azure Databricks workspace.
    • On the service connection page, click Manage Service Principal to open the service principal's page.

    • Copy the Display name and Application (client) ID.

    • In your Azure Databricks workspace, click your username in the top bar and select Settings. Click Identity and access, then manage Service principals.
    • Click Add service principal, choose Microsoft Entra ID managed, enter the Display name and Application (client) ID you copied, and add it.
    6. Run the pipeline. The repo is cloned from Azure DevOps into Azure Databricks.

    7. The owner of the repo is the service principal.
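
One thing to watch if the pipeline runs more than once: `databricks repos create` will most likely fail on later runs because the repo already exists. Below is a minimal sketch of a helper the script step could use instead. It builds the clone URL, creates the repo on the first run, and pulls the latest commit of a branch on later runs. The function names, the `DATABRICKS_BIN` override and the `/Repos/...` target path are all made up for illustration. I'm also assuming your CLI version accepts `repos get <path>`, `repos update <path> --branch` and `repos create ... --path`; check `databricks repos --help` before relying on it.

```shell
#!/usr/bin/env bash
# Sketch only: azdo_clone_url, sync_repo and DATABRICKS_BIN are hypothetical names.

# Build an Azure DevOps clone URL in the same form as the one in the pipeline.
azdo_clone_url() {
  local org="$1" project="$2" repo="$3"
  printf 'https://%s@dev.azure.com/%s/%s/_git/%s\n' "$org" "$org" "$project" "$repo"
}

# Create the repo in Databricks, or update it to the head of a branch if it
# already exists. Assumes the CLI is authenticated through the DATABRICKS_HOST
# and ARM_* environment variables, as in the pipeline.
sync_repo() {
  local url="$1" path="$2" branch="$3"
  # Allow the CLI binary to be swapped out (e.g. for a dry run); assumption.
  local cli="${DATABRICKS_BIN:-databricks}"
  if "$cli" repos get "$path" >/dev/null 2>&1; then
    # Repo exists: check out the latest commit of the branch.
    "$cli" repos update "$path" --branch "$branch"
  else
    # First run: clone the repo to the requested workspace path.
    "$cli" repos create "$url" azureDevOpsServices --path "$path"
  fi
}
```

In the script step you would then call something like `sync_repo "$(azdo_clone_url orgname projectname reponame)" "/Repos/<folder>/reponame" main`, where `<folder>` is whichever folder under /Repos the service principal is allowed to write to.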