Tags: azure-devops, azure-pipelines, databricks, azure-databricks, cicd

Databricks Repos API error when I try to create a repo


I'm using these docs to clone a repo, but unfortunately I couldn't do it.

I also need this to run in an Azure DevOps pipeline, but since I can't do it manually, I can't do it in Postman, Python, or Bash either.

I did create the Git credentials (which are needed to create the repo).

I use these headers:

  • Authorization: Bearer (token from Databricks)
  • Content-Type: application/json
  (These two I also tried without:)
  • X-Databricks-Azure-SP-Management-Token:
  • X-Databricks-Azure-Workspace-Resource-Id:

{
  "url": "https://dev.azure.com/{Project}/_git/{Repo}",
  "provider": "azureDevOpsServices",
  "path": "/Repos",
  "sparse_checkout": {
    "patterns": [
      "parent-folder/child-folder"
    ]
  }
}

And I get this error:

permission_denied missing required permissions view on node with id '0'

Or, when I change the path to "/Repos/{Folder}", I get an error about the absolute path; it doesn't recognize Repos as a path.

When I do "GET REPOS", it returns an empty list even though I have repos created.

I tried in Python and I get 400 Bad Request.

What could be the error?


Solution

  • If the Azure DevOps organization that hosts the remote Git repo is linked to the same Microsoft Entra ID tenant as the Azure Databricks workspace, check the following:

    1. As stated in "Connect to an Azure DevOps repo using Microsoft Entra ID", the service endpoint for Microsoft Entra ID must be accessible from both the private and public subnets of the Databricks workspace.

    2. The user account you used to generate the Databricks token must have access (at least the Read permission) to the Git repo in the Azure DevOps organization.


  • If the Azure DevOps organization that hosts the remote Git repo is not in the same Microsoft Entra ID tenant as the Azure Databricks workspace, check the following (also see "Connect to an Azure DevOps repo using a token"):

    1. Sign in to the Azure DevOps organization with a user account that can access the Git repo, then go to "User settings" > "Personal Access Tokens" and create a PAT that has at least the "Code (Read)" scope.


    2. In the Azure Databricks workspace, go to "User Settings" > "Linked accounts" and set the following:

      • Git provider: Azure DevOps Services (Personal access token)
      • Git provider username or email: The user account used to sign in to the Azure DevOps organization and create the PAT.
      • Token: The value of the PAT created in the Azure DevOps organization.



    EDIT:

    When you link a remote Git repo to a repo in the Databricks workspace with an identity (a user or a service principal), the Databricks service automatically looks up the Git provider and Git credentials set for that identity. If they are not found or not set for the identity, you might get the error.

    For a user, as mentioned above, you can log in to the web UI of the Databricks workspace with the user account and go to "User Settings" > "Linked accounts" to set the Git provider and credentials.

    However, for a service principal, it is not possible to set the Git provider and credentials from the web UI.

    To set the Git provider and credentials for a service principal, follow the steps below.

    Prerequisites in AAD (Microsoft Entra ID tenant):
    1. Create a Service Principal in AAD if you do not have one.

    2. Ensure you have added the Service Principal to the Databricks workspace so that it has access to the resources in the workspace.

    3. Open the Service Principal and go to Certificates & secrets > Client secrets to create a client secret if there is no valid one yet. Copy and save the value of the client secret.

    Prerequisites in Azure DevOps:
    1. Go to Organization Settings > Microsoft Entra and ensure the organization is connected to the AAD tenant that contains the Service Principal.

    2. Go to Organization Settings > Users, then search for and add the Service Principal to the organization. Give the Service Principal the Basic access level so that it has access to Azure Repos.

    3. Add the Service Principal to a group so that you can manage its permissions in the organization through that group.

    4. To grant access to the Azure Git repos of a project, go to Project Settings > Repositories > Security. Search for and select the group that contains the Service Principal, and ensure that at least the Read permission is set to Allow for the group.

    Generate the access tokens for the Service Principal:

    Here I use curl with the POST method in a Bash script to generate the access tokens. You can also use the 'az account get-access-token' command.

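    1. Generate the access token for Databricks.

      #!/bin/bash

      # Placeholders: fill in the tenant ID and the Service Principal's
      # application (client) ID and client secret.
      tenant_ID="{tenant_ID}"
      client_id="{client_id}"
      client_secret="{client_secret}"
      uri="https://login.microsoftonline.com/$tenant_ID/oauth2/v2.0/token"

      # The scope GUID is the resource ID of Azure Databricks.
      # --data-urlencode keeps special characters in the secret intact.
      access_token_for_databricks=$(curl -sS -X POST "$uri" \
        --data-urlencode "grant_type=client_credentials" \
        --data-urlencode "client_id=$client_id" \
        --data-urlencode "client_secret=$client_secret" \
        --data-urlencode "scope=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default" \
        | jq -r '.access_token')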
      
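    2. Generate the access token for Azure DevOps.

      #!/bin/bash

      # Placeholders: fill in the tenant ID and the Service Principal's
      # application (client) ID and client secret.
      tenant_ID="{tenant_ID}"
      client_id="{client_id}"
      client_secret="{client_secret}"
      uri="https://login.microsoftonline.com/$tenant_ID/oauth2/v2.0/token"

      # The scope GUID is the resource ID of Azure DevOps.
      # --data-urlencode keeps special characters in the secret intact.
      access_token_for_devops=$(curl -sS -X POST "$uri" \
        --data-urlencode "grant_type=client_credentials" \
        --data-urlencode "client_id=$client_id" \
        --data-urlencode "client_secret=$client_secret" \
        --data-urlencode "scope=499b84ac-1321-427f-aa17-267ca6975798/.default" \
        | jq -r '.access_token')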
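
    As an alternative to curl, the 'az account get-access-token' command can fetch the same two tokens. This is a minimal sketch, assuming the Azure CLI is installed and you have already signed in as the Service Principal. The function name get_entra_token is hypothetical, and the two resource IDs are the GUIDs from the scopes in the curl examples.

```shell
#!/bin/bash

# Resource IDs taken from the scopes used in the curl examples.
databricks_resource="2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"
devops_resource="499b84ac-1321-427f-aa17-267ca6975798"

# Hypothetical helper: print an access token for the given resource ID.
# Assumes 'az login' has already been run for the Service Principal.
get_entra_token() {
  az account get-access-token --resource "$1" --query accessToken --output tsv
}

# Usage (not executed here, since it needs a real sign-in):
#   az login --service-principal --username "$client_id" \
#     --password "$client_secret" --tenant "$tenant_ID"
#   access_token_for_databricks=$(get_entra_token "$databricks_resource")
#   access_token_for_devops=$(get_entra_token "$devops_resource")
```

    If the Service Principal has no Azure subscription assigned, 'az login' may also need the --allow-no-subscriptions flag.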
      
    Note:

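    The access tokens are short-lived: they last 24 hours at most, and often much less depending on the tenant's token lifetime policy. The expires_in field of the token response gives the actual lifetime in seconds. So you need to refresh the tokens before they expire if you want to keep using them.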
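
    To decide when to refresh, one option is to compare the current time against the issue time plus the expires_in value (lifetime in seconds) that the token endpoint returns next to access_token. The helper below is only a sketch: the name token_needs_refresh and the 300-second default margin are assumptions, not part of any Azure or Databricks tooling.

```shell
#!/bin/bash

# Hypothetical helper: succeed (exit status 0) when a token should be refreshed.
#   $1: Unix time at which the token was issued
#   $2: value of "expires_in" from the token response, in seconds
#   $3: current Unix time, e.g. $(date +%s)
#   $4: safety margin in seconds (assumed default: 300)
token_needs_refresh() {
  issued_at=$1
  expires_in=$2
  now=$3
  margin=${4:-300}
  [ $(( now + margin )) -ge $(( issued_at + expires_in )) ]
}

# Usage (values are illustrative):
#   issued_at=$(date +%s)   # record this right before requesting the token
#   if token_needs_refresh "$issued_at" "$expires_in" "$(date +%s)"; then
#     echo "request new tokens"
#   fi
```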

    Call Databricks REST API:
    1. Call the API "Create a credential entry" and pass access_token_for_devops in the request body (as the value of personal_access_token) to set the Git provider and credentials for the Service Principal. Use access_token_for_databricks as the Bearer authorization token to call the API.

    2. Then call the API "Create a repo" to create the repo and link it to the remote Git repo. Again use access_token_for_databricks as the Bearer authorization token. This time, the Databricks service automatically uses the Git provider and credentials set by the first API call.
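
    The two calls above could be sketched with curl as follows. Several things here are assumptions to verify against the Databricks REST API reference: the endpoint paths /api/2.0/git-credentials and /api/2.0/repos, the git_provider value "azureDevOpsServicesAad" for an Entra ID token, and the use of the client ID as git_username. The workspace URL is a placeholder and all function names are hypothetical.

```shell
#!/bin/bash

# Placeholder: replace with your workspace URL.
databricks_host="https://{databricks-instance}"

# Escape backslashes and double quotes so a value is safe inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Body for "Create a credential entry" using the Azure DevOps Entra ID token.
#   $1: access_token_for_devops   $2: Git username (assumed: the client ID)
credential_body() {
  printf '{"git_provider": "%s", "git_username": "%s", "personal_access_token": "%s"}' \
    "azureDevOpsServicesAad" "$(json_escape "$2")" "$(json_escape "$1")"
}

# Body for "Create a repo".
#   $1: remote Git URL   $2: workspace path
repo_body() {
  printf '{"url": "%s", "provider": "azureDevOpsServices", "path": "%s"}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# POST a JSON body to a Databricks REST endpoint.
#   $1: API path   $2: JSON body   $3: access_token_for_databricks
databricks_post() {
  curl -sS -X POST "$databricks_host$1" \
    -H "Authorization: Bearer $3" \
    -H "Content-Type: application/json" \
    -d "$2"
}

# Usage (not executed here, since it needs real tokens):
#   databricks_post /api/2.0/git-credentials \
#     "$(credential_body "$access_token_for_devops" "$client_id")" \
#     "$access_token_for_databricks"
#   databricks_post /api/2.0/repos \
#     "$(repo_body "https://dev.azure.com/myOrg/myProject/_git/myGitRepo" "/Repos/myFolder/myRepo")" \
#     "$access_token_for_databricks"
```

    Building the bodies in separate functions makes it easy to print and inspect the JSON before sending anything to the workspace.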


    EDIT_2:

    As mentioned above:

    • If the Azure DevOps organization is not connected to the same AAD tenant, you need to log in to the organization with a user account that has access and generate a PAT with at least the "Code (Read)" scope. Then, when you call the Databricks API "Create a credential entry" using access_token_for_databricks as the Bearer authorization token, pass the PAT and the email address of the user account in the request body (as the values of personal_access_token and git_username) to set the Git provider and credentials.

    • If the Azure DevOps organization is connected to the AAD tenant that contains the service principal, you can follow the steps in EDIT to set the Git provider and credentials. Using the PAT of a user account as the Git credentials is also OK.
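
    For the PAT variant, the request body of "Create a credential entry" might be built like this. It is a sketch under assumptions: the endpoint path /api/2.0/git-credentials and the git_provider value "azureDevOpsServices" should be checked against the Databricks REST API reference, the workspace URL in the usage comment is a placeholder, and the function names are hypothetical.

```shell
#!/bin/bash

# Escape backslashes and double quotes so a value is safe inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Body for "Create a credential entry" using a user's PAT.
#   $1: PAT created in Azure DevOps   $2: email of the user who created it
pat_credential_body() {
  printf '{"git_provider": "azureDevOpsServices", "git_username": "%s", "personal_access_token": "%s"}' \
    "$(json_escape "$2")" "$(json_escape "$1")"
}

# Usage (not executed here, since it needs a real token and workspace):
#   curl -sS -X POST "https://{databricks-instance}/api/2.0/git-credentials" \
#     -H "Authorization: Bearer $access_token_for_databricks" \
#     -H "Content-Type: application/json" \
#     -d "$(pat_credential_body "$pat" "user@example.com")"
```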

    In addition, from the request body you posted above, I noticed that there are two mistakes:

    • The Git repository URL you provided has the format "https://dev.azure.com/{Project}/_git/{Repo}", which is not a valid URL. The correct URL of a Git repository in Azure DevOps has the format "https://dev.azure.com/{OrganizationName}/{ProjectName}/_git/{GitRepoName}".

    • If you want to create the repo under /Repos in the Databricks workspace, the value of path should have the format "/Repos/{folder}/{repo-name}". A repo cannot sit directly under /Repos; you must specify a folder under /Repos and put the repo in that folder.

    Below is a sample of the request body as reference.

    {
      "url": "https://dev.azure.com/myOrg/myProject/_git/myGitRepo",
      "provider": "azureDevOpsServices",
      "path": "/Repos/myFolder/myRepo"
    }
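
    To catch both mistakes before calling the API, a rough format check along these lines may help. It only tests the shape of the two strings described above, not whether the repo or folder actually exists, and the function names are hypothetical.

```shell
#!/bin/bash

# Succeed when the URL has the shape
# https://dev.azure.com/{OrganizationName}/{ProjectName}/_git/{GitRepoName}.
is_valid_devops_url() {
  printf '%s\n' "$1" | grep -Eq '^https://dev\.azure\.com/[^/]+/[^/]+/_git/[^/]+$'
}

# Succeed when the path has the shape /Repos/{folder}/{repo-name}.
is_valid_repo_path() {
  printf '%s\n' "$1" | grep -Eq '^/Repos/[^/]+/[^/]+$'
}

# Usage:
#   is_valid_devops_url "$url"  || echo "URL is missing the organization or project"
#   is_valid_repo_path "$path" || echo "path must be /Repos/{folder}/{repo-name}"
```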