I'm using these docs to clone a repo, but unfortunately I couldn't do it.
I also need this to run in a pipeline in Azure DevOps, but since I can't do it manually, I also can't do it in Postman, Python, or Bash.
I did create the Git credentials (which are needed to create the repo).
I use these headers: Authorization: Bearer (token from Databricks), Content-Type: application/json, (these two I also tried without) X-Databricks-Azure-SP-Management-Token, X-Databricks-Azure-Workspace-Resource-Id
{
  "url": "https://dev.azure.com/{Project}/_git/{Repo}",
  "provider": "azureDevOpsServices",
  "path": "/Repos",
  "sparse_checkout": {
    "patterns": [
      "parent-folder/child-folder"
    ]
  }
}
And I get this error:
permission_denied missing required permissions view on node with id '0'
Or, when I change the path to "/Repos/{Folder}", I get an error about the absolute path; it doesn't recognize Repos as a path.
When I do "GET REPOS" it returns empty even though I have repos created.
I tried in Python and I get 400 Bad Request.
What could be the error?
If the Azure DevOps organization that hosts the remote Git repo is linked to the same Microsoft Entra ID tenant as the Azure Databricks workspace, check the following:
As stated in "Connect to an Azure DevOps repo using Microsoft Entra ID", the service endpoint for Microsoft Entra ID must be accessible from both the private and public subnets of the Databricks workspace.
The user account you used to generate the Databricks token must have access (at least the Read permission) to the Git repo in the Azure DevOps organization.
If the Azure DevOps organization that hosts the remote Git repo is not in the same Microsoft Entra ID tenant as the Azure Databricks workspace, check the following (also see "Connect to an Azure DevOps repo using a token"):
Sign in to the Azure DevOps organization with a user account that can access the Git repo, then go to "User settings" > "Personal Access Tokens" and create a PAT that has at least the "Code (Read)" scope.
In the Azure Databricks workspace, go to "User Settings" > "Linked accounts", set the Git provider to "Azure DevOps Services (Personal access token)", and enter the PAT.
EDIT:
When you link a remote Git repo to a repo in the Databricks workspace with an identity (a user or a service principal), the Databricks service automatically looks up the Git provider and Git credentials set for that identity. If they are not found or not set for the identity, you might get the error.
For a user, as mentioned above, you can log on to the web UI of the Databricks workspace with the user account and go to "User Settings" > "Linked accounts" to set the Git provider and credentials.
However, for a service principal, it is not possible to set the Git provider and credentials from the web UI.
To set the Git provider and credentials for a service principal, follow the steps below.
Create a service principal in AAD if you do not have one.
Make sure you have added the service principal to the Databricks workspace so that it can access the resources in the workspace.
Open the service principal and go to Certificates & secrets > Client secrets to create a client secret if no valid one exists. Copy and save the value of the client secret.
In Azure DevOps, go to Organization Settings > Microsoft Entra and make sure the organization is connected to the AAD that contains the service principal.
Go to Organization Settings > Users, then search for and add the service principal to the organization. Give the service principal the Basic access level so that it can access Azure Repos.
Add the service principal to a group so that you can manage its permissions in the organization through that group.
To access Azure Git repos in a project, go to Project Settings > Repositories > Security. Search for and select the group that contains the service principal. Make sure the Read permission is set to Allow for the group.
Here I use the curl command with the POST method in a Bash script to generate the access tokens. You can also use the 'az account get-access-token' command.
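Generate the access token for Azure Databricks.
#!/bin/bash
# Request a token for the service principal (OAuth 2.0 client credentials flow).
tenant_ID="{tenant_ID}"
client_id="{client_id}"
client_secret="{client_secret}"
uri="https://login.microsoftonline.com/$tenant_ID/oauth2/v2.0/token"
# 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d is the resource ID of Azure Databricks.
# --data-urlencode keeps special characters in the client secret intact.
access_token_for_databricks=$(curl -s -X POST -H "Content-Type: application/x-www-form-urlencoded" "$uri" \
--data-urlencode "grant_type=client_credentials" \
--data-urlencode "client_id=$client_id" \
--data-urlencode "client_secret=$client_secret" \
--data-urlencode "scope=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default" | jq -r '.access_token')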
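Generate the access token for Azure DevOps.
#!/bin/bash
# Same request as above, but with the Azure DevOps scope.
tenant_ID="{tenant_ID}"
client_id="{client_id}"
client_secret="{client_secret}"
uri="https://login.microsoftonline.com/$tenant_ID/oauth2/v2.0/token"
# 499b84ac-1321-427f-aa17-267ca6975798 is the resource ID of Azure DevOps.
# --data-urlencode keeps special characters in the client secret intact.
access_token_for_devops=$(curl -s -X POST -H "Content-Type: application/x-www-form-urlencoded" "$uri" \
--data-urlencode "grant_type=client_credentials" \
--data-urlencode "client_id=$client_id" \
--data-urlencode "client_secret=$client_secret" \
--data-urlencode "scope=499b84ac-1321-427f-aa17-267ca6975798/.default" | jq -r '.access_token')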
The access tokens are short-lived (24 hours at most, since AAD regularly rotates them; the expires_in field of the token response gives the exact lifetime in seconds), so you need to refresh them before they expire if you want to keep using them.
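As a rough sketch of the 'az account get-access-token' alternative mentioned above, the same two tokens could be requested through the Azure CLI. This assumes the Azure CLI is installed and that you have already signed in as the service principal; the helper name get_token_with_az and the two variable names for the resource IDs are made up for illustration.

```shell
#!/bin/bash
# Sketch: obtain the tokens with the Azure CLI instead of curl.
# Assumes the Azure CLI (az) is installed; get_token_with_az is a hypothetical helper name.

DATABRICKS_RESOURCE_ID="2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"  # Azure Databricks
DEVOPS_RESOURCE_ID="499b84ac-1321-427f-aa17-267ca6975798"      # Azure DevOps

get_token_with_az() {
  # $1 = resource (application) ID to request the token for
  az account get-access-token --resource "$1" --query accessToken --output tsv
}

# Usage (run after signing in as the service principal):
#   az login --service-principal --username "{client_id}" --password "{client_secret}" --tenant "{tenant_ID}"
#   access_token_for_databricks=$(get_token_with_az "$DATABRICKS_RESOURCE_ID")
#   access_token_for_devops=$(get_token_with_az "$DEVOPS_RESOURCE_ID")
```

In an Azure DevOps pipeline this may be more convenient than curl, since the client secret does not have to be placed in a request body by hand.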
Call the API "Create a credential entry" and pass access_token_for_devops in the request body (as the value of personal_access_token) to set the Git provider and credentials for the service principal. Use access_token_for_databricks as the Bearer authorization token to call the API.
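The "Create a credential entry" call could be sketched roughly as follows. The workspace URL is a placeholder, the helper names build_credential_body and create_git_credential are made up for illustration, and the git_provider value azureDevOpsServices is an assumption taken from the provider used in the request body in this thread. The optional third argument covers the case where a user's PAT and email address are passed instead of the service principal's token.

```shell
#!/bin/bash
# Sketch: register the Azure DevOps token as the Git credential of the calling identity.
# Assumptions: workspace_url is a placeholder; build_credential_body and
# create_git_credential are hypothetical helper names; curl and jq are installed.

workspace_url="https://{databricks-instance}"

build_credential_body() {
  # $1 = token (or PAT) for Azure DevOps, $2 = optional git_username
  jq -n --arg token "$1" --arg user "${2:-}" \
    '{git_provider: "azureDevOpsServices", personal_access_token: $token}
     + (if $user == "" then {} else {git_username: $user} end)'
}

create_git_credential() {
  # $1 = access_token_for_databricks, $2 = token for Azure DevOps, $3 = optional git_username
  curl -s -X POST "$workspace_url/api/2.0/git-credentials" \
    -H "Authorization: Bearer $1" \
    -H "Content-Type: application/json" \
    -d "$(build_credential_body "$2" "${3:-}")"
}

# Usage:
#   create_git_credential "$access_token_for_databricks" "$access_token_for_devops"
```

Building the body with jq rather than string concatenation avoids broken JSON if a value contains quotes or backslashes.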
Then call the API "Create a repo" to create the repo and link it to the remote Git repo. Again use access_token_for_databricks as the Bearer authorization token to call the API. This time, the Databricks service automatically uses the Git provider and credentials set by the first API call.
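A minimal sketch of the "Create a repo" call, under the same assumptions as before: the workspace URL is a placeholder, create_databricks_repo is a hypothetical helper name, and curl and jq are available.

```shell
#!/bin/bash
# Sketch: create the Databricks repo and link it to the remote Azure DevOps repo.
# Assumptions: workspace_url is a placeholder; create_databricks_repo is a
# hypothetical helper name; curl and jq are installed.

workspace_url="https://{databricks-instance}"

create_databricks_repo() {
  # $1 = access_token_for_databricks, $2 = remote Git URL, $3 = path in the workspace
  curl -s -X POST "$workspace_url/api/2.0/repos" \
    -H "Authorization: Bearer $1" \
    -H "Content-Type: application/json" \
    -d "$(jq -n --arg url "$2" --arg path "$3" \
          '{url: $url, provider: "azureDevOpsServices", path: $path}')"
}

# Usage:
#   create_databricks_repo "$access_token_for_databricks" \
#     "https://dev.azure.com/myOrg/myProject/_git/myGitRepo" "/Repos/myFolder/myRepo"
```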
EDIT_2:
As mentioned above:
If the Azure DevOps organization is not connected to the same AAD, sign in to the organization with a user account that has access and generate a PAT with at least the "Code (Read)" scope. Then, when you call the Databricks API "Create a credential entry" using access_token_for_databricks as the Bearer authorization token, pass the PAT and the email address of the user account in the request body (as the values of personal_access_token and git_username) to set the Git provider and credentials.
If the Azure DevOps organization is connected to the AAD that contains the service principal, you can follow the steps in EDIT to set the Git provider and credentials. Using the PAT of a user account as the Git provider and credentials is also fine.
In addition, in the request body you posted above, I noticed two mistakes:
The Git repository URL you provided is in the format "https://dev.azure.com/{Project}/_git/{Repo}", which is not a valid URL. A valid URL of a Git repository in Azure DevOps has the format "https://dev.azure.com/{OrganizationName}/{ProjectName}/_git/{GitRepoName}".
If you want to create the repo under /Repos in the Databricks workspace, the value of path should be in the format "/Repos/{folder}/{repo-name}". The repo cannot sit directly under /Repos; you must also specify a folder under /Repos and put the repo in that folder.
Below is a sample request body for reference.
{
  "url": "https://dev.azure.com/myOrg/myProject/_git/myGitRepo",
  "provider": "azureDevOpsServices",
  "path": "/Repos/myFolder/myRepo"
}
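To catch the two mistakes described above before sending the request, for example in a pipeline step, a small check like the following could be used. It is only a sketch: the helper names is_valid_ado_url and is_valid_repo_path are made up, and the URL check covers only the dev.azure.com format shown in this answer.

```shell
#!/bin/bash
# Sketch: sanity-check the url and path values before calling "Create a repo".
# is_valid_ado_url and is_valid_repo_path are hypothetical helper names.

is_valid_ado_url() {
  # Expects https://dev.azure.com/{OrganizationName}/{ProjectName}/_git/{GitRepoName}
  printf '%s\n' "$1" | grep -Eq '^https://dev\.azure\.com/[^/]+/[^/]+/_git/[^/]+$'
}

is_valid_repo_path() {
  # Expects /Repos/{folder}/{repo-name}
  printf '%s\n' "$1" | grep -Eq '^/Repos/[^/]+/[^/]+$'
}

# Usage:
#   is_valid_ado_url "$url" || echo "url is missing the organization or project part" >&2
#   is_valid_repo_path "$path" || echo "path must look like /Repos/{folder}/{repo-name}" >&2
```

With these checks, a URL in the form "https://dev.azure.com/{Project}/_git/{Repo}" and the paths "/Repos" or "/Repos/{Folder}" are rejected, while the values in the sample body pass.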