Search code examples
pythonazuremicrosoft-graph-apiazure-databricks

Using the Microsoft Graph API to load files from Sharepoint into Databricks


I want to load files from Sharepoint using a Microsoft App registration and the Microsoft Graph API. Right now I get the error that I am not authotized but I do not know if I just do not have the necessary permissions or if I did something wrong about the path of the file I want to load. I have the Sites.Read.All and Sites.Select permissions. Can anyone help me? My code looks like this:

sharepoint_site = "company_name.sharepoint.com/sites/site_name"
sharepoint_file_path = "relative_path/file.xlsx"
client_id = dbutils.secrets.get("secret_scope", "client_id")
tenant_id = dbutils.secrets.get("secret_scope", "tenant_id")
client_secret = dbutils.secrets.get("secret_scope", "client_secret")

auth_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
auth_data = {
    "grant_type": "client_credentials",
    "client_id": client_id,
    "client_secret": client_secret,
    "scope": "https://graph.microsoft.com/.default"
}

auth_response = requests.post(auth_url, data=auth_data)
if auth_response.status_code != 200:
    print("Fehler bei der Authentifizierung")
    print(auth_response.text)
    exit()

access_token = auth_response.json().get("access_token")

headers = {"Authorization": f"Bearer {access_token}"}
download_url = f"https://graph.microsoft.com/v1.0/sites/{sharepoint_site}/drive/root:/{sharepoint_file_path}:/content"
file_response = requests.get(download_url, headers=headers)
if file_response.status_code == 200:
    file_path = "/dbfs/path.xlsx"
    with open(file_path, "wb") as f:
        f.write(file_response.content)
    print("Datei heruntergeladen!")
else:
    print(f"Fehler: {file_response.status_code}")
    print(file_response.text)

Solution

  • Grant Files.Read.All api permission to the application with Application permission type.

    enter image description here

    After getting access token make sure you are having roles, you check this in jwt.io by pasting the token.

    enter image description here

    Next, you get the site id using below code.

    headers = {"Authorization": f"Bearer {access_token}"}
    
    site_res = requests.get("https://graph.microsoft.com/v1.0/sites/root:/sites/<site_name>",headers=headers)
    site_id = site_res.json()['id']
    

    Then you make request to get the file with below code.

    download_url = f"https://graph.microsoft.com/v1.0/sites/{site_id}/drive/root:/Identity AR.xlsx:/content"
    
    file_response = requests.get(download_url, headers=headers)
    print(file_response.status_code)
    if file_response.status_code == 200:
        file_path = "/dbfs/identity.xlsx"
        with open(file_path, "wb") as f:
            f.write(file_response.content)
        print(f"file created")
    else:
        print(file_response.status_code)
    

    enter image description here