
Downloading files from SharePoint to File System using Databricks


We have the following Python code already working like a champ in one of our environments:

# Imports assume the Office365-REST-Python-Client package
from office365.runtime.auth.client_credential import ClientCredential
from office365.sharepoint.client_context import ClientContext


def download_files_to_dbfs(site_url, file_url, dbfs_path):
    try:
        # Credentials for accessing SharePoint
        credentials = ClientCredential("Client-ID?",
                                       "Client-Secret?")
        ctx = ClientContext(site_url).with_credentials(credentials)

        # file_url is the SharePoint folder URL whose files should be listed
        list_source = ctx.web.get_folder_by_server_relative_url(file_url)
        files = list_source.files
        ctx.load(files)
        ctx.execute_query()

        for myfiles in files:
            rel_url = myfiles.properties["ServerRelativeUrl"]
            # dbfs_path is expected to end with a path separator
            download_path = dbfs_path + myfiles.properties["Name"]

            # Stream each file into the target path
            with open(download_path, "wb") as local_file:
                ctx.web.get_file_by_server_relative_path(rel_url).download(local_file).execute_query()
                print("Downloaded file " + myfiles.properties["Name"])

    except Exception as e:
        print(e)

We have since been granted our own workspace, where we are essentially the administrators. Our SharePoint admin has already set up a client-secret connection to SharePoint for us, but the same function fails once we switch to our new Client-ID and Client-Secret. In the code above I left "Client-ID?" and "Client-Secret?" as quoted placeholders because I'm not sure about the correct order, or whether these two are even the required elements for establishing the connection between SharePoint and Databricks. The error we encounter is "Acquire app-only access token failed", with no additional details, just the message itself. We have been struggling with this for weeks now :(


Solution

  • After @Evandro de Paula kindly advised me on how to proceed, I finally concluded that this was a firewall issue. How? I added the traceback to my function to surface further details of the error. I hope this helps someone else too.
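
For anyone landing here with the same bare error message, the two steps described above (show the full traceback instead of only the message, then check whether outbound traffic is blocked) could look roughly like the sketch below. This is not the original poster's exact code: describe_failure, acquire_token and can_reach are hypothetical names made up for illustration, and acquire_token only simulates a failure rather than calling SharePoint.

```python
import socket
import traceback


def describe_failure(func, *args, **kwargs):
    """Call func; return None on success, or the full traceback text on failure."""
    try:
        func(*args, **kwargs)
        return None
    except Exception:
        # format_exc() keeps the chained exceptions and the failing lines,
        # which print(e) alone does not show.
        return traceback.format_exc()


def acquire_token():
    # Hypothetical stand-in for the SharePoint call: it only simulates a
    # low-level network error hidden behind a generic message.
    try:
        raise ConnectionError("connection timed out")
    except ConnectionError as exc:
        raise ValueError("Acquire app-only access token failed") from exc


def can_reach(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False
```

In the function from the question, the equivalent change would be to replace print(e) in the except block with traceback.print_exc(). If the traceback then points to a connection or timeout error, calling can_reach with your SharePoint host name from a notebook cell is one quick way to check whether the workspace can open an outbound connection at all. Note that a successful TCP connection does not rule out a proxy or firewall that blocks traffic at a higher level.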