Search code examples
pythonexcelpandassharepoint

How to read SharePoint Online (Office365) Excel files into Python specifically pandas with Work or School Account?


The question is very similar to the link below. How to read SharePoint Online (Office365) Excel files in Python with Work or School Account?

Essentially I would like to import an excel file off SharePoint into pandas for further analysis.

The issue is when I run the code below I get the following error.

XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\r\n<!DOCT'

My code:

from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.file import File 

url = 'https://companyname.sharepoint.com/SitePages/Home.aspx'
username = '[email protected]'
password = 'password!'
relative_url = '/Shared%20Documents/Folder%20Number1/Folder%20Number2/Folder3/Folder%20Number%Four/Target_Excel_File_v4.xlsx?d=w8f97c2341898_random_numbers_and_letters_a065c12cbcsf=1&e=KXoU4s'


ctx_auth = AuthenticationContext(url)
if ctx_auth.acquire_token_for_user(username, password):
  ctx = ClientContext(url, ctx_auth)
  web = ctx.web
  ctx.load(web)
  ctx.execute_query()
  #this gives me a KeyError: 'Title'
  #print("Web title: {0}".format(web.properties['Title']))
  print('Authentication Successful')
else:
  print(ctx_auth.get_last_error())


import io
import pandas as pd

response = File.open_binary(ctx, relative_url)

#save data to BytesIO stream
bytes_file_obj = io.BytesIO()
bytes_file_obj.write(response.content)
bytes_file_obj.seek(0) #set file object to start

#read file into pandas dataframe
df = pd.read_excel(bytes_file_obj)

print(df)

Solution

  • For those of you that ended up like me here at this issue, I found that one has to path the full URL to File, not just the path:

    #import all the libraries
    from office365.runtime.auth.authentication_context import AuthenticationContext
    from office365.sharepoint.client_context import ClientContext
    from office365.sharepoint.files.file import File 
    import io
    import pandas as pd
    
    #target url taken from sharepoint and credentials
    url = 'https://company.sharepoint.com/Shared%20Documents/Folder%20Number1/Folder%20Number2/Folder3/Folder%20Number4/Target_Excel_File_v4.xlsx?cid=_Random_letters_and_numbers-21dbf74c'
    username = '[email protected]'
    password = 'Password!'
    
    ctx_auth = AuthenticationContext(url)
    if ctx_auth.acquire_token_for_user(username, password):
      ctx = ClientContext(url, ctx_auth)
      web = ctx.web
      ctx.load(web)
      ctx.execute_query()
      print("Authentication successful")
    
    response = File.open_binary(ctx, url)
    
    #save data to BytesIO stream
    bytes_file_obj = io.BytesIO()
    bytes_file_obj.write(response.content)
    bytes_file_obj.seek(0) #set file object to start
    
    #read excel file and each sheet into pandas dataframe 
    df = pd.read_excel(bytes_file_obj, sheetname = None)