Search code examples
pythonexcelgoogle-sheetsgoogle-sheets-apigspread

How to download a Google Docs excel sheet with a Gspread and access data locally (A1 notation)?


I need to download an excel sheet from Google Docs via Gspread and then multiple times I'll need to read the values of different cells in 'A1' notation. Thus, I can't just get the spreadsheet and then call val = worksheet.acell('B1').value, because the script will freeze out of too many API calls. My solution for now:

def download_hd_sheet():
    worksheet = gc.values().get(spreadsheetId=excel_id, range='variables', valueRenderOption='FORMULA').execute()['values']
    df = pd.DataFrame(worksheet)
    writer = pd.ExcelWriter("Variables.xlsx", engine='xlsxwriter')
    df.to_excel(writer, sheet_name='Sheet1', index=False, header=False)
    workbook = writer.book
    worksheet = writer.sheets['Sheet1']
    writer.save()
    book = openpyxl.load_workbook('Variables.xlsx', data_only=False)
    global hd_sheet
    hd_sheet = book.active

So far what I'm doing is:

  1. I download the values from the worksheet.
  2. Transform it (list of lists) into a pandas dataframe.
  3. Then I write the df to a .xlsx file.
  4. I read the .xlsx file to a global variable

It seems to me that I am doing so many things just to achieve something that can be done in two lines. Please, let me know what would be more effective than the above.


Solution

  • I believe your goal as follows.

    • You want to download the Google Spreadsheet as the XLSX data.
    • You want to use the downloaded XLSX data without saving as the file.
    • You have already been able to get and put values for Google Spreadsheet using gspread.
    • You want to achieve this using python.

    In order to achieve your goal, I would like to propose the following flow.

    1. Download the Google Spreadsheet as the XLSX data using the method of Files: export in Drive API.
    2. Open the XLSX data using the downloaded binary data with openpyxl.load_workbook().

    Sample script:

    In this sample script, from your situation, the access token is used from the authorization for gspread.

    spreadsheetId = "###"  # Please set the Spreadsheet ID.
    
    client = gspread.authorize(credentials)
    access_token = client.auth.token
    url = "https://www.googleapis.com/drive/v3/files/" + spreadsheetId + "/export?mimeType=application%2Fvnd.openxmlformats-officedocument.spreadsheetml.sheet"
    res = requests.get(url, headers={"Authorization": "Bearer " + access_token})
    book = openpyxl.load_workbook(filename=BytesIO(res.content), data_only=False)
    hd_sheet = book.active
    
    • By above script, the XLSX data is directly downloaded from Google Spreadsheet and openpyxl.load_workbook

    • In this case, the following libraries in addition to gspread are used.

        import openpyxl
        import requests
        from io import BytesIO
      

    Note:

    • In this case, please include the scope of https://www.googleapis.com/auth/drive or https://www.googleapis.com/auth/drive.readonly. When you modified the scopes, please reauthorize the scopes. By this, the new scopes are reflected to the access token. So please be careful this.

    References: