Search code examples
pythondirectorygoogle-drive-apigspread

gspread list_spreadsheet_files not getting files in subfolders given a folder_id


This used to work in getting all Google Sheets in all sub-folders given a top folder ID:

import gspread
scope = ['https://spreadsheets.google.com/feeds',
          'https://www.googleapis.com/auth/drive']
credentials = ServiceAccountCredentials.from_json_keyfile_name(
          'creds_path/creds.json', scope)
gc = gspread.authorize(credentials)
folderID = 'a_folder_id_here'
existing_sheets = gc.list_spreadsheet_files(folder_id = folderID)

This now returns an empty list where before it returned all of the Google Sheet names in that folder. The folder in question only has sub-folders, which each have Google Sheets inside of them.

Using:

gspread 5.7.2
Python 3.10

Solution

  • Modification points:

    • When I saw the script of gspread, it seems that the method of list_spreadsheet_files() uses a search query of q += ' and parents in "{}"'.format(folder_id). Ref In this case, the files just under the folder are retrieved. I think that this is the reason for your current issue.
    • In order to achieve your current situation of The folder in question only has sub-folders, which each has Google Sheets inside of them., it is required to retrieve the file list under the subfolders.

    In this answer, a python library of getfilelistpy is used. Ref When this library is used, the file list can be retrieved from the subfolders.

    When this is reflected in your script, how about the following modification?

    Modified script:

    First, please install the library as follows.

    $ pip install getfilelistpy
    

    And, please modify your script as follows.

    From:

    gc = gspread.authorize(credentials)
    folderID = 'a_folder_id_here'
    existing_sheets = gc.list_spreadsheet_files(folder_id = folderID)
    

    To:

    from getfilelistpy import getfilelist
    
    gc = gspread.authorize(credentials)
    folderID = 'a_folder_id_here'
    resource = {
        "oauth2": gc.auth,
        "id": folderID,
        "fields": "files(name,id,createdTime,modifiedTime,mimeType)",
    }
    res = getfilelist.GetFileList(resource)
    existing_sheets = sum([e.get("files", []) for e in res.get("fileList", [])], [])
    
    • When this modified script is run, the file list is retrieved from the folder of folderID including all subfolders.

    • In this modification, when the files except for Google Spreadsheet are existing in the subfolders, those are also retrieved. If you want to restrict for retrieving only Spreadsheets, please modify the last line of the script as follows.

        existing_sheets = sum([[f for f in e.get("files", []) if f["mimeType"] == "application/vnd.google-apps.spreadsheet"] for e in res.get("fileList", [])], [])
      

    Reference: