python google-drive-api google-sheets-api google-api-python-client

Troubleshooting Google Drive API Issue: Google Cloud Python Function Fails to Detect PDF Files in Shared Folder


I developed a Python script that runs on Google Cloud Platform. The script uses the Google Drive API and the Google Sheets API to access a folder in a Google Drive belonging to a company, extract data from the PDF files in that folder, and then write the extracted data to a Google Sheet.

To ensure proper functionality, I set up a service account and enabled the necessary APIs. Additionally, I integrated Secret Manager to link the function with Google Drive and Google Sheets.

I granted access to the drive folders by sharing them with the service account's email ID.

However, upon running the script, the Drive API failed to detect the PDF files within the shared folders. Surprisingly, the APIs did not return any error messages.

def list_files_in_folder(drive,folder_id):
  #print(folder_id)
  # List files in the specified folder
  query = f"parents = '{folder_id}'"
  files = []
  response = drive.files().list(q = query).execute()
  #print(f'response:{response}')
  files = response.get('files')
  #print(f'First page files: {files}')
  next_page_token = response.get('nextPageToken')

  while next_page_token:
    response = drive.files().list(q=query,nextPageToken=next_page_token).execute()
    files.extend(response.get('files'))
    next_page_token = response.get('nextPageToken')

  return files

In an attempt to troubleshoot the issue, I tested the script against a different Google Drive account, separate from the company's primary drive, which is accessed by multiple accounts. When I created a folder containing PDF files there and shared it with the same service account email, the script successfully listed the folder contents without any errors.
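A difference like this can be narrowed down by checking whether the company folder lives on a shared drive rather than in an ordinary My Drive. The following is only a minimal sketch: it assumes `drive` is the same authorized Drive v3 service object used above, and the helper name `describe_folder` and the added `onSharedDrive` key are hypothetical. It relies on the `driveId` field, which the Drive API only populates for items stored on a shared drive.

```python
def describe_folder(drive, folder_id):
    """Return basic metadata for a folder, flagging whether it is on a shared drive."""
    # supportsAllDrives=True lets files.get resolve items stored on shared drives.
    meta = drive.files().get(
        fileId=folder_id,
        fields="id, name, mimeType, driveId",
        supportsAllDrives=True,
    ).execute()
    # driveId is only present for items that live on a shared drive.
    meta["onSharedDrive"] = "driveId" in meta
    return meta
```

If `onSharedDrive` comes back true for the company folder and false for the test folder, the listing call probably needs the shared drive parameters.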


Solution

  • You mention that you are using a shared drive. In that case, your script needs to be modified. Please modify it as follows and test it again.

    Modified script:

    def list_files_in_folder(drive, folder_id):
        # print(folder_id)
        # List files in the specified folder
        query = f"'{folder_id}' in parents and trashed=false"
        files = []
        response = drive.files().list(
            q=query,
            pageSize=1000,
            supportsAllDrives=True,
            includeItemsFromAllDrives=True,
            corpora="allDrives"
        ).execute()
        # print(f'response:{response}')
        files = response.get('files', [])
        # print(f'First page files: {files}')
        next_page_token = response.get('nextPageToken')
    
        while next_page_token:
            response = drive.files().list(
                q=query,
                pageSize=1000,
                pageToken=next_page_token,  # the request parameter is pageToken; nextPageToken is the response field
                supportsAllDrives=True,
                includeItemsFromAllDrives=True,
                corpora="allDrives"
            ).execute()
            files.extend(response.get('files', []))
            next_page_token = response.get('nextPageToken')
    
        return files
    
    • When this modified script was run, I confirmed that the file list could be retrieved from the shared drive.
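
As a variant, the first-page call and the follow-up calls can be folded into a single loop. This is only a minimal sketch, not the script that was tested above: the helper name `iter_files`, the `page_size` argument, and the `fields` selection are assumptions added for illustration. It passes the token from the response's `nextPageToken` back through the `pageToken` request parameter, starting with `None` for the first page.

```python
def iter_files(drive, query, page_size=1000):
    """Yield every file matching `query`, following nextPageToken until exhausted."""
    page_token = None  # None on the first request means "start from the beginning"
    while True:
        response = drive.files().list(
            q=query,
            pageSize=page_size,
            pageToken=page_token,
            supportsAllDrives=True,
            includeItemsFromAllDrives=True,
            corpora="allDrives",
            fields="nextPageToken, files(id, name, mimeType)",
        ).execute()
        # A page with no matches may omit the "files" key, so default to [].
        yield from response.get("files", [])
        page_token = response.get("nextPageToken")
        if not page_token:
            break
```

For example, `files = list(iter_files(drive, f"'{folder_id}' in parents and trashed=false"))` would collect the same kind of list that `list_files_in_folder` returns.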

    Note:

    • About your search query of query = f"parents = '{folder_id}'": to retrieve the file list under a specific folder, the official document gives the form '1234567' in parents. However, according to my report, at the current stage '###folderId###' in parents, parents = '###folderId###', and parents in '###folderId###' can all be used to retrieve the file list from the folder. In this modification, I used the form from the official document, '###folderId###' in parents.
    • Also, to avoid retrieving files that are in the trash, I added trashed=false.
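
Since the goal is to pick up PDF files only, the same query could also be narrowed by MIME type. The following is a minimal sketch under that assumption: the helper name `build_pdf_query` is hypothetical, and it assumes the folder ID contains no quote characters, which holds for ordinary Drive IDs.

```python
def build_pdf_query(folder_id):
    """Build a files.list query for non-trashed PDFs directly under folder_id."""
    return (
        f"'{folder_id}' in parents"          # documented form of the parent filter
        " and trashed = false"               # skip files that are in the trash
        " and mimeType = 'application/pdf'"  # keep PDF files only
    )
```

The returned string can be passed as `q` in place of the query used in the modified script.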
