google-cloud-platform google-drive-api

Querying Google Drive files by label is failing


I need some help as I'm smashing my head on a wall.

I need to write a script that runs periodically on AWS Lambda and pulls values from some sheets in Google Drive. The most straightforward way of finding these is to use the Drive labels feature. We've enabled it, created the label, and applied it to some files.

I can then use the API Explorer to query for all files with that label using this query: 'labels/LYBX-my-label-id-bFcb' in labels
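
For illustration, the query string above could be built with a small helper like the one below. This is only a sketch: `label_query` is a hypothetical name, not part of any Google library, and the label ID is the placeholder from this question.

```python
def label_query(label_id: str) -> str:
    """Build a Drive `q` expression matching files that carry the given label."""
    prefix = "labels/"
    # Accept either a bare ID or one that already starts with "labels/".
    bare_id = label_id[len(prefix):] if label_id.startswith(prefix) else label_id
    if not bare_id or "'" in bare_id:
        raise ValueError(f"unexpected label id: {label_id!r}")
    return f"'{prefix}{bare_id}' in labels"


query = label_query("LYBX-my-label-id-bFcb")
print(query)
```

The resulting string is what gets passed as the q parameter of drive.files().list(...).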

I can also grab what my browser sent out and run it locally in postman or node/whatever. It works and returns the expected file listings.

However, that uses my personal account credentials, and when doing this "for real" we of course need to use a service account. So we created a GCP project with a service account, and I'm using the googleapiclient Python package. I store the secret for that service account in AWS Secrets Manager, fetch it, and configure my instance of the drive resource with it.
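
Roughly, the setup described above looks like the following sketch. It assumes the secret holds the service account's JSON key as a string; the function names, the secret ID argument, and the single read-only scope are assumptions for illustration, not my exact code.

```python
import json

# Fields the google-auth library needs from a service account key.
REQUIRED_KEYS = ("client_email", "private_key", "token_uri")
# Assumed scope for read-only access; the real project may request others.
DRIVE_SCOPES = ["https://www.googleapis.com/auth/drive.readonly"]


def parse_service_account_secret(secret_string):
    """Parse the JSON key stored in the secret and check the fields we rely on."""
    info = json.loads(secret_string)
    missing = [key for key in REQUIRED_KEYS if key not in info]
    if missing:
        raise ValueError(f"service account key is missing: {', '.join(missing)}")
    return info


def build_drive_from_secret(secret_id):
    """Fetch the key from AWS Secrets Manager and build a Drive v3 resource."""
    # Third-party imports stay local so the parsing helper has no dependencies.
    import boto3
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    secret = boto3.client("secretsmanager").get_secret_value(SecretId=secret_id)
    info = parse_service_account_secret(secret["SecretString"])
    credentials = service_account.Credentials.from_service_account_info(
        info, scopes=DRIVE_SCOPES
    )
    return build("drive", "v3", credentials=credentials)
```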

This all works. I can use it to call drive.files().get(...) and drive.files().list(...) and fetch data on files using all sorts of queries, except the label query above. When I run that query I get back a 400 error that complains about the q (query) parameter.

Now I've dropped down to the level of the URL itself, and the exact GET request URL that my Python script logs works when I use my personal bearer token. I'm therefore pretty sure that this is not in fact a bad-parameter issue and that it's instead just a case of Google being godawful at API design and returning crappy error codes.

So I'm thinking that this has to be a permission issue, but I have no clue what permissions are required to allow an account to search by gdrive labels nor how I would go about granting those permissions to a service account.

Another possible clue is that drive.files().listLabels(fileId="...") on a file that I know has labels also seems to fail, so again everything points to some sort of permission being missing, but it's unclear which one or how to set it up on a service account.
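
One way to narrow this down is to print which identity the credentials resolve to and to dump the structured error body rather than just the status code. The sketch below assumes `drive` is an already-built Drive v3 resource; `summarize_error_body` and `diagnose` are hypothetical helper names.

```python
import json


def summarize_error_body(content):
    """Return a one-line summary of a Google API JSON error body (bytes or str)."""
    if isinstance(content, bytes):
        content = content.decode("utf-8", errors="replace")
    try:
        error = json.loads(content).get("error", {})
    except (ValueError, AttributeError):
        # Not JSON, or not a JSON object: fall back to the raw text.
        return content[:200]
    if not isinstance(error, dict):
        return str(error)
    parts = [
        f"{d.get('reason')} at {d.get('locationType')} {d.get('location')}"
        for d in error.get("errors") or []
        if isinstance(d, dict)
    ]
    return f"{error.get('code')}: {error.get('message')} [{'; '.join(parts)}]"


def diagnose(drive, query):
    """Run a files.list query, printing the caller identity and any error detail."""
    from googleapiclient.errors import HttpError

    who = drive.about().get(fields="user").execute()
    print("calling as:", who["user"].get("emailAddress"))
    try:
        return drive.files().list(q=query, fields="files(id, name)").execute()
    except HttpError as err:
        print("status:", err.resp.status)
        print(summarize_error_body(err.content))
        raise
```

If the printed identity is the service account's own address rather than a user in the domain, that would be consistent with the request succeeding under a personal token and failing here.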


Solution

  • SUGGESTION

    Note: Since I can't see your actual script, treat this answer as a starting point or reference for fixing the issue in your project. Hopefully, it resolves your problem.

    I replicated the setup and was able to list files with a query based on the label ID from a service account by having it impersonate a user (service account delegation). You set this up when creating the credentials: pass a subject parameter naming the user the service account should impersonate, such as a super admin account or any domain account with the necessary role.

    Test Script

    from google.oauth2 import service_account
    from googleapiclient.discovery import build
    
    # Path to the service account JSON key file
    KEY_FILE = 'sa.json'
    
    # Drive scopes requested for the service account
    SCOPES = [
        'https://www.googleapis.com/auth/drive',
        'https://www.googleapis.com/auth/drive.file',
        'https://www.googleapis.com/auth/drive.metadata',
        'https://www.googleapis.com/auth/drive.metadata.readonly',
        'https://www.googleapis.com/auth/drive.readonly',
    ]
    
    # Create credentials from the key file. `subject` is the domain user
    # that the service account impersonates.
    credentials = service_account.Credentials.from_service_account_file(
        KEY_FILE, scopes=SCOPES, subject="irv@■■■■■■■■■■■■■■.■■■■")
    
    # Build the Drive v3 service object
    service = build('drive', 'v3', credentials=credentials)
    
    # List files that carry the label
    label_id = "OTVglmjg5BxgxSevMiuLtr6VoaeDwyg66AIRNNEbbFcb"
    results = service.files().list(q=f"'labels/{label_id}' in labels").execute()
    
    print(results)
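
Since the question loads the key from a secret rather than a file, the same impersonation can be applied to credentials built from a dict, and the listing can follow nextPageToken in case the label matches more than one page of files. This is a sketch under those assumptions; `delegated_credentials` and `iter_labeled_files` are hypothetical helper names, and `info`, `subject`, and `scopes` are values you supply.

```python
def delegated_credentials(info, subject, scopes):
    """Build service account credentials that impersonate `subject`."""
    # Imported here so the paging helper below has no third-party dependency.
    from google.oauth2 import service_account

    return service_account.Credentials.from_service_account_info(
        info, scopes=scopes, subject=subject
    )


def iter_labeled_files(service, label_id, page_size=100):
    """Yield every file carrying the label, following nextPageToken."""
    query = f"'labels/{label_id}' in labels"
    page_token = None
    while True:
        response = service.files().list(
            q=query,
            pageSize=page_size,
            pageToken=page_token,
            fields="nextPageToken, files(id, name)",
        ).execute()
        yield from response.get("files", [])
        page_token = response.get("nextPageToken")
        if not page_token:
            break
```

With a service object built as in the test script, `for f in iter_labeled_files(service, label_id): print(f["id"], f["name"])` would print each matching file.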
    

    Demo

    I created a test label and applied it to two files in my Drive:

    [screenshot not preserved]

    After running the test script:

    [screenshot not preserved]

    Reference