
Inconsistent access to subfolder of a bucket between gsutil and storage Client


To avoid managing a large number of buckets for the data received from many devices, I plan to have them write the files they capture into folders of a single bucket instead of having one bucket per device.

To make sure each device can only write to its own subfolder, I have set the IAM condition as described in this answer:

resource.name.startsWith('projects/_/buckets/dev_bucket/objects/test_folder')

My service account now has the Storage Object Creator and Storage Object Viewer roles with the condition above attached.

This is the output (truncated to this service account only) of the gcloud projects get-iam-policy <project> command:

- condition:
    expression: |-
      resource.name.startsWith("projects/_/buckets/dev_bucket/objects/test_folder/")
    title: only_test_subfolder
  members:
  - serviceAccount:myserviceaccount.iam.gserviceaccount.com
  role: roles/storage.objectCreator
- condition:
    expression: |-
      resource.name.startsWith("projects/_/buckets/dev_bucket/objects/test_folder/")
    title: only_test_subfolder
  members:
  - serviceAccount:myserviceaccount.iam.gserviceaccount.com
  role: roles/storage.objectViewer

When using the gsutil command, everything works fine:

# Set the authentication via the service account json key
gcloud auth activate-service-account --key-file=/path/to/my/key.json

# all of these commands work fine
gsutil ls gs://dev_bucket/test_folder 
gsutil cp gs://dev_bucket/test_folder/distant_file.txt local_file.txt

# These ones get a 403, as expected
gsutil ls gs://dev_bucket/
gsutil ls gs://another_bucket
gsutil cp gs://dev_bucket/another_subfolder/somefile.txt local_file.txt

However, when I try to use the Python storage client (v2.1.0) I cannot make it work, mainly because I am apparently required to get the bucket before getting an object from it.

import os 
from google.cloud import storage
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="path/to/my/key.json"

client = storage.Client()

client.get_bucket("dev_bucket")

>>> Forbidden: 403 GET https://storage.googleapis.com/storage/v1/b/dev_bucket?projection=noAcl&prettyPrint=false: <Service account> does not have storage.buckets.get access to the Google Cloud Storage bucket.

I have also tried to list all the files using the prefix argument, but I get the same error:

client.list_blobs("dev_bucket", prefix="test_folder")

Is there a way to use the Python storage client with this type of permission?


Solution

  • This is expected behavior!

    You are doing:

    gsutil ls gs://dev_bucket/test_folder 
    gsutil cp gs://dev_bucket/test_folder/distant_file.txt local_file.txt
    

    Neither command requires any permission other than storage.objects.get, which your SA has from the Storage Object Viewer role.

    But in your code you are trying to access the bucket details (the bucket itself, not the objects inside it), so it won't work unless your SA has the storage.buckets.get permission.

    This line:

    client.get_bucket("dev_bucket")
    

    will call the buckets.get API method, which requires the above-mentioned IAM permission.

    So you need to modify your code to read objects only, without accessing the bucket details.

    Here is a sample of how to download objects from a bucket.

    Note: the method bucket(bucket_name, user_project=None) used in this sample will not perform any HTTP request; as quoted from the docs:

    This will not make an HTTP request; it simply instantiates a bucket object owned by this client.


    BTW, you can try to run something like:

    gsutil ls -L -b gs://dev_bucket
    

    I expect this command to give you the same error that you get from your code, since listing bucket metadata also requires storage.buckets.get.


    References:

    https://cloud.google.com/storage/docs/access-control/iam-gsutil
    https://cloud.google.com/storage/docs/access-control/iam-json