python, pandas, blob, google-cloud-storage

Getting the subfolder of a GCS bucket


I need help getting the folder names in a GCS bucket (e.g. emaildownloads/20230130/abc/xyz). 'emaildownloads' is the bucket name, and I need to extract the date folder and the 'abc' folder.

    def my_list_bucket(self, bucketName, delimiter='/'):
        # lookup_bucket returns a Bucket object (or None if it does not exist)
        bucket = self.storage_client.lookup_bucket(bucketName)
        blobs = bucket.list_blobs(prefix='20230130', delimiter=delimiter)

        print("Blobs:")
        for blob in blobs:
            print(blob.name)

        # prefixes is only populated once the iterator has been consumed
        if delimiter:
            print("Prefixes:")
            for prefix in blobs.prefixes:
                print(prefix)

I was able to pull just the folder name, but I need it to be dynamic and I don't want to hard-code the date. I only need the part of the path between the bucket name and the final forward slash: 20230130/abc/ is all I need from the pathname.


Solution

  • If the blob name comes back as a full path and you only need part of it, you can split it on the / character and pick out the parts you want.

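    import sys

    def my_list_bucket(self, bucketName, limit=sys.maxsize):
        a_bucket = self.storage_client.lookup_bucket(bucketName)
        bucket_iterator = a_bucket.list_blobs()
        desired_paths = set()
        for resource in bucket_iterator:
            # resource.name is relative to the bucket, e.g. '20230130/abc/xyz',
            # so the date folder is part 0 and the 'abc' folder is part 1
            path_parts = resource.name.split('/')
            if len(path_parts) >= 3:
                date_folder = path_parts[0]
                abc_folder = path_parts[1]
                desired_paths.add(date_folder + '/' + abc_folder + '/')
            limit = limit - 1
            if limit <= 0:
                break
        return desired_paths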
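
The splitting step can also be tried on its own, without touching a bucket. The following is a minimal sketch, assuming the object names look like the one in the question (date folder, then a second folder, then the file). The helper name `folder_prefixes` and its `depth` parameter are hypothetical and not part of the google-cloud-storage library. It takes plain strings, so in real use you would pass it something like `(blob.name for blob in bucket.list_blobs())`.

```python
def folder_prefixes(blob_names, depth=2):
    """Return the sorted, unique leading `depth` folders of each object name."""
    prefixes = set()
    for name in blob_names:
        parts = name.split('/')
        # The last element is the file name (or '' for a folder placeholder),
        # so a name needs more than `depth` parts to contain `depth` folders.
        if len(parts) > depth:
            prefixes.add('/'.join(parts[:depth]) + '/')
    return sorted(prefixes)


# Hypothetical object names, relative to the bucket
names = [
    '20230130/abc/xyz',
    '20230130/abc/other.csv',
    '20230131/def/report.csv',
    'readme.txt',
]
print(folder_prefixes(names))  # ['20230130/abc/', '20230131/def/']
```

Because the results are collected in a set, each date/folder pair appears once no matter how many objects sit underneath it, and nothing about the date is hard-coded.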
    
