Tags: google-cloud-platform, google-cloud-storage, google-python-api

Is it possible to query Google Cloud Storage similar to using `ls` command in terminal?


I am using the Python client library to query Google Cloud Storage, and I organize the data in Storage using a naming hierarchy. For example:

my_bucket/simulations/version_1/data...
my_bucket/simulations/version_2/data...
my_bucket/simulations/version_3/data...
my_bucket/other_data/more_data...

My question is: is it possible to query using list_blobs or some other method to retrieve a list that contains just the versions from the "simulations" directory, and not all of the blobs below simulations?

For reference, this returns all blobs in a paginated fashion:

cursor = bucket.list_blobs(prefix='simulations')

Solution

  • I experimented with the prefix and delimiter parameters of the list_blobs method, and this code worked:

    from google.cloud import storage
    
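    def ls(bucket_name, prefix, delimiter):
        """Print the "directories" directly under prefix."""
        storage_client = storage.Client()
        bucket = storage_client.get_bucket(bucket_name)
    
        cursor = bucket.list_blobs(prefix=prefix, delimiter=delimiter)
        # cursor.prefixes is only filled in as result pages are fetched,
        # so the iterator has to be consumed before reading it.
        for blob in cursor:
            pass
    
        # Use a separate loop variable so the prefix argument is not shadowed.
        for sub_prefix in sorted(cursor.prefixes):
            print(sub_prefix)
    
    # your_bucket_name is a placeholder for a string holding the bucket name.
    ls(your_bucket_name, 'simulations/', '/')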
    

    Output:

    simulations/version_1/
    simulations/version_2/
    simulations/version_3/
    

    Note that this displays only the names of the directories directly inside simulations/; files are omitted.
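
If you also want the files at that level, closer to what `ls` shows, a variation along these lines might work. This is only a sketch: the function name `list_dirs_and_files` is made up for illustration, and it assumes `bucket` is a google.cloud.storage Bucket (or anything with a compatible `list_blobs` method). With a delimiter set, iterating yields only the blobs directly under the prefix, while deeper names are collapsed into `prefixes`.

```python
def list_dirs_and_files(bucket, prefix, delimiter="/"):
    """Return (directories, files) found directly under prefix.

    Hypothetical helper: `bucket` is assumed to behave like a
    google.cloud.storage Bucket, i.e. list_blobs() returns an iterator
    of objects with a .name attribute and a .prefixes set.
    """
    iterator = bucket.list_blobs(prefix=prefix, delimiter=delimiter)
    # Consuming the iterator collects the blobs at this level and, as a
    # side effect, fills iterator.prefixes with the "subdirectories".
    files = [blob.name for blob in iterator]
    directories = sorted(iterator.prefixes)
    return directories, files


# Hypothetical usage (needs credentials and a real bucket):
#   from google.cloud import storage
#   bucket = storage.Client().get_bucket("my_bucket")
#   dirs, files = list_dirs_and_files(bucket, "simulations/")
```

Passing the bucket in, rather than creating the client inside the function, keeps the helper easy to try against a stub object without touching Cloud Storage.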