python amazon-s3 ibm-cloud object-storage

Listing all keys in an IBM COS S3 bucket using Python


I am looking to keep track of the list of keys that are stored in a COS bucket.

I am using Python and currently my code is:

 files = cos.Bucket('bucketname').objects.all()

 for file in files:
     data[file.key] = 'not processed'
     data_array.append(data)

This is extremely slow for me, as there are more than a million keys in my bucket at the moment.

Is there a better way? I'm currently looking at https://alexwlchan.net/2018/01/listing-s3-keys-redux/

But I am having trouble since the ibm-cos-sdk returns an s3 resource and not a client when establishing the connection.

Any tips would be appreciated.


Solution

  • You can use code similar to the one you linked to. I was able to execute it with code like this:

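    import ibm_boto3
    import requests

    # fetch the list of COS endpoints and pick the public us-south host
    endpoints_list_uri = "https://cos-service.bluemix.net/endpoints"
    endpoints = requests.get(endpoints_list_uri).json()
    cos_host = endpoints['service-endpoints']['regional']['us-south']['public']['us-south']

    # create a low-level client (not a resource)
    cos = ibm_boto3.client('s3', endpoint_url='https://' + cos_host)

    # retrieve keys from the bucket; get_matching_s3_keys is the function from
    # the linked post, modified to accept the client as its first argument
    keys = get_matching_s3_keys(s3=cos, bucket="encryptedbucket1", prefix='mypattern')
    for key in keys:
        print(key)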
    

    Note that I have modified both function headers to allow passing in the S3 client object. If there is interest, I could put the entire source code on GitHub etc.
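
For reference, here is a minimal sketch of what the two modified functions might look like. It is not the exact code I ran: it follows the pattern from the linked post (paging through `list_objects_v2` with a continuation token) and assumes the only change is an extra `s3` parameter for the client. The parameter names and defaults are my own choice.

```python
def get_matching_s3_objects(s3, bucket, prefix="", suffix=""):
    """Yield object metadata dicts from `bucket` whose keys match prefix/suffix.

    `s3` is assumed to be an S3-compatible client (for example one created
    with ibm_boto3.client('s3', ...)) that provides list_objects_v2.
    """
    kwargs = {"Bucket": bucket}
    if prefix:
        # The prefix is applied by the service, so fewer keys are transferred.
        kwargs["Prefix"] = prefix

    while True:
        resp = s3.list_objects_v2(**kwargs)

        # "Contents" is absent when a page has no matching objects.
        for obj in resp.get("Contents", []):
            # The suffix has to be checked on our side.
            if obj["Key"].endswith(suffix):
                yield obj

        # Each page holds a limited number of keys; keep requesting pages
        # until the service stops returning a continuation token.
        token = resp.get("NextContinuationToken")
        if not token:
            break
        kwargs["ContinuationToken"] = token


def get_matching_s3_keys(s3, bucket, prefix="", suffix=""):
    """Yield only the key names of the matching objects."""
    for obj in get_matching_s3_objects(s3, bucket, prefix, suffix):
        yield obj["Key"]
```

Because both functions are generators, keys are produced page by page instead of being collected up front. If you already have a resource rather than a client, `ibm_boto3` is a fork of `boto3`, so the underlying client should be reachable as `cos.meta.client`; I have not verified that against the IBM SDK, so treat it as an assumption.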