python amazon-web-services amazon-s3 boto

List only the first level of S3 objects


I am trying to list S3 objects like this:

for key in s3_client.list_objects(Bucket='bucketname')['Contents']:
    logger.debug(key['Key'])

I just want to print the folder names or file names that are present on the first layer.

For example, if my bucket has this:

bucketname
    folder1
    folder2
        text1.txt
        text2.txt
    catalog.json

I only want to print folder1, folder2 and catalog.json. I don't want to include text1.txt etc.

However, my current solution also prints the names of the files inside the folders in my bucket.

How can I modify this? I saw that there's a 'Prefix' parameter, but I'm not sure how to use it.


Solution

  • You can split the keys on "/" and only keep the first level:

    level1 = set()  # Using a set removes duplicates automatically
    for key in s3_client.list_objects(Bucket='bucketname')['Contents']:
        level1.add(key["Key"].split("/")[0])  # Keep only the first level of the key

    # Then print your level1 set
    logger.debug(level1)
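
To see what the split does without touching AWS, here is a minimal sketch run on a hand-written key list. The helper name first_level_names and the sample keys are assumptions made for illustration; the keys mirror the bucket layout in the question, with "folder1/" standing in for the placeholder key an empty folder usually has.

```python
def first_level_names(keys):
    """Return the distinct first path segment of each S3 key."""
    return {key.split("/")[0] for key in keys}


# Assumed listing for the bucket in the question (illustrative data only).
sample_keys = [
    "folder1/",
    "folder2/text1.txt",
    "folder2/text2.txt",
    "catalog.json",
]

print(sorted(first_level_names(sample_keys)))
# prints ['catalog.json', 'folder1', 'folder2']
```

Note that this approach cannot tell folders from files: both end up as plain names in the set.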
    

    /!\ Warnings

    1. list_objects has been revised, and the AWS S3 documentation recommends using list_objects_v2 instead.
    2. A single call returns at most 1,000 keys. To make sure you get all the keys, pass the NextContinuationToken from each response back as ContinuationToken until no token is returned:
    level1 = set()
    continuation_token = ""  # Empty string means "first request, no token yet"
    while continuation_token is not None:
        # Only send ContinuationToken once a previous response has provided one
        extra_params = {"ContinuationToken": continuation_token} if continuation_token else {}
        response = s3_client.list_objects_v2(Bucket="bucketname", Prefix="", **extra_params)
        # NextContinuationToken is absent on the last page, which ends the loop
        continuation_token = response.get("NextContinuationToken")
        for obj in response.get("Contents", []):
            level1.add(obj["Key"].split("/")[0])

    logger.debug(level1)
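
As an alternative to splitting every key client-side, list_objects_v2 accepts a Delimiter parameter. With Delimiter="/", S3 groups every key that contains a "/" under CommonPrefixes and returns only the top-level objects in Contents, so nested files are never transferred. The sketch below is not part of the original answer; the function name list_first_level is an assumption, and it uses boto3's built-in paginator in place of the manual token loop.

```python
def list_first_level(s3_client, bucket):
    """Return (folders, files) found at the top level of a bucket.

    s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    folders, files = [], []
    # The paginator follows continuation tokens for us.
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Delimiter="/"):
        # Keys containing "/" are rolled up into CommonPrefixes ("folder1/").
        for prefix in page.get("CommonPrefixes", []):
            folders.append(prefix["Prefix"].rstrip("/"))
        # Keys without "/" are the top-level files.
        for obj in page.get("Contents", []):
            files.append(obj["Key"])
    return folders, files


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   folders, files = list_first_level(boto3.client("s3"), "bucketname")
```

This is also where Prefix comes in: to list one level inside folder2 instead of the bucket root, pass Prefix="folder2/" together with Delimiter="/" to the same call.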