Tags: python, python-3.x, amazon-web-services, amazon-s3, boto3

Download the entire contents of a subfolder in an S3 bucket


I have a bucket in S3 called "sample-data". Inside the bucket I have folders labelled "A" to "Z".

Inside each alphabetical folder there are more files and folders. What is the fastest way to download an alphabetical folder and all of its contents?

For example: sample-data/a/foo.txt, more_files/foo1.txt

In the above example, the bucket sample-data contains a folder called a, which holds foo.txt and a folder called more_files, which in turn holds foo1.txt.

I know how to download a single file. For instance, if I wanted foo.txt, I would do the following:

    import boto3

    s3 = boto3.client('s3')
    s3.download_file("sample-data", "a/foo.txt", "foo.txt")

However, I am wondering whether I can download the folder called a and all of its contents in one go? Any help would be appreciated.


Solution

  • I think your best bet would be the awscli:

    aws s3 cp --recursive s3://mybucket/your_folder_named_a path/to/your/destination
    

    From the docs:

    --recursive (boolean) Command is performed on all files or objects under the specified directory or prefix.
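
    If you plan to re-run the transfer, aws s3 sync is also worth knowing: it is recursive by default and only copies files that are new or have changed at the destination. Using the bucket and folder names from the question:

    aws s3 sync s3://sample-data/a path/to/your/destination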

    EDIT:

    However, if you want to do this with boto3, try the following:

    import os
    import boto3

    client = boto3.client('s3')


    def assert_dir_exists(path):
        # Create the directory (and any parents) if it does not already exist;
        # exist_ok=True replaces the older try/except-errno idiom
        os.makedirs(path, exist_ok=True)
    
    
    def download_dir(bucket, path, target):
        # Handle missing / at end of prefix
        if not path.endswith('/'):
            path += '/'
    
        paginator = client.get_paginator('list_objects_v2')
        for result in paginator.paginate(Bucket=bucket, Prefix=path):
            # Download each file individually; a prefix that matches nothing
            # yields a page without a 'Contents' key, hence the .get()
            for key in result.get('Contents', []):
                # Calculate relative path
                rel_path = key['Key'][len(path):]
                # Skip paths ending in /
                if not key['Key'].endswith('/'):
                    local_file_path = os.path.join(target, rel_path)
                    # Make sure directories exist
                    local_file_dir = os.path.dirname(local_file_path)
                    assert_dir_exists(local_file_dir)
                    client.download_file(bucket, key['Key'], local_file_path)
    
    
    download_dir('your_bucket', 'your_folder', 'destination')
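
    For completeness, the same download can be written more compactly with the boto3 resource API, which handles the pagination for you. Below is a minimal sketch reusing the bucket name sample-data and the prefix a/ from the question; the local directory destination is an assumption:

    import os
    import boto3

    s3 = boto3.resource('s3')
    bucket = s3.Bucket('sample-data')

    # Walk every object under the prefix and mirror it locally
    for obj in bucket.objects.filter(Prefix='a/'):
        # Skip zero-byte "folder" placeholder objects
        if obj.key.endswith('/'):
            continue
        local_path = os.path.join('destination', os.path.relpath(obj.key, 'a/'))
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        bucket.download_file(obj.key, local_path)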