Tags: python, python-3.x, amazon-web-services, amazon-s3, boto3

Download the entire contents of a subfolder in an S3 bucket


I have a bucket in S3 called "sample-data". Inside the bucket I have folders labelled "A" to "Z".

Inside each alphabetical folder there are more files and folders. What is the fastest way to download an alphabetical folder and all of its contents?

For example: sample-data/a/foo.txt, more_files/foo1.txt

In the above example, the bucket sample-data contains a folder called a, which holds foo.txt and a folder called more_files, which in turn holds foo1.txt.

I know how to download a single file. For instance, if I wanted foo.txt, I would do the following:

    import boto3

    s3 = boto3.client('s3')
    s3.download_file("sample-data", "a/foo.txt", "foo.txt")

However, I am wondering whether I can download the folder called a and all of its contents in one go? Any help would be appreciated.


Solution

  • I think your best bet would be the awscli:

    aws s3 cp --recursive s3://mybucket/your_folder_named_a path/to/your/destination
    

    From the docs:

    --recursive (boolean) Command is performed on all files or objects under the specified directory or prefix.
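
    If you plan to re-run the transfer, aws s3 sync is also worth knowing: it is recursive by default and only copies files that are new or have changed at the destination. Using the bucket and folder names from the question:

    aws s3 sync s3://sample-data/a path/to/your/destination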

    EDIT:

    However, if you want to do this with boto3, try the following:

    import os
    import boto3

    client = boto3.client('s3')


    def assert_dir_exists(path):
        # Create the directory (and any parents) if it does not already exist;
        # exist_ok=True replaces the older try/except-errno idiom
        os.makedirs(path, exist_ok=True)
    
    
    def download_dir(bucket, path, target):
        # Handle missing / at end of prefix
        if not path.endswith('/'):
            path += '/'
    
        paginator = client.get_paginator('list_objects_v2')
        for result in paginator.paginate(Bucket=bucket, Prefix=path):
            # Download each file individually; a prefix that matches nothing
            # yields a page without a 'Contents' key, hence the .get()
            for key in result.get('Contents', []):
                # Calculate relative path
                rel_path = key['Key'][len(path):]
                # Skip paths ending in /
                if not key['Key'].endswith('/'):
                    local_file_path = os.path.join(target, rel_path)
                    # Make sure directories exist
                    local_file_dir = os.path.dirname(local_file_path)
                    assert_dir_exists(local_file_dir)
                    client.download_file(bucket, key['Key'], local_file_path)
    
    
    download_dir('your_bucket', 'your_folder', 'destination')
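
    For completeness, the same download can be written more compactly with the boto3 resource API, which handles the pagination for you. Below is a minimal sketch reusing the bucket name sample-data and the prefix a/ from the question; the local directory destination is an assumption:

    import os
    import boto3

    s3 = boto3.resource('s3')
    bucket = s3.Bucket('sample-data')

    # Walk every object under the prefix and mirror it locally
    for obj in bucket.objects.filter(Prefix='a/'):
        # Skip zero-byte "folder" placeholder objects
        if obj.key.endswith('/'):
            continue
        local_path = os.path.join('destination', os.path.relpath(obj.key, 'a/'))
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        bucket.download_file(obj.key, local_path)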