amazon-web-services, amazon-s3, dropbox, amazon-drive

Sync a specific set of files from Amazon S3 to Dropbox or Amazon Drive


I have an Amazon S3 bucket with tons of images. A subset of these images needs to be synced to a local machine for image analysis (AI) purposes. This has to be done regularly, ideally with a list of file names as input. Not all images need to be synced.

There are ways to synchronise S3 with either Dropbox/Amazon Drive or other storage services, but none of them appear to have the option to provide a list of files that need to be synced.

How can this be implemented?


Solution

  • The first thing that springs to mind for syncing from S3 is the AWS CLI command aws s3 sync. It syncs a source to a destination (bucket prefixes or local folders) and accepts --include and --exclude filters to select specific files. The filters also support wildcards such as *, which helps if your files follow a naming convention you can match on.

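    Both options can be repeated. To sync only a given list of files, pass --exclude "*" first and then one --include per file; filters that appear later on the command line take precedence over earlier ones. Depending on your OS, you could either list all files by hand or write a script (for example around find) that identifies the files and generates the options.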

    Additionally, you can pass --delete, which removes any files in the destination path that are not in the source. Note that files excluded by the filters are also excluded from deletion.
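
As a rough illustration of the list-driven approach, here is a minimal Python sketch that turns a list of file names into an aws s3 sync command. The function names, the bucket s3://example-bucket/images and the folder ./images are placeholders of mine, not anything from the question. It assumes the AWS CLI is installed and configured, and that the listed names are paths relative to the source prefix.

```python
import subprocess


def parse_key_list(text):
    """Return the non-empty, stripped lines of a file list (one name per line)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_sync_command(source, destination, keys, delete=False):
    """Build the argument list for an `aws s3 sync` limited to the given keys."""
    # Exclude everything first. Later filters take precedence, so each
    # --include re-admits one name (relative to the source prefix).
    command = ["aws", "s3", "sync", source, destination, "--exclude", "*"]
    for key in keys:
        command += ["--include", key]
    if delete:
        command.append("--delete")
    return command


def run_sync(source, destination, keys, delete=False):
    """Run the sync; raises CalledProcessError if the AWS CLI exits non-zero."""
    subprocess.run(build_sync_command(source, destination, keys, delete), check=True)
```

For example, run_sync("s3://example-bucket/images", "./images", parse_key_list(open("files.txt").read())) would sync only the names listed in a hypothetical files.txt, and could be scheduled with cron or a similar tool. Two caveats: the --include values are patterns, so names containing characters like * or ? are treated as wildcards, and a very long list may exceed the operating system's command-line length limit, in which case copying the files in batches is probably the safer route.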