amazon-web-services, amazon-s3, aws-glue

Get files from SFTP to AWS S3


What is the best way to get data from a directory on an SFTP server and copy it into an AWS S3 bucket? I only have read permission on the SFTP server, so rsync isn't an option.

My idea is to create a Glue job in Python that downloads this data and copies it into the S3 bucket. The files differ in size: one is about 600 MB, others are 4 GB.


Solution

  • Assuming you are talking about an sFTP server that is not on AWS, you have a few different options that may be easier than what you have proposed (although your solution could work):

    1. Install the AWS CLI on the sFTP server and copy the files with the aws s3 cp command. This requires being able to install and run software on that server, which read-only sFTP access may not allow.
    2. Write a script using the AWS SDK that reads the files and copies them to the bucket. Given the size of your files, you may need to use multipart upload.
    3. You can create an AWS managed sFTP server (AWS Transfer Family) that uses your S3 bucket as its backend storage, then use sftp commands to copy the files over.

    Keep in mind that you will need the appropriate permissions in your AWS account for any of these 3 options (or 4, counting your Glue job).
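
    As a rough illustration of option 2 (the same logic could run inside a Glue Python job), here is a minimal sketch that streams each file from the sFTP server straight into S3 without writing it to local disk. It assumes the third-party paramiko library for sFTP and boto3 for S3. The host name, credentials, directory, bucket name, and the helper names choose_part_size and copy_sftp_file_to_s3 are all hypothetical placeholders, not part of any library. Only read access on the sFTP side is needed.

```python
import math
import stat

MIB = 1024 * 1024
S3_MIN_PART = 5 * MIB    # S3 minimum size for every part except the last
S3_MAX_PARTS = 10_000    # S3 maximum number of parts in one multipart upload


def choose_part_size(file_size, preferred=64 * MIB):
    """Pick a part size that keeps the upload within S3's 10,000-part limit."""
    needed = math.ceil(file_size / S3_MAX_PARTS)
    return max(S3_MIN_PART, preferred, needed)


def copy_sftp_file_to_s3(sftp, s3, remote_path, bucket, key, transfer_config=None):
    """Stream one remote file into S3.

    sftp: object with paramiko.SFTPClient's open() method.
    s3:   object with the boto3 S3 client's upload_fileobj() method.
    """
    with sftp.open(remote_path, "rb") as remote_file:
        # upload_fileobj reads the stream in chunks and uses multipart
        # upload for objects above the configured threshold.
        s3.upload_fileobj(remote_file, bucket, key, Config=transfer_config)


def main():
    # Third-party imports live here so the helpers above load without them.
    import boto3
    import paramiko
    from boto3.s3.transfer import TransferConfig

    # Hypothetical host, credentials, directory, and bucket: replace them.
    # Assumption: password login; no host key is verified in this sketch.
    transport = paramiko.Transport(("sftp.example.com", 22))
    transport.connect(username="reader", password="change-me")
    sftp = paramiko.SFTPClient.from_transport(transport)
    s3 = boto3.client("s3")
    try:
        for entry in sftp.listdir_attr("/outbound"):
            # Skip directories and anything that is not a regular file.
            if entry.st_mode is None or not stat.S_ISREG(entry.st_mode):
                continue
            config = TransferConfig(
                multipart_chunksize=choose_part_size(entry.st_size or 0)
            )
            copy_sftp_file_to_s3(
                sftp, s3,
                f"/outbound/{entry.filename}",
                "my-bucket", f"incoming/{entry.filename}",
                config,
            )
    finally:
        sftp.close()
        transport.close()
```

    Calling main() runs the copy. With a 64 MiB part size, a 4 GB file needs well under 100 parts, so the part-count limit only matters for much larger files. In a Glue job, paramiko would have to be supplied as an additional Python module.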