Tags: amazon-web-services, shell, amazon-s3, aws-cli

Run a large number of S3 copy commands


I have a requirement where we'll be running a large number of copy commands on S3, like:

aws s3 cp s3://ABC/pqr/1/1157181.json s3://ABC/xyz/1/1157181.json
aws s3 cp s3://ABC/pqr/7/1157182.json s3://ABC/xyz/7/1157182.json
aws s3 cp s3://ABC/pqr/13/1157183.json s3://ABC/xyz/13/1157183.json
aws s3 cp s3://ABC/pqr/14/1157181.json s3://ABC/xyz/14/1157181.json
aws s3 cp s3://ABC/pqr/29/1157182.json s3://ABC/xyz/29/1157182.json
aws s3 cp s3://ABC/pqr/33/1157183.json s3://ABC/xyz/33/1157183.json
.
.
.
aws s3 cp s3://ABC/pqr/n/1157277.json s3://ABC/xyz/n/1157277.json

There are a few million of these commands. I can run them from a shell script file, but it's very slow.

Is there a better way to run them?


Solution

  • After not finding a solution for my specific scenario, I did the following to execute millions of AWS S3 CLI commands:

    1. Launch an EC2 instance with S3 permissions (an IAM instance role with read/write access to the bucket).
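
      Before kicking off millions of copies, it's worth confirming the instance's credentials actually reach the bucket. A quick sanity check (the bucket name ABC comes from the question):

      aws sts get-caller-identity
      aws s3 ls s3://ABC/pqr/ | head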

    2. Copy the file with all my commands to the EC2 instance.
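
      For example, with scp (the key file and host below are placeholders):

      scp -i my-key.pem commands.sh ec2-user@<instance-public-ip>:~/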

    3. Split the command file, either manually or with the split command (with these options it produces commands.sh0, commands.sh1, ..., 200,000 lines each):

      split -l 200000 -d -a 1 commands.sh commands.sh
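
      A quick check that the split produced the chunks you expect, with a line count per chunk:

      wc -l commands.sh?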

    4. Generate a new shell script that runs all the split command files, like:

      sh commands.sh0 > commands0.log & sh commands.sh1 > commands1.log &
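
      Instead of writing that script by hand, a small loop can launch every chunk; this is a sketch assuming the commands.sh0, commands.sh1, ... names produced in step 3:

      for f in commands.sh[0-9]; do
        sh "$f" > "$f.log" &    # one background job and one log per chunk
      done
      wait                      # block until every chunk has finished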

    5. The '>' redirection gives each chunk its own log file, and the trailing '&' runs each script in the background, so all the chunks execute in parallel.

    6. Finally, use 'ps aux' to monitor the status of the execution.
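
      For example (the [s] in the grep pattern keeps grep from matching its own process; and since each successful aws s3 cp prints one 'copy:' line, log line counts give rough progress):

      ps aux | grep '[s]h commands.sh'
      wc -l commands*.log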