Tags: web-services, amazon-web-services, amazon-s3, cloud, blob

Is it possible to perform a batch upload to amazon s3?


Does Amazon S3 support batch uploads? I have a nightly job that needs to upload ~100K files. Individual files can be up to 1 GB, but the distribution is strongly skewed toward small files: 90% are under 100 bytes and 99% are under 1,000 bytes.

Does the S3 API support uploading multiple objects in a single HTTP call?

All the objects must be available in S3 as individual objects. I cannot host them anywhere else (FTP, etc.) or in another format (a database, an EC2 local drive, etc.). That is an external requirement that I cannot change.


Solution

  • Does the S3 API support uploading multiple objects in a single HTTP call?

    No. The S3 PUT operation uploads exactly one object per HTTP request. (Multipart upload goes the other way: it splits a single large object across several requests.)

    You could install S3 Tools (s3cmd) on the machine you want to synchronize with the remote bucket, and run the following command:

    s3cmd sync localdirectory s3://bucket/
    

    You could then place this command in a script and create a scheduled job (e.g. a cron entry) to run it each night.

    This should do what you want.

    The tool decides what to upload based on MD5 hashes and file size, so collisions (skipping a file that actually changed because its hash and size happen to match) should be rare. If you would rather skip the comparison entirely, you can use "s3cmd put" to force blind overwriting of objects in your target bucket.

    EDIT: Also make sure to read the documentation on the S3 Tools site linked above — different flags control whether files deleted locally are also deleted from the bucket, ignored, etc.
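
To run the sync nightly as suggested above, the s3cmd invocation can go straight into a crontab entry. A minimal sketch, with hypothetical paths; the `--delete-removed` flag mirrors local deletions into the bucket, so leave it out if deleted files should stay in S3:

```
# hypothetical crontab entry: sync /var/data/export to the bucket at 02:00 every night
0 2 * * * s3cmd sync --delete-removed /var/data/export/ s3://bucket/export/ >> /var/log/s3cmd-sync.log 2>&1
```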
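
Separately, since each object needs its own PUT, the total time for ~100K small files is dominated by per-request latency, and issuing the requests in parallel is what recovers throughput. A minimal sketch of that fan-out pattern, using a stand-in upload function where a real client call (e.g. boto3's `upload_file`) would go — all names here are illustrative, not part of any S3 API:

```python
from concurrent.futures import ThreadPoolExecutor

def upload_one(key, data):
    # Stand-in for a real single-object PUT (e.g. boto3's put_object);
    # here it just simulates a successful upload and returns the key.
    return key

def upload_all(objects, workers=32):
    # Fan the individual PUTs out over a thread pool. S3 itself still
    # sees one object per HTTP request; only the client parallelizes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(upload_one, k, v) for k, v in objects.items()]
        return [f.result() for f in futures]

if __name__ == "__main__":
    objs = {f"prefix/file-{i}": b"x" for i in range(1000)}
    done = upload_all(objs)
    print(len(done))  # 1000
```

With real uploads, a worker count in the low tens is usually a reasonable starting point; past that, connection overhead and throttling tend to dominate.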