Search code examples
amazon-web-servicesbackupamazon-glacier

Backup: Amazon S3 or Glacier - lots of little files?


I'm trying to understand the complicated Amazon Glacier pricing model. I don't want to store a huge amount of data, a few GB's say 10. I hope never to download the files and if I did need to I don't care how long it takes.

Is there a cost per file I upload? Is it cheaper to zip lots of tiny files and upload in a few chunks or does 10,000 say images not matter? (cannot get a straight answer to this during searching)

Am I able to request the download of a whole Archive/Bucket or is it file-by-file?


Solution

  • Detailed pricing information for S3 is available here. Specifics of the API functions available are here.

    For S3, you are mostly charged for upload bandwidth (bytes sent TO S3), download bandwidth (bytes received FROM S3), and storage (bytes IN S3). You are also charged for the number and type of API calls.

    So, if you upload your 10GB of data to S3 in 10,000 1MB files, store it for a month, and then download each of the files once, you'll be charged:

    • $0.00 for upload bandwidth (this is free)
    • $0.10 for the 10,000 PUT requests to upload the files
    • $0.95 for storing the 10GB for a month
    • $1.08 for 10GB download bandwidth (the first is free, then $0.12/GB)
    • $0.01 for the 10,000 GET requests to download the files

    That's $2.14. If you uploaded and downloaded once each, but kept the data for a year, only the storage cost would go up to 12 * $0.95, or $11.40. If your files averaged only 100KB, so you had 100,000 of them, you'd pay 10 times as much for the PUT and GET requests, or $1.10 instead of $0.11.

    You can only upload and download a single file per operation. If you combined your files into one using Zip, you'd only save by using fewer operations, which, as you can see, are pretty cheap to start with.

    There is one quirk here, though. I'm pretty sure you are charged for all bandwidth usage when uploading and downloading, including request headers, not just the bodies containing your data. So if your files were really tiny the request headers might become significant, perhaps as much as the files themselves. In that case your bandwidth costs would double.

    Glacier pricing is more complicated, and I've never used it myself. Basically, it reduces storage cost by almost ten-fold, leaving the other costs the same, and adding costs to archive and restore per object. Those costs seem to be significant if you have a lot of small objects, need to get a lot of your files at a time, or get files frequently. Glacier seems to be best when you have a lot of data (terabytes or more, not just gigabytes), but few operations. Given that you only have 10GB of data, S3 is so inexpensive it doesn't seem worth it to consider Glacier.

    Finally, AWS has a free usage tier for the first year, which looks like it would cover all your costs except for half the storage charges.