I am trying to find the most cost-effective way of doing this and would appreciate any help.
My question is: when I upload 100s of millions of files, does this count as one PUT request per file (meaning one per object)? If so, just the cost of uploading the data will be massive. If I upload a directory with a million files, is that one PUT request?
What if I zip the 100 million files on-prem, then upload the zip and use Lambda to unzip it? Would that count as one PUT request?
Any advice?
You say that you have "100s of millions of files", so I shall assume you have 400 million objects of 100KB each, making 40TB of storage. Please adjust accordingly. I have shown my calculations so that people can help identify my errors.
Initial upload
PUT requests in Amazon S3 are charged at $0.005 per 1,000 requests. Therefore, 400 million PUTs would cost $2000. (.005*400m/1000)
This cost cannot be avoided if you wish to create them all as individual objects.
Future uploads would be the same cost at $5 per million.
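As a quick way to redo this arithmetic for a different object count, here is a minimal sketch. It assumes the $0.005 per 1,000 PUTs price quoted above and one PUT per object; the function name `put_cost` is just illustrative.

```python
# Price quoted in this answer; check current S3 pricing for your region.
PUT_PRICE_PER_1000 = 0.005  # USD per 1,000 PUT requests

def put_cost(num_objects: int) -> float:
    """USD cost of uploading num_objects, assuming one PUT request per object."""
    return num_objects / 1000 * PUT_PRICE_PER_1000

print(f"400 million objects: ${put_cost(400_000_000):,.2f}")
print(f"Per million objects: ${put_cost(1_000_000):,.2f}")
```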
Storage
Standard storage costs $0.023 per GB per month, so storing 400 million 100KB objects would cost $920/month. (.023*400m*100/1m)
Storage costs can be reduced by using lower-cost Storage Classes.
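The storage figure can be recomputed for other object counts or sizes with a short sketch like the one below. It assumes the $0.023 per GB-month Standard price quoted above and decimal units (1 GB = 1,000,000 KB), matching the arithmetic in this answer; `monthly_storage_cost` is an illustrative name.

```python
# Price quoted in this answer; other storage classes have different rates.
STORAGE_PRICE_PER_GB_MONTH = 0.023  # USD per GB per month (S3 Standard)
KB_PER_GB = 1_000_000               # decimal units, as in the calculation above

def monthly_storage_cost(num_objects: int, avg_size_kb: float) -> float:
    """USD per month to store num_objects of avg_size_kb each."""
    total_gb = num_objects * avg_size_kb / KB_PER_GB
    return total_gb * STORAGE_PRICE_PER_GB_MONTH

print(f"${monthly_storage_cost(400_000_000, 100):,.2f} per month")
```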
Access
GET requests are $0.0004 per 1,000 requests, so downloading 1 million objects each month would cost 40c/month. (.0004*1m/1000)
If the data is being transferred to the Internet, Data Transfer costs of $0.09 per GB would apply. The Data Transfer cost of downloading 1 million 100KB objects would be $9/month. (.09*1m*100/1m)
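Both access charges can be estimated together. This is a rough sketch assuming the GET and Data Transfer prices quoted above, decimal units, and that every download goes out to the Internet; `monthly_access_cost` is an illustrative name.

```python
# Prices quoted in this answer; both vary by region and usage tier.
GET_PRICE_PER_1000 = 0.0004   # USD per 1,000 GET requests
TRANSFER_PRICE_PER_GB = 0.09  # USD per GB transferred out to the Internet
KB_PER_GB = 1_000_000         # decimal units, as in the calculation above

def monthly_access_cost(num_downloads: int, avg_size_kb: float) -> tuple[float, float]:
    """Return (request cost, data transfer cost) in USD for one month."""
    request_cost = num_downloads / 1000 * GET_PRICE_PER_1000
    transfer_cost = num_downloads * avg_size_kb / KB_PER_GB * TRANSFER_PRICE_PER_GB
    return request_cost, transfer_cost

requests, transfer = monthly_access_cost(1_000_000, 100)
print(f"GET requests: ${requests:.2f}, data transfer: ${transfer:.2f}")
```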
Analysis
You seem to be most fearful of the initial cost of uploading 100s of millions of objects at a cost of $5 per million objects.
However, storage costs will also be high, at $2.30/month per million objects ($920/month for 400m objects). That ongoing cost is likely to dwarf the cost of the initial upload.
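To put the two numbers side by side, the sketch below works out after how many months the cumulative storage bill passes the one-time upload cost. It uses the assumed figures from this answer (400 million objects of 100KB, $0.005 per 1,000 PUTs, $0.023 per GB-month); the function name is illustrative.

```python
import math

# Assumed figures from this answer.
upload_cost = 400_000_000 / 1000 * 0.005                 # one-time PUT cost, USD
monthly_storage = 400_000_000 * 100 / 1_000_000 * 0.023  # USD per month

def months_until_storage_exceeds_upload(upload: float, monthly: float) -> int:
    """Smallest whole number of months n such that n * monthly > upload."""
    return math.floor(upload / monthly) + 1

n = months_until_storage_exceeds_upload(upload_cost, monthly_storage)
print(f"Cumulative storage cost passes the upload cost in month {n}")
```

With these assumptions the upload cost equals a little over two months of storage, so storage overtakes it in the third month.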
Some alternatives would be: