I have about 80,000,000 files of roughly 50 KB each on S3 (about 4 TB in total), which I want to transition to Glacier Deep Archive. I have come to realize that transitioning that many small files to Glacier is cost-inefficient, since the per-object transition requests and per-object storage overhead add up quickly at this scale.
Assuming I don't mind archiving my files into a single tar/zip (or several of them), what would be the best practice for transitioning those files to Glacier Deep Archive?
It is important to note that I only have these files on S3, and not on any local machine.
The most efficient way would be:

- Launch an Amazon EC2 instance in the same region as the bucket, with an EBS volume large enough to hold one batch of files at a time
- Copy a batch of objects from S3 down to the instance using the AWS CLI
- Combine them into a tar/zip archive on the instance
- Upload the archive back to Amazon S3 with the AWS CLI, specifying --storage-class DEEP_ARCHIVE
- Repeat for the remaining batches, then delete the original small objects

The above would incur very little charge, since you can terminate the EC2 instance when it is no longer needed, EBS is only charged while the volumes exist, and data transferred between S3 and EC2 within the same region is free.
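As a rough sketch, one batch of that loop on the instance could look like the following. The bucket, prefix, and archive names are placeholders, and each batch should be sized to fit the attached EBS volume:

    # Placeholders -- substitute your own bucket, prefix, and batch naming scheme
    BUCKET=my-bucket
    PREFIX=data/2023/01/
    BATCH=batch-2023-01

    # Pull one subset of the small objects down to the instance
    aws s3 cp "s3://${BUCKET}/${PREFIX}" "./${BATCH}/" --recursive

    # Combine them into a single compressed archive
    tar -czf "${BATCH}.tar.gz" -C "./${BATCH}" .

    # Upload the archive straight into Glacier Deep Archive
    aws s3 cp "${BATCH}.tar.gz" "s3://${BUCKET}/archives/${BATCH}.tar.gz" --storage-class DEEP_ARCHIVE

    # Free up the EBS volume before starting the next batch
    rm -rf "./${BATCH}" "${BATCH}.tar.gz"

Once an archive upload has been verified, the original objects in that batch can be deleted.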
If it takes too long to list out a subset of the files, you might consider using Amazon S3 Inventory, which can provide a daily or weekly CSV file listing all objects. You can then use this list to copy specific files, or to identify a path/prefix to copy in each batch.
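For example, assuming the inventory report is delivered in CSV format (where the first two columns are the bucket name and the object key; keys are quoted and URL-encoded, so keys containing commas or special characters would need extra handling), one batch of keys could be pulled out and downloaded like this, with the file names, bucket, and prefix as placeholders:

    # Extract the object keys for one batch from a gzipped inventory CSV chunk
    gunzip -c inventory-chunk.csv.gz \
      | awk -F',' '{ gsub(/"/, "", $2); print $2 }' \
      | grep '^data/2023/01/' \
      > batch-keys.txt

    # Download just those keys to the instance before archiving them
    while read -r key; do
      aws s3 cp "s3://my-bucket/${key}" "./batch/${key}"
    done < batch-keys.txt

Copying one object per command is slow at this scale, so in practice you would parallelise the downloads (e.g. with xargs -P) or simply copy whole prefixes recursively as in the sketch above.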
As an extra piece of advice: if your system is continuing to collect even more files, you might consider collecting the data in a different way (e.g. streaming it through Kinesis Firehose to batch records together), or combining the data on a regular basis, rather than letting it creep back up to this many files. Fewer, larger files are much easier to work with in downstream processes.
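As a sketch of the Firehose approach, a delivery stream can buffer incoming records and write them to S3 as larger combined objects; the stream name, IAM role ARN, bucket, and prefix below are placeholders:

    # Buffer incoming records and flush them to S3 as objects of up to 128 MB
    # (or every 15 minutes, whichever comes first)
    aws firehose create-delivery-stream \
      --delivery-stream-name file-batcher \
      --delivery-stream-type DirectPut \
      --extended-s3-destination-configuration \
        'RoleARN=arn:aws:iam::123456789012:role/firehose-delivery-role,BucketARN=arn:aws:s3:::my-bucket,Prefix=batched/,BufferingHints={SizeInMBs=128,IntervalInSeconds=900}'

At roughly 50 KB per record, each delivered object would consolidate data that would otherwise have landed as thousands of separate files.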