amazon-s3, compression, hive, file-transfer, emr

Compress file on S3


I have a 17.7GB file on S3. It was generated as the output of a Hive query, and it isn't compressed.

I know that compressing it with gzip would bring it down to about 2.2GB. How can I download this file locally as quickly as possible when transfer is the bottleneck (250kB/s)?

I've not found any straightforward way to compress the file on S3, or enable compression on transfer in s3cmd, boto, or related tools.
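To make the stakes concrete, here is a rough back-of-the-envelope calculation of the two download times. It assumes decimal units (1 GB = 10^9 bytes, 1 kB = 10^3 bytes) and a steady 250kB/s link; the helper name `transfer_hours` is made up for illustration.

```python
def transfer_hours(size_bytes, rate_bytes_per_s):
    """Return the transfer time in hours for a given size and sustained rate."""
    return size_bytes / rate_bytes_per_s / 3600


GB = 10**9  # assumption: decimal gigabytes
KB = 10**3  # assumption: decimal kilobytes

uncompressed = transfer_hours(17.7 * GB, 250 * KB)
compressed = transfer_hours(2.2 * GB, 250 * KB)

print(f"{uncompressed:.1f} h vs {compressed:.1f} h")  # prints "19.7 h vs 2.4 h"
```

So compressing before the transfer would cut the download from most of a day to a couple of hours.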


Solution

  • S3 does not support stream compression, nor can it compress an already-uploaded file in place.

    If this is a one-time process, I suggest downloading it to an EC2 instance in the same region, compressing it there, and then uploading the compressed file to your destination.

    http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html

    If you need this more frequently, see:

    Serving gzipped CSS and JavaScript from Amazon CloudFront via S3
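The one-time approach above could be sketched roughly as follows, run on the EC2 instance. This is a minimal sketch, not a definitive implementation: it assumes `boto3` is installed and credentials are configured on the instance, that the instance has enough disk for both the raw and compressed copies, and the function names, the working directory, and the `.gz` key suffix are all illustrative choices.

```python
import gzip
import shutil


def gzip_file(src_path, dst_path, chunk_size=1024 * 1024):
    """Compress src_path into dst_path with gzip, streaming in chunks
    so the whole file never has to fit in memory."""
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)


def compress_s3_object(bucket, key, workdir="/tmp"):
    """Download an S3 object, gzip it locally, and upload it as key + '.gz'.

    Hypothetical helper: bucket, key, and workdir are supplied by the caller.
    """
    import boto3  # third-party; assumed to be installed on the instance

    s3 = boto3.client("s3")
    raw_path = f"{workdir}/raw"
    gz_path = f"{workdir}/raw.gz"

    s3.download_file(bucket, key, raw_path)   # S3 -> EC2 (same region, fast)
    gzip_file(raw_path, gz_path)              # compress on the instance
    s3.upload_file(gz_path, bucket, key + ".gz")  # EC2 -> S3
    return key + ".gz"
```

After something like `compress_s3_object("my-bucket", "hive-output/result")` (both names are placeholders), the `.gz` object is what you would pull down over the slow link.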