
Different S3 Download Behavior Between the Console and CLI


I have set up a CloudWatch log group to stream logs via Kinesis and Firehose into an S3 bucket as gzipped files.

The gzip files all carry the following metadata:

Content-Encoding     gzip
Content-Type         application/octet-stream

When I download one of the files directly from the browser console and unzip it, I get the expected contents of a log file, namely JSON strings. However, if I use the AWS CLI to cp the file locally and unzip it, the result still renders as binary in the terminal.

What could cause the difference in behavior between the AWS Console download button and the AWS CLI s3 cp command?

I have already tried various combinations of command-line flags:

aws s3 cp --content-encoding gzip --content-type "application/json"
aws s3 cp --content-encoding gzip --content-type "application/octet-stream"
aws s3 cp --content-encoding gzip --content-type "application/octet-stream" --sse-kms-key-id <keyArn>

But none of them have produced the positive result I get from using the Console in the browser.

UPDATE

The copy of the file downloaded with the AWS CLI is almost 10 KB larger than the copy downloaded from the management console.


Solution

  • The Firehose was set up to compress the contents of each message. However, CloudWatch was already compressing the messages as well, so the objects in S3 were gzipped twice.

    When the browser downloads a file from S3, it sees the Content-Encoding: gzip header and automatically decompresses the first of the two layers of compression. A single manual decompression therefore yields the expected logs.

    The CLI does not perform this automatic decompression, so decompressing the file once still leaves a compressed, binary file. A second decompression solves the problem.
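
    The double decompression described above can be sketched in Python. This is only an illustration, not part of the AWS tooling: the helper name gunzip_all and the max_layers guard are assumptions made for this example. It checks for the gzip magic bytes (0x1f 0x8b) and keeps decompressing until the payload no longer looks like gzip, so it handles a file fetched with the CLI (two layers) as well as one already unwrapped once by the browser (one layer).

```python
import gzip
from typing import Tuple

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def gunzip_all(data: bytes, max_layers: int = 5) -> Tuple[bytes, int]:
    """Remove nested gzip layers (hypothetical helper for this example).

    Returns the decompressed bytes and the number of layers removed.
    max_layers is a safety limit so unexpected input cannot loop for long.
    """
    layers = 0
    while data[:2] == GZIP_MAGIC and layers < max_layers:
        data = gzip.decompress(data)
        layers += 1
    return data, layers


# Simulate the situation in the question: a log line compressed twice,
# once by CloudWatch and once more by Firehose.
sample = b'{"message": "example log line"}\n'
double_compressed = gzip.compress(gzip.compress(sample))

logs, layers = gunzip_all(double_compressed)
print(layers)          # number of gzip layers that were removed
print(logs.decode())   # the original JSON log line
```

    To apply it to a real download, read the file in binary mode (for example open("downloaded-file.gz", "rb").read(), where the filename is a placeholder) and pass the bytes to gunzip_all.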