amazon-athena amazon-kinesis-firehose

Athena query on gzip-compressed CSV returns a mix of compressed and decompressed results


I'm setting up AWS Athena against an S3 bucket that contains gzipped CSV files.

Then I run a query like this:

SELECT * FROM "sample_db"."sample_table2" limit 100;

The results differ between take 1 and take 2.

It looks like compressed and decompressed output are being mixed in the results.

Is there any way to get only decompressed results from Athena?

The file contents are below:

"title","user_info.client_user_id","user_info.player_id"
"test : csv take 4",,
"title","user_info.client_user_id","user_info.player_id"
"test : csv take 4",,
"title","user_info.client_user_id","user_info.player_id"
"test : csv take 4",,
"title","user_info.client_user_id","user_info.player_id"
"test : csv take 4",,

The S3 bucket contains only one file, test-sample.gz.

Query Take 1: (screenshot not included)

Query Take 2: (screenshot not included)


Solution

  • The cause was a combination of a wrongly formatted query, partitioning applied to the CSV data, and corrupted data.

    It works when the .gz file is uploaded directly into the S3 directories.
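
    Since corrupted data was part of the cause, it may help to check a file locally before uploading it. The following is only a minimal sketch, not the fix used above: the helper name check_gzip_csv is hypothetical, and it assumes the file is UTF-8 CSV compressed with gzip. It checks the gzip magic bytes, decompresses the whole stream, and parses the result as CSV.

```python
import csv
import gzip
import io
import zlib

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def check_gzip_csv(raw: bytes) -> list:
    """Return the parsed CSV rows, or raise ValueError if raw is not a clean gzip stream.

    Hypothetical helper for illustration; assumes UTF-8 encoded CSV.
    """
    if raw[:2] != GZIP_MAGIC:
        raise ValueError("not a gzip file: missing 1f 8b magic bytes")
    try:
        # Decompressing the full stream surfaces truncation and CRC errors.
        text = gzip.decompress(raw).decode("utf-8")
    except (OSError, EOFError, zlib.error) as exc:
        raise ValueError(f"corrupted gzip data: {exc}") from exc
    return list(csv.reader(io.StringIO(text)))


if __name__ == "__main__":
    # Build a small in-memory sample instead of reading a real file.
    sample = gzip.compress(b'"title","user_info.client_user_id"\n"test : csv take 4",\n')
    for row in check_gzip_csv(sample):
        print(row)
```

    For a real file, the bytes could be read with open("test-sample.gz", "rb").read() and passed to the helper. A ValueError suggests the file should be regenerated before Athena queries it.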