amazon-web-services amazon-athena

Is it possible to unload data in AWS Athena to a single file?


The doc states that

UNLOAD results are written to multiple files in parallel.

I guess this is more efficient for both read and write, so unloading to a single file doesn't make sense. But, if for some reason the end user wants the output as a single file, is it possible?


Solution

  • Running a SELECT query in Athena produces a single result file in Amazon S3 in uncompressed CSV format; this is the default behaviour.

    If your query is expected to output a large result set, significant time is spent writing the results as one single file to Amazon S3. With UNLOAD, you can split the results into multiple files in Amazon S3, which reduces the time spent in the writing phase and so improves performance. You can also choose an output format such as Parquet and apply compression.

    What you are trying to do is not what UNLOAD is meant for. One solution would be to write a post-processor that merges the files after the write has finished, for example a Lambda function triggered by the S3 write.
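
A minimal sketch of such a post-processor, in Python. It assumes the UNLOAD output is uncompressed, newline-terminated text (CSV or JSON lines) with no per-file header, so the parts can be joined by plain byte concatenation; this would not work for Parquet or ORC parts. The function name `merge_parts` and its parameters are hypothetical. The `s3` argument is expected to behave like a boto3 S3 client (for example `boto3.client("s3")`), and the whole result is held in memory, so it only suits modest result sizes.

```python
import io


def merge_parts(s3, bucket, prefix, dest_key):
    """Concatenate every object under `prefix` into one object at `dest_key`.

    `s3` is assumed to be a boto3 S3 client (or anything with the same
    get_paginator / get_object / put_object methods).
    Returns the number of part files that were merged.
    """
    # Collect all part keys first; list_objects_v2 returns at most 1000 keys
    # per call, so a paginator is used to walk every page.
    keys = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            # Skip the destination itself in case it lives under the prefix.
            if obj["Key"] != dest_key:
                keys.append(obj["Key"])

    # Join the parts in a stable (sorted-by-key) order.
    merged = io.BytesIO()
    for key in sorted(keys):
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        merged.write(body)

    # Write the single merged object.
    s3.put_object(Bucket=bucket, Key=dest_key, Body=merged.getvalue())
    return len(keys)
```

In a Lambda function, something like this could be called from the handler, but note that an S3 trigger fires once per written object, so the handler would still need its own way of deciding that the UNLOAD has finished before merging. If a single file is the only requirement, running the query as a plain SELECT, which already writes one CSV result file, avoids the merge step entirely.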