Search code examples
hivecompressionfile-formatamazon-athenapresto

How to create an uncompressed text file with Athena


I have some data in Athena and I want to join and create a new table. I can do that with as CTAS operation, and AWS provide a few examples.

I'm create output with the TEXTFILE format, and I don't want any compression. In this link

This example also specifies compression as SNAPPY. If omitted, GZIP is used.

Here's an example of my code:

CREATE TABLE "main_extended"
WITH (
    external_location = 's3://attr-dev/data-joiner/test-1/main_extended/',
    format = 'TEXTFILE',
    field_delimiter = ',')
AS
SELECT
    t1.*,
    t2.month
FROM main t1 left outer join period_month t2
ON t1.period = t2.period

How do I specify the compression option, and what value should that option take to 'turn off' compression for TEXTFILE format?


Solution

  • As of today this isn't supported with CTAS syntax - have tried multiple approaches, thoroughly reviewed documentation, and also enquired with AWS support.

    The recommended approach for creating tables that are backed by simple text files is to invoke a simple SELECT statement from Athena and specifying an output location, and then invoking a CREATE EXTERNAL TABLE type query which references that output location. The result is the same but the mechanism is a little different.