amazon-s3, amazon-emr, presto

What is the option for Presto to map multiple rows to a single file in S3?


I am using the EMR service with Presto enabled. I created a schema and, under it, a table with the external_location option pointing to an S3 bucket.

When I insert data into the table through presto-cli, every insert generates a new file in S3. Is there any option to store multiple rows in a single file in S3?


Solution

  • A Presto INSERT always creates new file(s), regardless of the underlying storage. Moreover, S3 storage is write-once: there is no append. To get one file, you need to write all the rows with one INSERT or CREATE TABLE .. AS query.

    In a single INSERT query you can insert multiple rows:

    INSERT INTO t (a,b,c) VALUES ('a', 'b', 'c'), ('a2', 'b2', 'c2'), ...;
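
If the rows already live in another table, the same idea can be expressed with INSERT ... SELECT or CREATE TABLE .. AS. The following is a minimal sketch, not a definitive recipe: the table names events, events_staging, and events_compacted are hypothetical, and the columns a, b, c are borrowed from the example above.

```sql
-- Hypothetical names throughout: events is the target table,
-- events_staging holds the rows to be written, events_compacted is new.

-- One INSERT ... SELECT writes all selected rows in a single query,
-- instead of one query (and at least one new file) per row.
INSERT INTO events (a, b, c)
SELECT a, b, c
FROM events_staging;

-- CREATE TABLE .. AS does the same while creating the table.
CREATE TABLE events_compacted AS
SELECT a, b, c
FROM events;
```

Note that a single query can still produce more than one file when several writers take part in it, so it is worth checking the S3 prefix after the query finishes.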