Can I use Athena / Presto to sort a table before writing?

I want to archive my logs into the Parquet format. Before writing the table, I want to sort it by a column c so that each Parquet file will only have a small range of c. That will allow Athena / Presto to efficiently scan the table when a query includes a WHERE clause on column c (via predicate pushdown).

However, it's unclear to me whether I can use Athena or Presto to sort the entire table. I need a distributed sort - not one that takes place on a single node - because the dataset is too big to fit on a single node. Is such a sort possible? If so, how to I invoke it?

Solution

Presto supports distributed sort since 0.206. Athena is currently based on Presto 0.172 and I don't know if they backported this feature.

So your choices are

grab latest Presto @ https://trino.io/download.html
get easy to deploy Presto on AWS from Starburst (https://www.starburstdata.com/presto-aws-cloud/) (disclaimer: I am from Starburst)
use Presto bundled on EMR (I don't know how it comes configured, but probably Distributed Sort is still enabled by default)