Search code examples
sqlhiveprestoamazon-athenatrino

Subquery in partitioned Athena tables


I am using partitions in Athena. I have a partition called snapshot, and when I call a query as such:

select * from mytable where snapshot = '2020-06-25'

Then, as expected only the specified partition is scanned and my query is fast. However, if I use a subquery which returns a single date, it is slooow:

select * from mytable where snapshot = (select '2020-06-25')

The above actually scans all partitions and not only the specified date, and results in very low performance.

My question is can I use a subquery to specify partitions and increase performance. I need to use a subsquery to add some custom logic which returns a date based on some criteria.


Solution

  • Edit:

    Trino 356 is able to inline such queries, see https://github.com/trinodb/trino/issues/4231#issuecomment-845733371

    Older answer:

    Presto still does not inline trivial subquery like (select '2020-06-25'). This is tracked by https://github.com/trinodb/trino/issues/4231. Thus, you should not expect Athena to inline, as it's based on Presto .172.

    I need to use a subsquery to add some custom logic which returns a date based on some criteria.

    If your query is going to be more sophisticated, not a constant expression, it will not be inlined anyway. If snapshot is a partition key, then you could leverage a recently added feature -- dynamic partition pruning. Read more at https://trino.io/blog/2020/06/14/dynamic-partition-pruning.html. This of course assumes you can choose Presto version.

    If you are constraint to Athena, your only option is to evaluate the subquery outside of the main query (separately), and pass it back to the main query as a constant (e.g. literal).