I manage some data in AWS, and there are some parquet files in an S3 bucket. Every day, new files are added to this bucket, and I would like to get the data in the latest file by using Athena.
I want to know how to specify the latest file path in an Athena query. Is it possible to identify the latest file from the path of each parquet file?
Presto DB (now Trino) is the engine on which Athena is based. Support for querying the file's modification timestamp was recently added there, but it's likely to take a while (probably years) before it's available in Athena.
In the meantime, if your parquet file names include a timestamp that sorts chronologically as text (for example, YYYY-MM-DD), you could do something like:
select * from mydb
where "$path" in
(
select "$path"
from mydb
order by "$path" desc
limit 1
)
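
One caveat: the query above picks the path that sorts last as a string, so it only returns the newest file if the timestamp in the name is zero-padded and ordered from year down to day. Here is a rough Python sketch of that ordering logic, just to illustrate the point. The bucket name, file names, and the `latest_path` helper are all made up for the example, and it assumes plain ASCII paths, where Python's string comparison should match the SQL ordering.

```python
def latest_path(paths):
    """Return the path that sorts last as a string,
    i.e. what `order by "$path" desc limit 1` would pick."""
    return max(paths)


# Hypothetical paths with a zero-padded, year-first date: string order == date order.
sortable = [
    "s3://example-bucket/data/2023-01-09.parquet",
    "s3://example-bucket/data/2023-01-10.parquet",
]

# Hypothetical paths with an unpadded, day-first date: string order != date order.
unsortable = [
    "s3://example-bucket/data/9-1-2023.parquet",
    "s3://example-bucket/data/10-1-2023.parquet",
]

print(latest_path(sortable))    # the 10 January file, which is the newest
print(latest_path(unsortable))  # the 9 January file, which is NOT the newest
```

If your file names look like the second list, the query will silently return the wrong file, so it is worth checking the naming scheme before relying on it.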