Search code examples
pythonamazon-web-servicesamazon-s3aws-glueamazon-athena

Amazon s3: Get last updated date of partition in Athena/AWS Glue Catalog


I am trying to find a way to obtain the last updated date and time for each partition. For example, if the 15-04-23 partition was created on the 15th of March and a new file was added to this partition on the 18th, I would expect the last updated date to be 18th March . Is there any way to get this information.


Solution

  • This query will give you the file modified time of the newest object in each partition:

    select "$partition", max("$file_modified_time") as max_file_modified_time
    from my_table
    group by 1
    order by 1
    

    This query scans zero bytes, but depending on how many partitions there are in your table it could make a lot of S3 operations so it's not free.

    The query leverages the $partition metadata column which will be a string representation of the combined partition keys. You can also use the names of your partition keys directly. If your table has partition keys date, category the query could look like this:

    select date, category, max("$file_modified_time") as max_file_modified_time
    from my_table
    group by 1, 2
    order by 1, 2