I am trying to find a way to obtain the last updated date and time for each partition. For example, if the 15-04-23 partition was created on the 15th of March and a new file was added to this partition on the 18th, I would expect the last updated date to be 18th March . Is there any way to get this information.
This query will give you the file modified time of the newest object in each partition:
select "$partition", max("$file_modified_time") as max_file_modified_time
from my_table
group by 1
order by 1
This query scans zero bytes, but depending on how many partitions there are in your table it could make a lot of S3 operations so it's not free.
The query leverages the $partition
metadata column which will be a string representation of the combined partition keys. You can also use the names of your partition keys directly. If your table has partition keys date
, category
the query could look like this:
select date, category, max("$file_modified_time") as max_file_modified_time
from my_table
group by 1, 2
order by 1, 2