Is it possible to keep track of S3 file upload time with Databricks Auto Loader? It looks like Auto Loader adds columns for the file name and processing time, but in our use case we need to know the order in which the files were uploaded to S3.
When you load the data, you can query the _metadata column (or a specific attribute inside it). It includes a file_modification_time field holding the file's last-modification time; since S3 objects are immutable, this matches the upload time.
Just do:
df.select("*", "_metadata.file_modification_time")
to get access to that field. See the docs for details.
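A minimal sketch of how this might look in an Auto Loader stream; the bucket path, file format, and column aliases here are placeholder assumptions, not from the original post:

```python
from pyspark.sql.functions import col

# Read files incrementally with Auto Loader and surface the upload
# time from the _metadata column alongside the data columns.
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")  # assumed file format
    .load("s3://my-bucket/landing/")      # hypothetical S3 path
    .select(
        "*",
        col("_metadata.file_name").alias("source_file"),
        col("_metadata.file_modification_time").alias("uploaded_at"),
    )
)
```

Once the stream lands in a table, you can order rows by `uploaded_at` in downstream batch queries to process records in S3 upload order.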