
Can Databricks Autoloader Keep Track of File Uploading Time


Is it possible to keep track of S3 file upload time with Databricks Auto Loader? It looks like Auto Loader can add columns for the file name and processing time, but in our use case we need to know the order in which the files were uploaded to S3.


Solution

  • When you load the data, you can query the _metadata column (or a specific field inside it). It includes a file_modification_time field that holds the time of the file's last modification, which should match the upload time.

    Just do:

    df.select("*", "_metadata.file_modification_time")
    

    to get access to that field. See the documentation for details.
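
For the Auto Loader case specifically, the same selection can be applied to the streaming read. The following is only a minimal sketch: the function name `read_with_upload_time`, the `source_file` and `uploaded_at` aliases, and the default format are assumptions made for illustration, not part of the original answer. It expects an existing `spark` session, as in a Databricks notebook.

```python
def read_with_upload_time(spark, source_path, file_format="json", schema_location=None):
    """Read files with Auto Loader and tag each row with its source file's
    path and last-modification time (hypothetical helper for illustration)."""
    reader = (
        spark.readStream
        .format("cloudFiles")                       # Auto Loader source
        .option("cloudFiles.format", file_format)   # format of the incoming files
    )
    if schema_location is not None:
        # Needed when Auto Loader has to infer and track the schema itself.
        reader = reader.option("cloudFiles.schemaLocation", schema_location)

    # Keep all data columns and expose two fields of the hidden _metadata column.
    return reader.load(source_path).selectExpr(
        "*",
        "_metadata.file_path AS source_file",
        "_metadata.file_modification_time AS uploaded_at",
    )
```

It could be called as `read_with_upload_time(spark, "s3://my-bucket/landing/")`, where the bucket path is a placeholder. Since a streaming DataFrame cannot simply be sorted, the upload order would typically be recovered by ordering on `uploaded_at` when querying the table the stream writes to.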