I'm trying to access the _metadata column to get the file modification time, following these instructions: https://docs.databricks.com/en/ingestion/file-metadata-column.html
Here is my code:
df = spark.read \
    .format('com.databricks.spark.xml') \
    .options(rowTag='TAG2', nullValue='') \
    .load(xmlFile) \
    .select("*", "_metadata")
This works when I load a CSV file, but not with an XML file: I get an error stating that there is no such column.
I am sure that the code that loads the XML contents works correctly on its own.
Is this feature simply not supported for XML files, or am I doing something wrong?
I ended up using a slightly different approach, since I decided to go with Auto Loader to ingest the files. I followed this example, reading the files as binary and then converting them to XML: https://docs.databricks.com/en/_extras/notebooks/source/kb/streaming/streaming-xml-example.html
With that approach I could access _metadata without any issues.
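A minimal PySpark sketch of the binary-read step described above, in case it helps someone else. The function name and the `source_dir` parameter are made up for illustration, and the XML parsing step from the linked notebook is left out. It assumes a Databricks runtime where the `cloudFiles` source is available:

```python
def read_xml_files_with_metadata(spark, source_dir):
    """Read raw files with Auto Loader as binary and keep the file metadata.

    `spark` is an existing SparkSession; `source_dir` is a hypothetical
    directory containing the XML files.
    """
    return (
        spark.readStream
        .format("cloudFiles")                       # Auto Loader source
        .option("cloudFiles.format", "binaryFile")  # one row per file, raw bytes in `content`
        .load(source_dir)
        # `content` still has to be parsed as XML in a later step (not shown here).
        .select("content", "_metadata.file_path", "_metadata.file_modification_time")
    )
```

Calling `read_xml_files_with_metadata(spark, "/path/to/xml")` returns a streaming DataFrame. The modification time comes from `_metadata`, so it does not depend on the XML reader exposing that column.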