I am using azure blob storage to store data and feeding this data to Autoloader using mount. I was looking for a way to allow Autoloader to load a new file from any mount. Let's say I have these folders in my mount:
mnt/
├─ blob_container_1
├─ blob_container_2
When I use .load('/mnt/') no new files are detected. But when I consider folders individually then it works fine like .load('/mnt/blob_container_1')
I want to load files from both mount paths using Autoloader (running continuously).
You can use the path for providing prefix patterns, for example:
df = spark.readStream.format("cloudFiles") \
.option("cloudFiles.format", <format>) \
.schema(schema) \
.load("<base_path>/*/files")
For example, if you would like to parse only png files within a directory that contains files with different suffixes, you can do:
df = spark.readStream.format("cloudFiles") \
.option("cloudFiles.format", "binaryFile") \
.option("pathGlobfilter", "*.png") \
.load(<base_path>)