Tags: pyspark, databricks, azure-databricks

Databricks PySpark readStream to read data from Azure Blob Storage written in a partitioned structure


I have a process that fetches data from an RDBMS source and writes it to Azure Blob Storage. The data is written in a partitioned structure like:

Storage account
   |_ container
      |_ data-load
         |_ updateddate_p=yyyymmdd
            |_ timestamp_p=16382882
               |_ data-file.orc
            |_ timestamp_p=16382885
               |_ data-file-1.orc

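For context, a layout like this is what a partitioned Spark batch write produces. A minimal sketch, assuming the load process also uses Spark (source_df and the output path are placeholders, not part of the original pipeline):

    # hypothetical batch writer that produces the layout above;
    # source_df and the output path are placeholders
    (
        source_df
        .write
        .mode("append")
        .partitionBy("updateddate_p", "timestamp_p")  # creates updateddate_p=.../timestamp_p=... folders
        .orc("wasbs://container@storageaccount.blob.core.windows.net/data-load/")
    )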
Now in Databricks, I mount the Azure Blob Storage account (classic) to the cluster and use readStream to read the ORC files as a stream.
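A minimal sketch of that mount step, assuming a classic Blob Storage account accessed with an account key (the account, container, scope, and key names are placeholders):

    # hypothetical mount of the classic Blob Storage container; all names are placeholders
    dbutils.fs.mount(
        source="wasbs://container@storageaccount.blob.core.windows.net/data-load",
        mount_point="/mnt/data-load",
        extra_configs={
            "fs.azure.account.key.storageaccount.blob.core.windows.net":
            dbutils.secrets.get(scope="my-scope", key="storage-account-key")
        }
    )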

Is there a way to read the ORC data together with the partition information, so that the streaming DataFrame carries the partition columns when reading?

That way, when using writeStream to write the data out, I could pick up updateddate_p and timestamp_p (with their values) from the readStream DataFrame.


Solution

    • Added the partition columns as part of the schema (a complete schema-file sketch appears after this list):
    {
        "metadata": {},
        "name": "updateddate_p",
        "nullable": true,
        "type": "integer"
    },
    {
        "metadata": {},
        "name": "timestamp_p",
        "nullable": true,
        "type": "long"
    },
    
    
    • Used readStream as follows, and the partition columns were loaded as expected:
    import json

    from pyspark.sql import types as T

    # load the schema definition, which now includes the partition columns
    with open('custom-schema.json', 'r') as f:
        schema_definition = T.StructType.fromJson(json.load(f))

    data_readstream = (
        spark.readStream
        .format("orc")
        .schema(schema_definition)                 # includes updateddate_p and timestamp_p
        .option("pathGlobFilter", "*.orc")         # only pick up the ORC data files
        .load(f'{mounted_input_data_path_loc}/')   # mount root above the partition dirs
    )

    # the write stream then used partitionBy(column1, column2, ...); see the sketch below
    
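For reference, a complete custom-schema.json in the shape StructType.fromJson expects could look like the following; the id and payload columns are made-up examples, and only the two partition columns come from the question:

    {
        "type": "struct",
        "fields": [
            {"metadata": {}, "name": "id", "nullable": true, "type": "long"},
            {"metadata": {}, "name": "payload", "nullable": true, "type": "string"},
            {"metadata": {}, "name": "updateddate_p", "nullable": true, "type": "integer"},
            {"metadata": {}, "name": "timestamp_p", "nullable": true, "type": "long"}
        ]
    }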

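And a sketch of the writeStream side that keeps the same partitioning; the output and checkpoint paths are placeholders:

    # hypothetical write stream; the output and checkpoint paths are placeholders
    query = (
        data_readstream
        .writeStream
        .format("orc")
        .option("checkpointLocation", "/mnt/checkpoints/data-load")
        .partitionBy("updateddate_p", "timestamp_p")   # same partition columns as the input
        .outputMode("append")
        .start("/mnt/output/data-load")
    )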
    PySpark Reference Documentation