I have a Structured Streaming app set up that monitors a folder in blob storage for new files and processes them. It works well, and I can monitor cluster health, see the incoming records, output records, etc. But I really want to know whether there is any log that says which file got processed, or that x records from a given file were processed.
Any pointers would be helpful.
The names of the files that were processed are saved in the stream's configured checkpoint (under its sources subdirectory), i.e. the location set with .option("checkpointLocation", "dbfs://checkpointPath").
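As a rough sketch of how those names could be pulled out of the checkpoint: the snippet below assumes the file source's log under the checkpoint is plain text with a version line followed by one JSON object per processed file, each carrying a "path" field. That layout is internal to Spark and may differ between versions, so treat it as an assumption to verify against your own checkpoint. The function name and the sample paths are hypothetical.

```python
import json


def parse_file_source_log(text):
    """Return the file paths recorded in one file-source log file.

    Assumes (not guaranteed by Spark) a version header line followed by
    one JSON object per line, each with a "path" key.
    """
    paths = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip the version header and any blank lines
        entry = json.loads(line)
        if "path" in entry:
            paths.append(entry["path"])
    return paths


# Hypothetical log content, made up for illustration only.
sample_log = "\n".join([
    "v1",
    '{"path":"dbfs:/input/a.json","timestamp":1700000000000,"batchId":0}',
    '{"path":"dbfs:/input/b.json","timestamp":1700000001000,"batchId":1}',
])

processed = parse_file_source_log(sample_log)
print(processed)
```

In practice you would read the text of each log file under the checkpoint's sources directory and pass it to the function instead of the sample string.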
To monitor how many input rows were actually processed by the stream, look into StreamingQueryListener: its onQueryProgress callback receives a progress report for each micro-batch, including numInputRows.
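To make that concrete, here is a minimal sketch of picking the row counts out of one progress report. It assumes the report is available as a dict with the usual "batchId", "numInputRows" and "sources" fields; the helper name and the sample values are hypothetical, not output from a real run.

```python
import json


def summarize_progress(progress):
    """Extract the overall and per-source input row counts from one report."""
    return {
        "batch_id": progress.get("batchId"),
        "num_input_rows": progress.get("numInputRows", 0),
        "per_source": [
            (source.get("description"), source.get("numInputRows", 0))
            for source in progress.get("sources", [])
        ],
    }


# Hypothetical progress report, trimmed to the fields used above.
sample_progress = json.loads("""
{
  "batchId": 7,
  "numInputRows": 120,
  "sources": [
    {"description": "FileStreamSource[dbfs:/input]", "numInputRows": 120}
  ]
}
""")

summary = summarize_progress(sample_progress)
print(summary)
```

In PySpark the same fields should be reachable through query.lastProgress, or inside a listener by parsing the progress JSON in onQueryProgress (Python listeners are available from Spark 3.4, as far as I know). Note that these counts are per micro-batch and per source, not per file.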