Azure ML SDK DataReference - File Pattern - MANY files

I’m building out a pipeline that should execute and train fairly frequently. I’m following this: https://learn.microsoft.com/en-us/azure/machine-learning/service/how-to-create-your-first-pipeline

I’ve got a Stream Analytics job dumping telemetry into .json files on blob storage (soon to be ADLS Gen2). I want to find all the .json files and train on all of them. Using only the new .json files would also be possible (an interesting option, honestly).

Currently I just have the store mounted to a data lake and available, and the code iterates over the mount to find the data files and load them.

How can I use data references for this instead?
What do data references do for me that mounting timestamped data does not?
   a. From an audit perspective, I have version control, execution time, and timestamped read-only data. Doing a replay on this would require additional coding, but it is doable.

Solution

You could pass a pointer to the folder as an input parameter for the pipeline, and then your step can mount the folder and iterate over the .json files in it.
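As a rough sketch of the step side of this, the script below takes the mounted folder as a command-line argument and walks it for .json files. The argument name `--input_dir` and the helper names `find_json_files` and `load_records` are made up for illustration, and it assumes the Stream Analytics output is line-delimited JSON (one object per line); adjust the parsing if your job writes JSON arrays instead. On the pipeline side, the folder would typically be a data reference that is listed in the step's inputs and also passed in the step's arguments, so the script receives the mount path.

```python
import argparse
import json
from pathlib import Path


def find_json_files(root):
    """Return every .json file under root (searched recursively), sorted by path."""
    return sorted(p for p in Path(root).rglob("*.json") if p.is_file())


def load_records(paths):
    """Yield one parsed object per non-empty line.

    Assumption: each file is line-delimited JSON, not a single JSON array.
    """
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:
                    yield json.loads(line)


def main(argv=None):
    parser = argparse.ArgumentParser()
    # Hypothetical argument name; the pipeline step passes the mounted folder here.
    parser.add_argument("--input_dir", required=True)
    args = parser.parse_args(argv)

    files = find_json_files(args.input_dir)
    records = list(load_records(files))
    print(f"Loaded {len(records)} records from {len(files)} files")
    return records
```

In the actual step script you would call `main()` at the bottom of the file and hand the returned records to your training code. Because the search is recursive, date-partitioned subfolders under the mount are picked up without extra configuration.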