azure-data-factory · azure-blob-storage · powerquery

How do I parameterize a dataset so I can pass multiple Excel files in a blob folder to Power Query one by one?


I am working in Azure Data Factory and need help parameterizing a dataset so I can take multiple files from a blob folder, process each one in Power Query, and then write the result to an Azure SQL table as the sink. I do not want to create 100 datasets just to process 100 Excel files the same way with Power Query.

I have successfully executed a pipeline as follows:

A Get Metadata activity reads all files (.xlsx) in a blob folder, a ForEach loop iterates over each file, and inside the loop a Copy activity loads it into an Azure SQL database.
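For reference, here is a minimal sketch of that working pipeline in ADF pipeline JSON; the activity, dataset, and sink names (GetFileList, ExcelBlobFolder, ExcelFileDataset, AzureSqlTable) are placeholders, and details such as store settings are trimmed:

    {
      "name": "ProcessExcelFiles",
      "properties": {
        "activities": [
          {
            "name": "GetFileList",
            "type": "GetMetadata",
            "typeProperties": {
              "dataset": { "referenceName": "ExcelBlobFolder", "type": "DatasetReference" },
              "fieldList": [ "childItems" ]
            }
          },
          {
            "name": "ForEachFile",
            "type": "ForEach",
            "dependsOn": [
              { "activity": "GetFileList", "dependencyConditions": [ "Succeeded" ] }
            ],
            "typeProperties": {
              "items": {
                "value": "@activity('GetFileList').output.childItems",
                "type": "Expression"
              },
              "activities": [
                {
                  "name": "CopyToAzureSql",
                  "type": "Copy",
                  "inputs": [
                    {
                      "referenceName": "ExcelFileDataset",
                      "type": "DatasetReference",
                      "parameters": {
                        "fileName": { "value": "@item().name", "type": "Expression" }
                      }
                    }
                  ],
                  "outputs": [
                    { "referenceName": "AzureSqlTable", "type": "DatasetReference" }
                  ],
                  "typeProperties": {
                    "source": { "type": "ExcelSource" },
                    "sink": { "type": "AzureSqlSink" }
                  }
                }
              ]
            }
          }
        ]
      }
    }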

Now I want to see if I can do the following or something similar:

Get a list of files in a blob folder (or their full blob addresses), then use a loop to pass one file/file address per iteration and process it through Power Query.

Is this even possible right now? I am really stuck on this. Essentially, I want to process data in Power Query before loading it to a sink.

Alternative ideas encouraged!


Solution

  • You can also pass the filename dynamically to the Power Query dataset. Below are the detailed steps.

    • Add a Get Metadata activity and get the list of file names from it.

    • Then add a ForEach activity and pass the Get Metadata activity's output array to it.

    • Inside the ForEach activity, add the Power Query activity. Open the Power Query source dataset and set the dynamic content expression for the dataset filename to @item().name (a sketch of such a parameterized dataset follows these steps).

    • Set the sink in the sink settings of the Power Query activity.

    This way, you can achieve the requirement of using the same Power Query pipeline for all files.
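
    As a rough sketch of the dataset side, a parameterized Excel dataset along these lines is what @item().name feeds into; the dataset name, linked service name, container, and sheet name below are placeholder assumptions, not values from the post:

        {
          "name": "ExcelFileDataset",
          "properties": {
            "linkedServiceName": {
              "referenceName": "AzureBlobStorageLS",
              "type": "LinkedServiceReference"
            },
            "parameters": {
              "fileName": { "type": "string" }
            },
            "type": "Excel",
            "typeProperties": {
              "location": {
                "type": "AzureBlobStorageLocation",
                "fileName": {
                  "value": "@dataset().fileName",
                  "type": "Expression"
                },
                "container": "input"
              },
              "sheetName": "Sheet1",
              "firstRowAsHeader": true
            },
            "schema": []
          }
        }

    Inside the ForEach activity, the Power Query source dataset receives @item().name as the value of its fileName parameter, so a single dataset definition serves all of the Excel files.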