azure-cosmosdb azure-storage azure-data-factory

Azure Data Factory - Can we order the Copy Data source files before ingestion?


I have a case where I need to ingest CSV files into Cosmos DB. I have one dataset that reads the CSV files and another that describes the Cosmos DB schema.

In the pipeline, I have a Copy Data activity that maps the CSV columns and writes them to Cosmos DB. In the activity's Source settings, I point to the Azure Blob Storage location where the CSV files are stored.

This has worked without problems so far. However, I now need to make sure the blobs are ingested in alphabetical order of their file names, as if they were processed from a sorted array.

Is there a way to do this?


Solution

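  • ADF has no built-in way to sort the source files by file name, so the ordering has to be done in a separate step.

    One way to achieve this:

    Save all the file names in a CSV file, then use a Sort transformation in a Data Flow to sort them and overwrite that file. Finally, use a Lookup activity to read the sorted names and a For Each activity to copy the blobs to Cosmos DB one by one.

    Another way:

    Pass the childItems array from a Get Metadata activity's output to an Azure Function and sort the file names there. Then loop over the Function's output with a For Each activity and copy each blob to Cosmos DB.

    In both cases, set the For Each activity to run sequentially. By default its iterations run in parallel, which would not preserve the sorted order.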
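The sorting step of the Azure Function approach could look roughly like the sketch below. It only shows the logic, not the Function's HTTP trigger binding. It assumes the request body is a JSON object with a childItems array in the shape Get Metadata returns (objects with name and type), and that the response is wrapped in an object with a fileNames property. The names sort_file_names, handle_request and fileNames are made up for this illustration and are not part of ADF or the Functions SDK.

```python
import json


def sort_file_names(child_items):
    """Return the names of the File entries, sorted alphabetically.

    Sorting is case-insensitive; the raw name is used as a tie-breaker
    so the result is deterministic. Folder entries are skipped.
    """
    names = [item["name"] for item in child_items if item.get("type") == "File"]
    return sorted(names, key=lambda name: (name.casefold(), name))


def handle_request(body: str) -> str:
    """Hypothetical handler body: JSON string in, JSON string out.

    Expects {"childItems": [...]} and returns {"fileNames": [...]}.
    The result is wrapped in an object rather than returned as a bare array.
    """
    payload = json.loads(body)
    sorted_names = sort_file_names(payload["childItems"])
    return json.dumps({"fileNames": sorted_names})


if __name__ == "__main__":
    sample = {
        "childItems": [
            {"name": "b.csv", "type": "File"},
            {"name": "A.csv", "type": "File"},
            {"name": "archive", "type": "Folder"},
        ]
    }
    print(handle_request(json.dumps(sample)))
```

In the pipeline, the For Each activity's items would then be an expression along the lines of @activity('SortFiles').output.fileNames, where SortFiles is a placeholder for whatever the Azure Function activity is called.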