I am able to copy data between two 'Azure Cosmos DB for MongoDB' datasets using an Azure Data Factory pipeline.
Now, I would like to transform the data using an Azure Function before it is pushed to the destination dataset. Is this possible in Azure Data Factory? If so, is there any documentation that I can refer to?
I thought this would be a supported use case. Since each record in the source dataset is in JSON format, it could be forwarded as-is in the request body to the HTTP-triggered function, and the JSON response of the function would be available for use later in the pipeline.
I tried setting the body of the HTTP-triggered function to @activity('Copy Activity').output, but the 'Copy Activity' output contains metadata about the activity execution, not the dataset records that I am looking for. I also tried data flows but could not get started, because 'Azure Cosmos DB for MongoDB' is not a supported data store. The Azure Function activity documentation page does not have much information that helps with my scenario.
In general, to pass a dataset from ADF to an Azure Function, you can use a Lookup activity. Lookup returns an array of objects as its output.
But in this case, the source dataset (Azure Cosmos DB for MongoDB) doesn't support the Lookup activity. As per the documentation, it only supports the copy activity. You can raise a feature request to add Lookup support for it.
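As a workaround, you can try the approach below if you have access to an ADLS or Blob storage account. The idea is to copy the Azure Cosmos DB for MongoDB data to a JSON file in that storage account with a copy activity, read the file with a Lookup activity, and then use
@activity('lookup1').output.value
or @activity('lookup1').output.value[0]
in the Azure Function activity body to get the required JSON.
Also, you can get the JSON of the Azure Cosmos DB for MongoDB dataset from a Web activity, using this REST API: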
GET https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.DocumentDB/databaseAccounts/{accountName}/mongodbDatabases/{databaseName}/collections/{collectionName}?api-version=2021-07-01-preview
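For the Lookup-based workaround, the function side could look roughly like the sketch below. It is only an illustration: `transform_record` and `handle_body` are hypothetical names, the `processed` flag is a placeholder for your real transformation, and the Azure Functions HTTP wrapper is left out so the logic can run on its own. It accepts either the full array (`output.value`) or a single object (`output.value[0]`).

```python
import json


def transform_record(record: dict) -> dict:
    """Placeholder transformation (assumption): copy the record and tag it."""
    out = dict(record)
    out["processed"] = True
    return out


def handle_body(body: str) -> str:
    """Take the raw request body sent by ADF and return the JSON response text."""
    payload = json.loads(body)
    # output.value is a list of records; output.value[0] is a single record.
    records = payload if isinstance(payload, list) else [payload]
    transformed = [transform_record(r) for r in records]
    # Wrap the list in an object instead of returning a bare array.
    return json.dumps({"records": transformed})


if __name__ == "__main__":
    print(handle_body('[{"name": "a"}, {"name": "b"}]'))
```

The result is wrapped in an object because the ADF Azure Function activity expects the function to return a JSON object rather than a bare array. Inside an HTTP-triggered function you would pass the request body to `handle_body` and return its result as the response.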
NOTE: These methods only work when your source data is smaller than 4 MB. The output of both the Lookup activity and the Web activity is limited to 4 MB (and 5000 rows for Lookup). If your data is larger than that, it's better to connect to the source in the Azure Function code itself rather than passing it from the ADF dataset.
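If you want a rough way to tell which side of that limit your data falls on, a check like the following can help. This is a sketch under assumptions: `fits_adf_limits` is a hypothetical helper, and the size is measured on a plain JSON serialization, which may not match exactly how ADF counts the activity output.

```python
import json

MAX_BYTES = 4 * 1024 * 1024  # 4 MB output limit mentioned above
MAX_ROWS = 5000              # row limit for the Lookup activity


def fits_adf_limits(records: list) -> bool:
    """Approximate check: row count and serialized size are both within limits."""
    size = len(json.dumps(records).encode("utf-8"))
    return len(records) <= MAX_ROWS and size <= MAX_BYTES


if __name__ == "__main__":
    sample = [{"id": i, "name": f"item-{i}"} for i in range(100)]
    print(fits_adf_limits(sample))
```

When the check fails, reading from the source inside the function code is the safer route.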