In Azure Data Factory, can I pass data from dataset as input to the Azure Function?


I am able to copy data between two 'Azure Cosmos DB for MongoDB' datasets using Azure Data Factory pipeline.

Now, I would like to transform the data using an Azure Function before it is pushed to the destination dataset. Is this possible in Azure Data Factory? If so, is there documentation that I can refer to?

I expected this to be a supported use case. Since each record in the source dataset is JSON, it could be forwarded as-is in the request body to the HTTP-triggered function, and the function's JSON response would then be available for use later in the pipeline.

I tried setting the body of the HTTP-triggered function to @activity('Copy Activity').output, but the 'Copy Activity' output contains metadata about the activity execution, not the dataset records that I am looking for. I also tried data flows but could not get started, because 'Azure Cosmos DB for MongoDB' is not a supported data store. The Azure Function activity documentation page has little information that helps with my scenario.


Solution

  • In general, to pass a dataset's contents from ADF to an Azure Function, you can use a Lookup activity, which returns an array of objects as its output.

    In this case, however, the source dataset (Azure Cosmos DB for MongoDB) does not support the Lookup activity. According to the documentation, it supports only the Copy activity. You can raise a feature request to have Lookup support added.

    As a workaround, you can try the approach below if you have access to an ADLS or Blob storage account.

    • First, copy the Azure Cosmos DB for MongoDB data to a JSON file in Blob storage or ADLS using a Copy activity.
    • Next, run a Lookup activity on that JSON file. Depending on the shape of the Lookup output, use the dynamic expression @activity('lookup1').output.value (the full array, available when 'First row only' is disabled) or @activity('lookup1').output.value[0] (the first record) to get the required JSON.
    • Finally, pass this JSON to your Azure Function as your scenario requires.
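The receiving side of the steps above could be sketched as follows. This is a minimal sketch of the body-handling logic only, not a complete Azure Function: `handle_request_body`, `transform_record`, the `name` field, and the `records` wrapper key are all hypothetical names chosen for illustration. It assumes the Azure Function activity body is set to @activity('lookup1').output.value (an array) or @activity('lookup1').output.value[0] (a single object). The result is wrapped in a JSON object because, as far as I know, the Azure Function activity expects the response to be a JSON object rather than a bare array.

```python
from __future__ import annotations

import json
from typing import Any


def transform_record(record: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical per-record transformation: upper-case a 'name' field if present."""
    out = dict(record)  # copy so the input record is not mutated
    if isinstance(out.get("name"), str):
        out["name"] = out["name"].upper()
    return out


def handle_request_body(raw_body: bytes) -> tuple[int, str]:
    """Parse the body sent by ADF and return (HTTP status, JSON response text)."""
    try:
        payload = json.loads(raw_body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return 400, json.dumps({"error": "body is not valid JSON"})

    # output.value is an array of records; output.value[0] is a single record.
    records = payload if isinstance(payload, list) else [payload]
    if not all(isinstance(r, dict) for r in records):
        return 400, json.dumps({"error": "expected a JSON object or an array of objects"})

    # Wrap the transformed records in an object instead of returning a bare array.
    return 200, json.dumps({"records": [transform_record(r) for r in records]})
```

In an HTTP-triggered function, the request's raw body would be passed to `handle_request_body`, and the returned status and text would be used to build the HTTP response.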

    Alternatively, you can get the JSON of the Azure Cosmos DB for MongoDB dataset using this REST API:

    GET https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.DocumentDB/databaseAccounts/{accountName}/mongodbDatabases/{databaseName}/collections/{collectionName}?api-version=2021-07-01-preview
    
    • First, get a bearer token using a Web activity in ADF with a service principal.
    • Then, use that bearer token with the above REST API in a second Web activity to get the MongoDB collection under the Azure Cosmos DB database.
    • The Web activity returns a JSON response.
    • Identify the JSON you need in that response and pass it to your Azure Function.
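To try the same two HTTP calls outside ADF before wiring up the Web activities, something like the following sketch may help. It only builds the requests; `send_json` performs the network call and is not exercised here. The function names and parameters are hypothetical, and the sketch assumes the service principal authenticates with a client secret against the Azure AD v1 token endpoint using the client-credentials grant. Inspect the response to confirm it contains the fields you need.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

MANAGEMENT_RESOURCE = "https://management.azure.com/"


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build the POST that asks Azure AD for a bearer token (client-credentials grant)."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "resource": MANAGEMENT_RESOURCE,
    }).encode("ascii")
    return urllib.request.Request(
        url,
        data=form,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def build_collection_request(
    token: str,
    subscription_id: str,
    resource_group: str,
    account: str,
    database: str,
    collection: str,
    api_version: str = "2021-07-01-preview",  # version quoted in the answer above
) -> urllib.request.Request:
    """Build the GET for the management REST API shown above."""
    url = (
        "https://management.azure.com"
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.DocumentDB/databaseAccounts/{account}"
        f"/mongodbDatabases/{database}/collections/{collection}"
        f"?api-version={api_version}"
    )
    return urllib.request.Request(url, method="GET", headers={"Authorization": f"Bearer {token}"})


def send_json(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response (performs a real network call)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

The token response is expected to carry the token in its `access_token` field, which is then passed to `build_collection_request`. In ADF, the second Web activity would set the same `Authorization: Bearer ...` header from the first activity's output.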

    NOTE: These methods only work when your source data is smaller than 4 MB. The Lookup activity output is limited to 4 MB and 5000 rows, and the Web activity response is limited to 4 MB. If your data is larger than that, it is better to connect to the source from within the Azure Function code itself rather than passing the data from an ADF dataset.
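A rough pre-check for the limits in the note above could look like this. It is only a sketch: `fits_adf_output_limits` is a hypothetical helper, it treats 4 MB as 4 × 1024 × 1024 bytes of UTF-8 encoded JSON, and ADF's own size accounting may differ, so treat the result as an estimate.

```python
import json

MAX_ROWS = 5000               # row limit quoted in the note above
MAX_BYTES = 4 * 1024 * 1024   # assumption: "4 MB" interpreted as 4 MiB


def fits_adf_output_limits(records: list) -> bool:
    """Estimate whether a list of records stays within the quoted row and size limits."""
    size_in_bytes = len(json.dumps(records).encode("utf-8"))
    return len(records) <= MAX_ROWS and size_in_bytes <= MAX_BYTES
```

If the check fails for a representative export of your collection, reading from the source inside the Azure Function is likely the safer route.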