
How to use Custom Activity (batch) in Azure Data Flow in ADF?


I have a Python script that I want to use in Azure Data Factory.

I have created an Azure Data Flow that reads Parquet files from the Storage Account.

I also have a Python script that reads those Parquet files from the storage account and transforms the data.
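
For context, the script is essentially of this shape (a minimal sketch with hypothetical file and column names, not the actual script):

```python
import pandas as pd

# Hypothetical file and column names -- placeholders for the real script.
df = pd.read_parquet("data.parquet")

# Example transformation: keep positive amounts and derive a new column.
df = df[df["amount"] > 0]
df["amount_usd"] = df["amount"] * df["fx_rate"]

print(df.head())
```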

Current Pipeline:

This is the Data Flow in ADF: (screenshot of the Data Flow)

And the Custom Activity in the pipeline: (screenshot of the Custom Activity)

Expected Result:

Is there a way to use this Custom (Batch) Activity inside the Data Flow? The Python script already runs as a Custom Activity to read the Parquet file and transform the data; how do I then ingest the result into a sink (since sinks only exist inside a Data Flow)?

Case 1: Read the data through the Custom Activity, then use its output in a Data Flow for further sorting and the sink.

Case 2: Use a Data Flow source to read the file, the Custom Activity for the transformations, and a Data Flow for the sink.

How do I make use of the Python script (pandas) here together with the Data Flow features?


Solution

  • The Custom Activity is a separate activity and cannot be used within datasets or data flows. I would therefore suggest using Azure Blob Storage as a staging layer between the Custom Activity and the Data Flow/Copy activity.

    You can write the output of the Custom Activity to Azure Blob Storage as a file, and then use that file as the source of your Copy activity or Data Flow activity, as shown in the sketch below.
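
For illustration, here is a minimal sketch of that staging step inside the Custom Activity's Python script, assuming hypothetical container names (`input`, `staging`), blob names, and a placeholder transformation. It reads the source Parquet from Blob Storage, applies the pandas logic, and writes the result back as a new file that the Data Flow can pick up as its source:

```python
import io
import pandas as pd
from azure.storage.blob import BlobServiceClient

# Assumed connection string and container/blob names -- replace with your own.
CONN_STR = "<storage-account-connection-string>"
service = BlobServiceClient.from_connection_string(CONN_STR)

# 1. Read the source Parquet file from the storage account.
source_blob = service.get_blob_client(container="input", blob="data.parquet")
df = pd.read_parquet(io.BytesIO(source_blob.download_blob().readall()))

# 2. Apply the pandas transformation (placeholder for the real logic).
df = df.sort_values("some_column")

# 3. Write the transformed data to the staging container as a new file.
buffer = io.BytesIO()
df.to_parquet(buffer, index=False)
buffer.seek(0)
staged_blob = service.get_blob_client(container="staging", blob="transformed.parquet")
staged_blob.upload_blob(buffer, overwrite=True)
```

In the pipeline, the Data Flow (or Copy) activity then points its source dataset at the staging container and is chained to run after the Custom Activity succeeds.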