I have a Python script that I want to use in Azure Data Factory.
I have created an Azure Data Flow that reads Parquet files from the Storage Account.
I also have a Python script that reads the Parquet files from the storage account and transforms the data.
Current Pipeline:
This is the Data Flow in ADF:
And this is the Custom Activity in the pipeline:
Expected Result:
Is there a way to use this Custom (Batch) Activity inside the Data Flow? The Python script currently runs as a Custom Activity to read the Parquet file and transform the data, but how do I ingest that data into the sink (since sinks are only available in the Data Flow)?
Case 1: Read the data through the Custom Activity and then use it in the Data Flow for further sorting and the sink.
Case 2: Use the Data Flow source to read the file, the Custom Activity for the transformations, and the Data Flow for the sink.
How do I make use of the Python script (pandas) here together with the Data Flow features?
A Custom Activity is a separate activity and cannot be used within datasets or data flows. I would therefore suggest using Azure Blob Storage as a staging layer between the Custom Activity and the Data Flow / Copy activity.
You can write the output of the Custom Activity to Azure Blob Storage as a file, and then use that file as the source of your Copy activity or Data Flow.
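As a rough illustration, the Custom Activity script could write its transformed DataFrame back to a staging container that the Data Flow then reads as its source. This is only a minimal sketch: the connection-string environment variable, container names, and blob paths below are placeholders, not names from your setup.

```python
# Minimal sketch of the Custom Activity script (pandas + azure-storage-blob).
# Assumes the storage connection string is available in an environment variable
# and that "raw" / "staging" are hypothetical container names.
import io
import os

import pandas as pd
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"]
)

# 1. Read the source Parquet file from the input container.
raw_blob = service.get_blob_client(container="raw", blob="input/data.parquet")
df = pd.read_parquet(io.BytesIO(raw_blob.download_blob().readall()))

# 2. Apply the pandas transformations (placeholder).
df = df.dropna()

# 3. Write the transformed data as Parquet into the staging container,
#    which the Data Flow (or Copy activity) then uses as its source.
buffer = io.BytesIO()
df.to_parquet(buffer, index=False)
buffer.seek(0)

staging_blob = service.get_blob_client(
    container="staging", blob="transformed/data.parquet"
)
staging_blob.upload_blob(buffer, overwrite=True)
```

In the pipeline you would then chain the Custom Activity and the Data Flow activity, so the Data Flow only starts once the staged file has been written.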