How can I create a ForEach activity that runs every Notebook in a Databricks folder automatically?
Currently I'm doing this by adding a Notebook activity for each Notebook and chaining them one after another. This way of working is not maintainable, because whenever a new Notebook is created in Databricks, I must remember to update my pipeline in Azure Synapse Data Factory.
Thanks.
You can use the Databricks Workspace List REST API (/api/2.0/workspace/list) to get the list of Notebooks in a workspace folder.
First, I tried to get the list using a Web activity but was not able to make it work. That's why I used a Databricks notebook (start_notebook) to get the list of notebooks and then filtered it down to the required notebooks.
start_notebook code:
import requests
import json

# List everything under the given workspace folder via the Databricks Workspace API
my_json = {"path": "/Users/<yours@mail.com>/Folder/"}
auth = {"Authorization": "Bearer <Access token>"}
response = requests.get('https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/workspace/list', json=my_json, headers=auth).json()

# Return the result as a JSON string so the Notebook activity's
# runOutput can be parsed in downstream pipeline expressions
dbutils.notebook.exit(json.dumps(response))
Then, in the ADF Notebook activity output, you will get the list of notebooks as a JSON array, similar to the example below.
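Purely for illustration (the folder, paths, and object ids below are made up), the runOutput will look roughly like this:

{
  "objects": [
    {"path": "/Users/<yours@mail.com>/Folder/start_notebook", "object_type": "NOTEBOOK", "object_id": 1, "language": "PYTHON"},
    {"path": "/Users/<yours@mail.com>/Folder/notebook1", "object_type": "NOTEBOOK", "object_id": 2, "language": "PYTHON"},
    {"path": "/Users/<yours@mail.com>/Folder/notebook2", "object_type": "NOTEBOOK", "object_id": 3, "language": "PYTHON"}
  ]
}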
Now use a Filter activity to remove start_notebook itself from the above array. I have used a pipeline parameter (start) for the notebook name.
Filter activity:
Items: @activity('Notebook1').output.runOutput.objects
Condition: @not(equals(last(split(string(item().path),'/')), pipeline().parameters.start))
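The condition takes the last '/'-separated segment of each path (the notebook name) and keeps only the items whose name differs from the start parameter. As a minimal sketch, assuming response is the dictionary returned by the API call above, the equivalent logic in Python would be:

# Keep every notebook whose name is not the one passed in the 'start' parameter
start = "start_notebook"  # corresponds to pipeline().parameters.start
filtered = [o for o in response["objects"]
            if o["path"].split("/")[-1] != start]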
Filter output array:
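Again for illustration only, using the made-up paths from above, the filtered array would look like:

[
  {"path": "/Users/<yours@mail.com>/Folder/notebook1", "object_type": "NOTEBOOK", "object_id": 2, "language": "PYTHON"},
  {"path": "/Users/<yours@mail.com>/Folder/notebook2", "object_type": "NOTEBOOK", "object_id": 3, "language": "PYTHON"}
]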
Give this output array to a ForEach activity as @activity('Filter1').output.value, and inside the ForEach use a Notebook activity with @item().path as the Notebook path.
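One more thing to note: since you are currently chaining the Notebook activities one after another, check the Sequential option on the ForEach activity if the notebooks must run in that order; otherwise the ForEach runs its iterations in parallel by default.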