Search code examples
azure-data-factoryazure-databricks

Azure Data Factory using existing cluster in Databricks


I have created a pipeline in Azure Data Factory. I created a Databricks workspace, notebook (with some code), and a cluster. I created the connection from ADF to DB. I tested the connection. All lights are green. I published the ADF pipeline.

When I trigger the job, it says SUCCESS. But nothing happens in Databricks. No job is created in DB. The code in the notebook cell is apparently not executed. (I know this because the code prints the current time.)

Has anyone done this successfully?

To be clear, I want Data Factory to use an existing cluster in Databricks, not create a new one. I have named the cluster in the pipeline setup params.


Solution

  • Solved. The problem was that the notebook (containing my code) was within my User notebook folder. Data Factory did not have permission to see/use my notebook. I created the same notebook within the Shared folder and everything works fine.

    I will point out that ADF should issue an error/warning if the named notebook cannot be seen or used. The ADF pipeline verified fine, reported a successful run, but just failed silently.