Tags: airflow, databricks, azure-databricks, databricks-workflows

How to trigger an Azure Databricks notebook from Apache Airflow


I have created some ETL in an Azure Databricks notebook. Now I am trying to execute that notebook from Airflow 1.10.10.

If anyone can help, it would be great.

Thanks in advance.


Solution

  • Airflow includes native integration with Databricks that provides two operators: DatabricksRunNowOperator & DatabricksSubmitRunOperator (the package name differs depending on the Airflow version). There is also an example of how they could be used.

    You will need to create a connection named databricks_default with the login parameters that will be used to run your job. In the simplest case, the job just needs a cluster definition and a notebook specification (at least the path to the notebook to run), something like this:

        # For Airflow 1.10.x the operator lives in the contrib package.
        from airflow.contrib.operators.databricks_operator import DatabricksSubmitRunOperator

        # Example cluster spec; adjust the Spark version, node type,
        # and worker count to match your workspace.
        new_cluster = {
            'spark_version': '6.6.x-scala2.11',
            'node_type_id': 'Standard_D3_v2',
            'num_workers': 2,
        }

        notebook_task_params = {
            'new_cluster': new_cluster,
            'notebook_task': {
                'notebook_path': '/Users/airflow@example.com/PrepareData',
            },
        }
        # Example of using the JSON parameter to initialize the operator.
        notebook_task = DatabricksSubmitRunOperator(
            task_id='notebook_task',
            json=notebook_task_params,
        )
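
    For context, here is a minimal sketch of how the task above could be wired into a DAG, and how the other operator, DatabricksRunNowOperator, could trigger a job that already exists in the Databricks workspace. The DAG id, schedule, and job_id=42 are placeholders, not values from the original question:

        from datetime import datetime

        from airflow import DAG
        from airflow.contrib.operators.databricks_operator import DatabricksRunNowOperator

        with DAG(
            dag_id='databricks_example',  # placeholder name
            start_date=datetime(2020, 1, 1),
            schedule_interval='@daily',
        ) as dag:
            # Trigger a job that was already created in the Databricks
            # Jobs UI; 42 is a placeholder for your real job ID.
            run_now_task = DatabricksRunNowOperator(
                task_id='run_now_task',
                job_id=42,
                notebook_params={'run_date': '{{ ds }}'},
            )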
    

    P.S. There is an old blog post announcing this integration.
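
    To create the databricks_default connection itself, put the workspace URL in the Host field and a Databricks personal access token in the Extra field as {"token": "..."}. You can do this through the Airflow UI, or programmatically, as in this sketch (the host and token values are placeholders):

        from airflow import settings
        from airflow.models import Connection

        # Placeholder workspace URL and token; substitute your own values.
        conn = Connection(
            conn_id='databricks_default',
            conn_type='databricks',
            host='https://adb-1234567890123456.7.azuredatabricks.net',
            extra='{"token": "<personal-access-token>"}',
        )

        # Persist the connection in the Airflow metadata database.
        session = settings.Session()
        session.add(conn)
        session.commit()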