I have created an ETL pipeline in an Azure Databricks notebook, and now I'm trying to execute that notebook from Airflow 1.10.10.
If anyone can help, it would be great.
Thanks in advance.
Airflow includes a native integration with Databricks that provides two operators: DatabricksRunNowOperator
& DatabricksSubmitRunOperator
(the package name differs depending on the Airflow version: in 1.10.x they live in the contrib package, while in 2.x they ship in the separate apache-airflow-providers-databricks package). There is also an example of how they can be used.
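Concretely, for your Airflow 1.10.10 the imports would look like this (the 2.x path is shown commented out for reference):

# Airflow 1.10.x: the operators ship in the contrib package.
from airflow.contrib.operators.databricks_operator import (
    DatabricksRunNowOperator,
    DatabricksSubmitRunOperator,
)

# Airflow 2.x equivalent (after `pip install apache-airflow-providers-databricks`):
# from airflow.providers.databricks.operators.databricks import (
#     DatabricksRunNowOperator,
#     DatabricksSubmitRunOperator,
# )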
You will need to create a connection named databricks_default
with the login parameters that will be used to schedule your job (for Azure Databricks, typically the workspace URL as the host and a personal access token). In the simplest case, you just need to provide a cluster definition and a notebook task specification (at minimum the path of the notebook to run), something like this:
# Cluster spec for the one-off run; the Spark version, node type,
# and worker count below are example values, so adjust them to your workspace.
new_cluster = {
    'spark_version': '6.4.x-scala2.11',
    'node_type_id': 'Standard_D3_v2',
    'num_workers': 2,
}

notebook_task_params = {
    'new_cluster': new_cluster,
    'notebook_task': {
        'notebook_path': '/Users/airflow@example.com/PrepareData',
    },
}

# Example of using the JSON parameter to initialize the operator.
notebook_task = DatabricksSubmitRunOperator(
    task_id='notebook_task',
    json=notebook_task_params,
)
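Like any Airflow task, the operator then has to be attached to a DAG. A minimal sketch (the dag_id and schedule below are placeholders, and the job_id in the commented alternative is hypothetical):

from datetime import datetime

from airflow import DAG

with DAG(
    dag_id='databricks_notebook_etl',   # placeholder name
    start_date=datetime(2020, 1, 1),
    schedule_interval='@daily',
) as dag:
    notebook_task = DatabricksSubmitRunOperator(
        task_id='notebook_task',
        json=notebook_task_params,
    )

    # If the notebook is already attached to a job defined in the Databricks
    # workspace, you can trigger that job instead of submitting a one-off run:
    # run_job = DatabricksRunNowOperator(task_id='run_job', job_id=42)

DatabricksSubmitRunOperator submits a fresh one-off run each time (the Runs Submit API), while DatabricksRunNowOperator triggers an existing job (the Run Now API), so pick whichever matches how your notebook is set up in Databricks.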
P.S. There is an old blog post announcing this integration.