Tags: pyspark, azure-functions, databricks, azure-databricks, databricks-workflows

Calling Databricks Python notebook in Azure function


I have a Python Databricks notebook (PySpark) that performs an aggregation based on the inputs provided to the notebook via parameters.

  1. Is it possible to run this notebook from an Azure Function app?
  2. Can we pass parameters to the notebook from the Azure Function HTTP trigger? If so, kindly let me know the approach.
  3. Can we pass the Databricks output back to the Azure Function via the HTTP trigger?

Thank you.


Solution

  • Yes, it's possible by using the Databricks Jobs REST API. There are two ways to start a job that runs a notebook:

    1. You create a job inside Databricks that uses your notebook, and then call the run-now REST endpoint to trigger it, passing parameters.
    2. You call the runs submit REST endpoint to create a one-time run, providing the full job specification in the request.

    I would personally prefer the first option, as it hides details like cluster configuration from the Azure function; the job specification lives on the Databricks side. A sketch of triggering such a job from an HTTP-triggered function is shown below.
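
    As a minimal sketch (assuming the first option, i.e. a job that already wraps the notebook): an HTTP-triggered Azure Function in Python can forward its request body as notebook parameters to the run-now endpoint. The names DATABRICKS_HOST, DATABRICKS_TOKEN and DATABRICKS_JOB_ID are placeholder application settings, not anything defined in the question.

        import json
        import os

        import azure.functions as func
        import requests

        # Placeholder settings (assumptions): configure them on the Function App.
        DATABRICKS_HOST = os.environ["DATABRICKS_HOST"]    # e.g. https://adb-<workspace-id>.azuredatabricks.net
        DATABRICKS_TOKEN = os.environ["DATABRICKS_TOKEN"]  # personal access token or AAD token
        JOB_ID = int(os.environ["DATABRICKS_JOB_ID"])      # ID of the job that wraps the notebook

        def main(req: func.HttpRequest) -> func.HttpResponse:
            # Forward the trigger's JSON body as notebook parameters
            # (the values are delivered to the notebook's widgets as strings).
            params = req.get_json() if req.get_body() else {}

            resp = requests.post(
                f"{DATABRICKS_HOST}/api/2.1/jobs/run-now",
                headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
                json={"job_id": JOB_ID, "notebook_params": params},
                timeout=30,
            )
            resp.raise_for_status()

            # run-now returns the run_id of the newly started job run.
            run_id = resp.json()["run_id"]
            return func.HttpResponse(
                json.dumps({"run_id": run_id}),
                mimetype="application/json",
                status_code=202,
            )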

    In both cases, the REST API call returns a run ID, which can then be used to check the status of the job run and to retrieve its output (see the second sketch below).
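
    To get the notebook's output back, the notebook has to end with dbutils.notebook.exit(<value>); that value can then be read through the runs get-output endpoint. A rough sketch of polling the run and fetching the result (reusing the placeholder settings from above; note that for multi-task jobs, get-output must be called with the task's run ID, not the parent run ID):

        import os
        import time

        import requests

        DATABRICKS_HOST = os.environ["DATABRICKS_HOST"]
        HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

        def wait_for_run(run_id: int, poll_seconds: int = 15) -> dict:
            # Poll runs/get until the run reaches a terminal life-cycle state.
            while True:
                run = requests.get(
                    f"{DATABRICKS_HOST}/api/2.1/jobs/runs/get",
                    headers=HEADERS,
                    params={"run_id": run_id},
                    timeout=30,
                ).json()
                if run["state"]["life_cycle_state"] in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
                    return run
                time.sleep(poll_seconds)

        def get_notebook_result(run_id: int):
            # Returns whatever the notebook passed to dbutils.notebook.exit(...),
            # or None if the notebook didn't exit with a value.
            out = requests.get(
                f"{DATABRICKS_HOST}/api/2.1/jobs/runs/get-output",
                headers=HEADERS,
                params={"run_id": run_id},
                timeout=30,
            ).json()
            return out.get("notebook_output", {}).get("result")

    Note that polling like this inside the same HTTP-triggered function only works for short-running jobs; for longer aggregations it is usually better to return the run ID immediately and let the caller (or a Durable Functions orchestration) poll separately.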