python, maven, databricks, azure-databricks, dbutils

Installing Maven library on Databricks via Python commands and dbutils


On Databricks, I would like to install a Maven library through commands in a Python notebook if it's not already installed.

If it were a Python PyPI library I would do something like the following:

# Get the list of libraries already installed in this notebook session
library_name_list = dbutils.library.list()
# Suppose the library of interest is "scikit-learn"
if "scikit-learn" not in library_name_list:
    # Install the library
    dbutils.library.installPyPI("scikit-learn")

How can I do the same for the Maven library "com.microsoft.azure.kusto:spark-kusto-connector:2.0.0", that is, check whether it is already installed and install it if it is not?

I can install the Maven library through the UI by going to "Clusters" -> "Libraries" -> "Install New" -> "Maven", but I would like to do it programmatically through a script.


Solution

  • Note: The library utilities (such as dbutils.library.installPyPI("")) install Python libraries and create an environment scoped to a notebook session; they do not cover Maven libraries.

    Here are the steps to install libraries from the Maven repository programmatically:

    You can use the Databricks CLI to install Maven libraries in Azure Databricks.

    Step 1: Look up the library in the Maven Repository, pick the version you need, and note its coordinates (groupId, artifactId, and version).
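To make the three parts of a Maven coordinate concrete, here is a minimal sketch that splits a coordinate string into groupId, artifactId, and version. The names `MavenCoordinates` and `parse_maven_coordinates` are hypothetical helpers for illustration, not part of any Databricks API, and only the plain three-part form is handled.

```python
from typing import NamedTuple


class MavenCoordinates(NamedTuple):
    """The three parts of a 'groupId:artifactId:version' coordinate."""
    group_id: str
    artifact_id: str
    version: str


def parse_maven_coordinates(coordinates: str) -> MavenCoordinates:
    """Split 'groupId:artifactId:version' into its parts.

    Hypothetical helper; raises ValueError for anything other than
    exactly three non-empty, colon-separated fields.
    """
    parts = coordinates.strip().split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected 'groupId:artifactId:version', got {coordinates!r}"
        )
    return MavenCoordinates(*parts)


coords = parse_maven_coordinates(
    "com.microsoft.azure.kusto:spark-kusto-connector:2.0.0"
)
print(coords.group_id, coords.artifact_id, coords.version)
```

Validating the string up front gives a clear error before any CLI call is made with a malformed coordinate.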

    Step 2: Get the cluster ID using the Databricks CLI.

    To get the cluster ID: databricks clusters list

    Step 3: Use the Databricks CLI command below to install 'com.microsoft.azure.kusto:spark-kusto-connector:2.0.0' on the cluster.

    Syntax: databricks libraries install --cluster-id "Cluster ID" --maven-coordinates "GroupId:ArtifactId:Version" (e.g. org.jsoup:jsoup:1.7.2)

    To install the Maven library using the Databricks CLI: databricks libraries install --cluster-id "1013-095611-mazes551" --maven-coordinates "com.microsoft.azure.kusto:spark-kusto-connector:2.0.0"
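Since the question asks for a script, the install command above can also be driven from Python. This is a rough sketch that assumes the Databricks CLI is installed and already configured on the machine running the script; `build_install_command` and `install_maven_library` are hypothetical names, and only the `--cluster-id` and `--maven-coordinates` flags shown above are used.

```python
import subprocess
from typing import List


def build_install_command(cluster_id: str, maven_coordinates: str) -> List[str]:
    """Build the argument list for the CLI install command shown above."""
    return [
        "databricks", "libraries", "install",
        "--cluster-id", cluster_id,
        "--maven-coordinates", maven_coordinates,
    ]


def install_maven_library(cluster_id: str, maven_coordinates: str) -> None:
    """Run the install command; raises CalledProcessError on a non-zero exit.

    Assumes the `databricks` executable is on PATH and authenticated.
    """
    subprocess.run(build_install_command(cluster_id, maven_coordinates), check=True)


# Only builds and prints the command; nothing is executed here.
command = build_install_command(
    "1013-095611-mazes551",
    "com.microsoft.azure.kusto:spark-kusto-connector:2.0.0",
)
print(" ".join(command))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with the coordinate string.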

    To check the installed libraries on the cluster: databricks libraries list --cluster-id "1013-095611-mazes551"
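The "check if it is already installed" part of the question can be sketched as a small function over the output of the list command. This assumes the command prints JSON with a `library_statuses` list whose entries carry `library.maven.coordinates` and a `status` field; that shape is an assumption to verify against your CLI version. The function name `maven_library_present` and the sample data are hypothetical.

```python
import json
from typing import Any, Dict


def maven_library_present(status: Dict[str, Any], maven_coordinates: str) -> bool:
    """Return True if the coordinates appear in the cluster's library list.

    `status` is the parsed JSON from the list command (assumed shape).
    Any listed entry counts, whatever its status value is.
    """
    for entry in status.get("library_statuses", []):
        maven = entry.get("library", {}).get("maven", {})
        if maven.get("coordinates") == maven_coordinates:
            return True
    return False


# Hand-written sample in the assumed shape; not real cluster output.
sample_output = json.dumps({
    "cluster_id": "1013-095611-mazes551",
    "library_statuses": [
        {
            "library": {"maven": {"coordinates": "org.jsoup:jsoup:1.7.2"}},
            "status": "INSTALLED",
        },
    ],
})

status = json.loads(sample_output)
wanted = "com.microsoft.azure.kusto:spark-kusto-connector:2.0.0"
if not maven_library_present(status, wanted):
    print(f"{wanted} is not on the cluster; run the install command for it")
```

In a real script, `sample_output` would be replaced by the captured standard output of the list command, and you may also want to inspect each entry's `status` field so that a failed install is not treated as present.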

    For other ways to install packages in Azure Databricks, see: How to install a library on a databricks cluster using some command in the notebook?