
Using Databricks Connect


I'd like to edit Databricks notebooks locally using my favorite editor, and then use Databricks Connect to run the notebook remotely on a Databricks cluster that I usually access via the web interface.

Unfortunately, after searching the web for a couple of days, I can't find detailed documentation on Databricks Connect.

I ran databricks-connect configure, as suggested on the package's PyPI page, but I'm not sure what some of the settings are. Could someone please walk me through this (e.g., where to find these values in the web interface) or provide a link to proper documentation?

I know what some of the settings should be, but I'll include everything that comes up when running databricks-connect configure, for completeness and for the benefit of others.

Databricks Host
Databricks Token
Cluster ID (e.g., 0921-001415-jelly628)
Org ID (Azure-only, see ?o=orgId in URL)
Port (is it spark.databricks.service.port?)
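
For reference, after answering the prompts, configure appears to store the settings in a JSON file at ~/.databricks-connect, roughly like the following (all values here are placeholders, and 15001 is just the documented default port):

    {
      "host": "https://<your-region>.azuredatabricks.net",
      "token": "dapi<personal-access-token>",
      "cluster_id": "0921-001415-jelly628",
      "org_id": "<the ?o= value from the workspace URL>",
      "port": "15001"
    }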

Also, and I think this is what I'm most interested in, do I need to make any changes in the notebook itself, such as defining a SparkContext? If so, with what configuration?

And how should I run it? After running databricks-connect configure, there doesn't seem to be any "magic" happening. When I run jupyter notebook, it still runs locally and doesn't seem to know to forward anything to the remote cluster.

Update: In case a concrete example helps: in the Databricks web interface, dbutils is a predefined object. How do I refer to it when running a notebook remotely?


Solution

  • I had marked another person's reply as the answer, but that reply is gone now for some reason.

    For my purposes, the official user guide worked: https://docs.azuredatabricks.net/user-guide/dev-tools/db-connect.html
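
    In case the link rots, the gist: you don't launch Jupyter any differently. The notebook (the Spark driver program) still runs locally; databricks-connect replaces the local pyspark library with a client that ships Spark jobs to the configured cluster. You can verify the setup with databricks-connect test, then create a SparkSession as usual. A minimal sketch, assuming databricks-connect is installed in the same environment Jupyter runs from:

        from pyspark.sql import SparkSession

        # With Databricks Connect, getOrCreate() returns a session backed by
        # the remote cluster from ~/.databricks-connect, not a local Spark.
        spark = SparkSession.builder.getOrCreate()

        df = spark.range(100)
        print(df.count())  # this count job executes on the Databricks cluster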
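
    As for dbutils: it isn't predefined locally, but the guide shows how to construct it from the SparkSession via pyspark.dbutils, which ships with the databricks-connect build of pyspark (only the fs and secrets APIs are supported this way). A sketch under the same assumptions:

        from pyspark.sql import SparkSession
        from pyspark.dbutils import DBUtils

        spark = SparkSession.builder.getOrCreate()
        dbutils = DBUtils(spark)  # only dbutils.fs and dbutils.secrets work remotely

        print(dbutils.fs.ls("dbfs:/"))  # lists DBFS on the remote workspace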