I am currently optimizing our ETL process and would like to see the cluster configuration that was used when processing data, so that I can track over time which worker node sizes I should use.
Is there a command that returns the number of workers and their node sizes in Python, so I can write them to a dataframe?
You can get this information by calling the Clusters Get REST API (/api/2.0/clusters/get). It returns JSON that includes the number of workers, the node types, and so on. Something like this:
import requests

# The notebook context exposes the workspace hostname and the current cluster ID
ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
host_name = ctx.tags().get("browserHostName").get()
host_token = "your_PAT_token"  # personal access token
cluster_id = ctx.tags().get("clusterId").get()

# Call the Clusters Get API to fetch the full cluster configuration
response = requests.get(
    f'https://{host_name}/api/2.0/clusters/get?cluster_id={cluster_id}',
    headers={'Authorization': f'Bearer {host_token}'}
).json()

num_workers = response['num_workers']               # absent on autoscaling clusters (look at 'autoscale' instead)
worker_node_type = response['node_type_id']         # worker node size (instance type)
driver_node_type = response['driver_node_type_id']  # driver node size
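Since you want to track this over time and write it as a dataframe, you can flatten the fields you care about into a one-row Spark DataFrame and append it to a table. Here is a minimal sketch, assuming a fixed-size (non-autoscaling) cluster so that num_workers is present; the table name cluster_config_history is just a placeholder:

from datetime import datetime

df = spark.createDataFrame(
    [(
        datetime.utcnow(),                   # capture timestamp so you can see changes over time
        cluster_id,
        response.get('cluster_name'),
        response['num_workers'],
        response.get('node_type_id'),        # worker node size
        response.get('driver_node_type_id'), # driver node size
    )],
    ['captured_at', 'cluster_id', 'cluster_name', 'num_workers', 'worker_node_type', 'driver_node_type']
)

df.write.mode('append').saveAsTable('cluster_config_history')  # placeholder table name

Appending a timestamped row each run gives you the history you need to compare worker sizes across ETL executions.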
P.S. If you run this from a non-notebook job, the notebook context token may not be available, but you can generate a PAT yourself and use it in place of host_token.
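In that case, rather than hardcoding the generated token in your job code, you can store it in a Databricks secret scope and read it at runtime with dbutils.secrets.get. A small sketch, where the scope and key names are placeholders you would create yourself:

# read the PAT from a secret scope instead of embedding it in the code
host_token = dbutils.secrets.get(scope="etl-secrets", key="databricks-pat")

This keeps the token out of source control and lets you rotate it without changing the job.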