Tags: python, azure, databricks, azure-databricks

Azure Databricks Python command to show current cluster config


I am currently optimizing our ETL process and would like to see the cluster configuration used when processing data, so that I can track over time which worker node sizes I should use.

Is there a Python command to return the cluster's worker count and node sizes so I can write them out as a DataFrame?


Solution

  • You can get this information by calling the Clusters Get REST API (`GET /api/2.0/clusters/get`); it returns JSON that includes the number of workers, node types, and so on. Something like this:

    import requests

    # Pull the workspace host name and cluster ID from the notebook context
    ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
    host_name = ctx.tags().get("browserHostName").get()
    host_token = "your_PAT_token"  # your personal access token
    cluster_id = ctx.tags().get("clusterId").get()

    # Call the Clusters Get API for the current cluster
    response = requests.get(
        f'https://{host_name}/api/2.0/clusters/get?cluster_id={cluster_id}',
        headers={'Authorization': f'Bearer {host_token}'}
    ).json()
    num_workers = response['num_workers']
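
    Since the goal is to track configurations over time, the response can be flattened into a Spark DataFrame. Here is a minimal sketch, assuming a Databricks notebook where `spark` is already defined; the field names follow the Clusters API 2.0 response, where fixed-size clusters report `num_workers` and autoscaling clusters report an `autoscale` object instead:

    from datetime import datetime, timezone
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    autoscale = response.get('autoscale', {})  # present only on autoscaling clusters
    row = {
        'captured_at': datetime.now(timezone.utc).isoformat(),
        'cluster_id': response['cluster_id'],
        'node_type_id': response.get('node_type_id'),
        'driver_node_type_id': response.get('driver_node_type_id'),
        'num_workers': response.get('num_workers'),  # fixed-size clusters only
        'min_workers': autoscale.get('min_workers'),
        'max_workers': autoscale.get('max_workers'),
    }

    # Explicit schema, because columns that are all None break schema inference
    schema = StructType([
        StructField('captured_at', StringType()),
        StructField('cluster_id', StringType()),
        StructField('node_type_id', StringType()),
        StructField('driver_node_type_id', StringType()),
        StructField('num_workers', IntegerType()),
        StructField('min_workers', IntegerType()),
        StructField('max_workers', IntegerType()),
    ])
    df = spark.createDataFrame([tuple(row[f.name] for f in schema.fields)], schema)
    df.show()

    Appending this single-row DataFrame to a Delta table on each run would build up the history you need.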
    

    P.S. If you have a non-notebook job, the notebook context (and so the host name and cluster ID above) may not be available, but you can generate your own PAT token and hardcode the values instead, as in the sketch below.
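
    For example, outside a notebook the same request works with explicitly supplied values; the host name, token, and cluster ID here are hypothetical placeholders:

    import requests

    host_name = "adb-1234567890123456.7.azuredatabricks.net"  # placeholder workspace host
    host_token = "your_PAT_token"                             # placeholder PAT you generated
    cluster_id = "0123-456789-abcdefgh"                       # placeholder cluster ID

    response = requests.get(
        f'https://{host_name}/api/2.0/clusters/get?cluster_id={cluster_id}',
        headers={'Authorization': f'Bearer {host_token}'}
    ).json()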