Tags: api, apache-spark, cluster-computing, databricks, jobs

Databricks Job API create job with single node cluster


I am trying to figure out why I get the following error when I use the Databricks Jobs API.

{ "error_code": "INVALID_PARAMETER_VALUE", "message": "Cluster validation error: Missing required field: settings.cluster_spec.new_cluster.size" }

What I did:

  1. I created a job running on a single node cluster using the Databricks UI.
  2. I copied the job config JSON from the UI.
  3. I deleted the job and tried to recreate it by sending a POST to the Jobs API with the copied JSON, which looks like this (a sketch of the request itself follows the JSON):
{
    "new_cluster": {
        "spark_version": "7.5.x-scala2.12",
        "spark_conf": {
            "spark.master": "local[*]",
            "spark.databricks.cluster.profile": "singleNode"
        },
        "azure_attributes": {
            "availability": "ON_DEMAND_AZURE",
            "first_on_demand": 1,
            "spot_bid_max_price": -1
        },
        "node_type_id": "Standard_DS3_v2",
        "driver_node_type_id": "Standard_DS3_v2",
        "custom_tags": {
            "ResourceClass": "SingleNode"
        },
        "enable_elastic_disk": true
    },
    "libraries": [
        {
            "pypi": {
                "package": "koalas==1.5.0"
            }
        }
    ],
    "notebook_task": {
        "notebook_path": "/pathtoNotebook/TheNotebook",
        "base_parameters": {
            "param1": "test"
           
        }
    },
    "email_notifications": {},
    "name": " jobName",
    "max_concurrent_runs": 1
}
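
For completeness, the POST itself is sent roughly like this (a minimal sketch; the workspace URL and token are placeholders, and I am assuming the Jobs API 2.0 /api/2.0/jobs/create endpoint):

import json
import requests

HOST = "https://<workspace>.azuredatabricks.net"  # placeholder workspace URL
TOKEN = "<personal-access-token>"                 # placeholder personal access token

# job_config.json contains the JSON copied from the UI (shown above)
with open("job_config.json") as f:
    job_settings = json.load(f)

resp = requests.post(
    f"{HOST}/api/2.0/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_settings,
)
print(resp.status_code, resp.text)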

The API documentation does not help (I can't find anything about settings.cluster_spec.new_cluster.size). The JSON was copied from the UI, so I assumed it would be correct.

Thanks for your help.


Solution

  • Source: https://learn.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#--create

    To create a Single Node cluster, include the spark_conf and custom_tags entries shown in the example and set num_workers to 0. The payload in the question omits num_workers entirely, which is why the API reports a missing cluster size (settings.cluster_spec.new_cluster.size): a new_cluster spec must declare its size, either through num_workers or through autoscale.

    {
      "cluster_name": "single-node-cluster",
      "spark_version": "7.6.x-scala2.12",
      "node_type_id": "Standard_DS3_v2",
      "num_workers": 0,
      "spark_conf": {
        "spark.databricks.cluster.profile": "singleNode",
        "spark.master": "local[*]"
      },
      "custom_tags": {
        "ResourceClass": "SingleNode"
      }
    }
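
    Applied to the payload from the question, the fix is to add "num_workers": 0 under new_cluster before sending the create request. Below is a minimal sketch of recreating the job through the Jobs API 2.0 (workspace URL and token are placeholders):

    import json
    import requests

    HOST = "https://<workspace>.azuredatabricks.net"  # placeholder workspace URL
    TOKEN = "<personal-access-token>"                 # placeholder personal access token

    # job_config.json is the job JSON from the question
    with open("job_config.json") as f:
        job_settings = json.load(f)

    # Single Node clusters must still declare a cluster size: zero workers.
    job_settings["new_cluster"]["num_workers"] = 0

    resp = requests.post(
        f"{HOST}/api/2.0/jobs/create",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=job_settings,
    )
    resp.raise_for_status()
    print(resp.json())  # the response contains the new job_id on success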