
How to create a Spot instance job cluster using an Azure Data Factory (ADF) linked service


I have an ADF pipeline with a Databricks activity.

The activity creates a new job cluster every time and I have added all the required Spark configurations to a corresponding linked service.

Now with Databricks offering Spot Instances, I'd like to create my new clusters with Spot configurations within Databricks.

I looked for help in the linked service docs, but had no luck!

How can I do this using ADF?

Cheers!!!


Solution

  • I have found another workaround that lets the ADF Databricks linked service create job clusters with spot instances. As Alex Ott mentioned, the azure_attributes cluster property isn't supported by the Databricks linked service interface.

    Instead, I ended up creating a cluster policy that enforces spot instances:

    {
      "azure_attributes.availability": {
        "type": "fixed",
        "value": "SPOT_WITH_FALLBACK_AZURE",
        "hidden": true
      }
    }
    

    You can add to that policy if you want to augment the other properties of the azure_attributes object. Also, make sure you set the policy permissions for the appropriate groups/users.
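
    If you would rather create the policy from a script than from the UI, something like the following might work. This is only a sketch: it assumes the 2.0/policies/clusters/create endpoint, a personal access token, and that the endpoint takes the definition as a JSON-encoded string. The helper name build_policy_request and the policy name used in the usage note are hypothetical.

```python
import json
import urllib.request

# Same policy definition as above, written as a Python dict.
SPOT_POLICY = {
    "azure_attributes.availability": {
        "type": "fixed",
        "value": "SPOT_WITH_FALLBACK_AZURE",
        "hidden": True,
    }
}


def build_policy_request(host: str, token: str, name: str, definition: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates a cluster policy."""
    body = {
        "name": name,
        # Assumption: the API expects the definition as a JSON string,
        # not as a nested JSON object.
        "definition": json.dumps(definition),
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/policies/clusters/create",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

    To send it, pass the result to urllib.request.urlopen, for example urllib.request.urlopen(build_policy_request(workspace_url, token, "spot-only", SPOT_POLICY)), where workspace_url and token are your own values.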

    After creating the policy you will need to retrieve the policy id. I used a REST call to the 2.0/policies/clusters/list endpoint to get that value.
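
    For reference, the lookup could be scripted roughly like this. It is a sketch rather than exactly what I ran: it assumes a personal access token and that the list response contains a "policies" array whose entries carry "name" and "policy_id" fields. The function names are hypothetical.

```python
import json
import urllib.request


def list_policies(host: str, token: str) -> list:
    """Call 2.0/policies/clusters/list and return the "policies" array."""
    req = urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/policies/clusters/list",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        # Assumption: the response body is {"policies": [...]}.
        return json.load(resp).get("policies", [])


def find_policy_id(policies: list, name: str) -> str:
    """Return the policy_id of the first policy whose name matches."""
    for policy in policies:
        if policy.get("name") == name:
            return policy["policy_id"]
    raise LookupError(f"no cluster policy named {name!r}")
```

    Calling find_policy_id(list_policies(workspace_url, token), "spot-only") would then give you the value to put in the linked service, assuming the policy was named "spot-only".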

    From there you can do what Alex Ott suggested: create the linked service using the dynamic JSON option, and add the policyId property with the appropriate policy id to the typeProperties object:

    "typeProperties": {
      "domain": "Your Domain",
      "newClusterNodeType": "@linkedService().ClusterNodeType",
      "newClusterNumOfWorker": "@linkedService().NumWorkers",
      "newClusterVersion": "7.3.x-scala2.12",
      "newClusterInitScripts": [],
      "newClusterDriverNodeType": "@linkedService().DriverNodeType",
      "policyId": "Your policy id"
    }
    

    Now, when you invoke your ADF pipeline, it will create a job cluster under the cluster policy, which fixes the availability property of azure_attributes to whatever you specified.