Tags: azure, azure-data-factory, azure-databricks, azure-virtual-network

Azure Databricks Execution Fail - CLOUD_PROVIDER_LAUNCH_FAILURE


I'm using Azure Data Factory for my data ingestion and running an Azure Databricks notebook through ADF's Notebook activity.

The notebook uses an existing instance pool of Standard_DS3_v2 nodes (autoscaling between 2 and 5 nodes) on Databricks Runtime 7.3 LTS. The same Azure subscription is used by multiple teams for their respective data pipelines.

During pipeline execution, the notebook activity frequently fails with the following error message:

{
  "reason": {
    "code": "CLOUD_PROVIDER_LAUNCH_FAILURE",
    "type": "CLOUD_FAILURE",
    "parameters": {
      "azure_error_code": "SubnetIsFull",
      "azure_error_message": "Subnet /subscriptions/<Subscription>/resourceGroups/<RG>/providers/Microsoft.Network/virtualNetworks/<VN>/subnets/<subnet> with address prefix 10.237.35.128/26 does not have enough capacity for 2 IP addresses."
    }
  }
}

Can anyone explain what this error means and how I can reduce how often it occurs? (The documentation I found doesn't explain it.)


Solution

  • It looks like your Databricks workspace was deployed into your own VNET (VNet injection; see this link or this link). With this setup, cluster nodes are created inside one of the subnets of that VNET, and every node consumes IP addresses from it. At the moment your pipeline triggered, all the IPs in the subnet were already in use (the first sketch after this list shows the arithmetic). You cannot and should not extend the IP space: do not attempt to change the existing VNET configuration, as this will affect your Databricks clusters. You have the following options.

    1. Check when fewer Databricks instances are being launched and schedule your ADF pipeline for that window. Aim to spread executions over time so that the clusters running concurrently never exceed the IPs available in the subnet; the second sketch after this list shows how to check current usage.
    2. Ask your IT department to create a new VNET and subnet, and deploy a new Databricks workspace into it.
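
To see why a /26 fills up quickly, you can do the arithmetic yourself. Below is a minimal sketch using Python's standard ipaddress module; the assumption that each running cluster node claims one IP from this subnet reflects how Databricks VNet injection typically allocates addresses, so verify against your own workspace.

import ipaddress

# Subnet from the error message: address prefix 10.237.35.128/26.
subnet = ipaddress.ip_network("10.237.35.128/26")

total = subnet.num_addresses      # a /26 holds 64 addresses
azure_reserved = 5                # Azure reserves 5 IPs in every subnet
usable = total - azure_reserved   # 59 IPs left for workloads

# Assumption: each running cluster node claims one IP here, so at most
# `usable` nodes can run concurrently across all clusters sharing the
# subnet. Your pool alone can claim up to 5 of them; other teams'
# clusters compete for the rest.
print(f"{subnet} -> {usable} usable IPs")

With several teams autoscaling pools in the same subnet, 59 concurrent nodes is easy to hit, which is exactly when SubnetIsFull appears.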
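Option 1 is easier to act on if you can check how full the subnet is before a run. Here is a minimal sketch using the azure-identity and azure-mgmt-network packages; the resource names are the placeholders from the error message, not real values, so substitute your own.

from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

# Placeholders copied from the error message; fill in your real IDs.
client = NetworkManagementClient(DefaultAzureCredential(), "<Subscription>")
subnet = client.subnets.get("<RG>", "<VN>", "<subnet>")

# Each entry in ip_configurations is one address currently allocated
# out of the subnet, so the length is the number of IPs in use now.
in_use = len(subnet.ip_configurations or [])
print(f"IPs allocated in {subnet.name}: {in_use}")

Running this just before the ADF trigger (or on a schedule) tells you whether a launch is likely to hit SubnetIsFull.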