Azure ARM - Custom Script Extension - Failing randomly


I have developed an Azure ARM template that deploys an Ubuntu Linux VM and, once it is provisioned, runs a bash script to install a particular piece of software. The script downloads some packages and takes an input parameter from the user to complete the configuration. The issue I am facing is that the Custom Script extension works only intermittently: I deployed it successfully once, and now it fails every time. Here is the error it returns a few seconds after the custom script starts executing:

    {
  "code": "DeploymentFailed",
  "message": "At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details.",
  "details": [
    {
      "code": "Conflict",
      "message": "{\r\n  \"status\": \"Failed\",\r\n  \"error\": {\r\n    \"code\": \"ResourceDeploymentFailure\",\r\n    \"message\": \"The resource operation completed with terminal provisioning state 'Failed'.\",\r\n    \"details\": [\r\n      {\r\n        \"code\": \"VMExtensionProvisioningError\",\r\n        \"message\": \"VM has reported a failure when processing extension 'metaport-onboard'. Error message: \\\"Enable failed: failed to execute command: command terminated with exit status=1\\n[stdout]\\nReading package lists...\\nBuilding dependency tree...\\nReading state information...\\nsoftware-properties-common is already the newest version (0.96.24.32.14).\\nsoftware-properties-common set to manually installed.\\nThe following packages were automatically installed and are no longer required:\\n  grub-pc-bin linux-headers-4.15.0-121\\nUse 'sudo apt autoremove' to remove them.\\n0 upgraded, 0 newly installed, 0 to remove and 18 not upgraded.\\nReading package lists...\\nBuilding dependency tree...\\nReading state information...\\nSome packages could not be installed. 
This may mean that you have\\nrequested an impossible situation or if you are using the unstable\\ndistribution that some required packages have not yet been created\\nor been moved out of Incoming.\\nThe following information may help to resolve the situation:\\n\\nThe following packages have unmet dependencies:\\n python3-pip : Depends: python3-distutils but it is not installable\\n               Recommends: build-essential but it is not installable\\n               Recommends: python3-dev (>= 3.2) but it is not installable\\n               Recommends: python3-setuptools but it is not installable\\n               Recommends: python3-wheel but it is not installable\\n\\n[stderr]\\n+ sudo apt-get -qq -y update\\n+ sudo apt-get -q -y install software-properties-common\\n+ sudo apt-get -q -y install python3-pip\\nE: Unable to correct problems, you have held broken packages.\\nNo passwd entry for user 'mpadmin'\\n\\\"\\r\\n\\r\\nMore information on troubleshooting is available at https://aka.ms/VMExtensionCSELinuxTroubleshoot \"\r\n      }\r\n    ]\r\n  }\r\n}"
    }
  ]
}

Below is the portion of the template where I define the extension:

    {
  "type": "Microsoft.Compute/virtualMachines",
  "name": "[variables('vmName')]",
  "apiVersion": "2019-12-01",
  "location": "[variables('location')]",
  "dependsOn": [
    "[resourceId('Microsoft.Network/networkInterfaces/', variables('nicName'))]",
    "[resourceId('Microsoft.Network/virtualNetworks', parameters('virtualNetworkName'))]",
    "[resourceId('Microsoft.Network/natGateways', variables('natGatewayName'))]",
    "[resourceId('Microsoft.Network/networkSecurityGroups', variables('networkSecurityGroupName'))]"
  ],
  "properties": {
    "hardwareProfile": {
      "vmSize": "[parameters('virtualMachineSize')]"
    },
    "osProfile": {
      "computerName": "[variables('vmName')]",
      "adminUsername": "[parameters('adminUsername')]",
      "adminPassword": "[parameters('adminPasswordOrKey')]",
      "linuxConfiguration": "[if(equals(parameters('authenticationType'), 'password'), json('null'), variables('linuxConfiguration'))]"
    },
    "storageProfile": {
      "imageReference": {
        "publisher": "[variables('imagePublisher')]",
        "offer": "[variables('imageOffer')]",
        "sku": "[variables('imageSKU')]",
        "version": "[variables('imageVersion')]"
      },
      "osDisk": {
        "name": "[concat(variables('vmName'), '_OSDisk')]",
        "caching": "ReadWrite",
        "createOption": "FromImage",
        "managedDisk": {
          "storageAccountType": "[variables('storageAccountType')]"
        }
      }
    },
      "networkProfile": {
        "networkInterfaces": [
          {
            "id": "[resourceId('Microsoft.Network/networkInterfaces',variables('nicName'))]"
          }
        ]
      }
    },
    "resources": [
          {
          "name": "metaport-onboard",
          "type": "extensions",
          "apiVersion": "2019-03-01",
          "location": "[resourceGroup().location]",
          "dependsOn": [
            "[resourceId('Microsoft.Compute/virtualMachines/', variables('vmName'))]",
            "[resourceId('Microsoft.Network/networkInterfaces',variables('nicName'))]",
            "[resourceId('Microsoft.Network/virtualNetworks', parameters('virtualNetworkName'))]",
            "[resourceId('Microsoft.Network/natGateways', variables('natGatewayName'))]",
            "[resourceId('Microsoft.Network/networkSecurityGroups', variables('networkSecurityGroupName'))]"
          ],
          "properties": {
            "publisher": "Microsoft.Azure.Extensions",
            "type": "CustomScript",
            "typeHandlerVersion": "2.1",
            "autoUpgradeMinorVersion": true,
            "settings": {
              "fileUris": [
                "https://raw.githubusercontent.com/willguibr/azure/main/Latest/MetaPort-Standalone-NATGW-v1.0/install_metaport.sh"
                ]
              },
            "protectedSettings": {
              "commandToExecute": "[concat('sh install_metaport.sh ', parameters('metaTokenCode'))]"
              }
            }
          }
        ]
      }
    ]
  }

The full template package is here.

Does anyone have an idea how to prevent this issue, or what correction may be necessary?


Solution

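  • Well, the error clearly says that the command terminated with exit status 1. This means the script itself is failing, not the extension. In the output above, the step that fails is "apt-get -q -y install python3-pip" (unmet dependencies), and the trace shows the script carrying on afterwards until it hits "No passwd entry for user 'mpadmin'". So you need to log in to the VM and look at the extension logs. Since this is a Linux VM, the Custom Script extension log is /var/log/azure/custom-script/handler.log, and the downloaded script with its stdout and stderr files is under /var/lib/waagent/custom-script/download/0/. Figure out what went wrong and add error handling around the failing steps (a shell script has no try/catch, so check exit codes or stop at the first failure). Also consider propagating errors to the console, so you can actually see them in the logs.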
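
    As a rough sketch of the kind of error handling meant here (this is not the asker's install_metaport.sh, whose contents are not shown), the script could stop at the first failing command and retry the apt-get steps a few times. The function names retry and install_prereqs are hypothetical, the 5 attempts and 15-second delay are arbitrary assumptions, and the package names are taken from the trace in the error output. sudo is left out on the assumption that the extension already runs the script as root. Retrying only helps if the failure is transient (for example, the package index is not ready yet); it will not fix a package that is genuinely unavailable.

```shell
#!/bin/sh
# Sketch only: POSIX sh, because the template runs the script with "sh".
# Exit on the first failing command and on use of an unset variable,
# so a failed install cannot be followed by later steps.
set -eu

# retry ATTEMPTS DELAY COMMAND...
# Run COMMAND until it succeeds, at most ATTEMPTS times,
# sleeping DELAY seconds between tries. Messages go to stderr,
# which the Custom Script extension captures.
retry() {
  retry_attempts=$1
  retry_delay=$2
  shift 2
  retry_n=1
  until "$@"; do
    if [ "$retry_n" -ge "$retry_attempts" ]; then
      echo "ERROR: '$*' failed after $retry_n attempts" >&2
      return 1
    fi
    echo "WARN: '$*' failed (attempt $retry_n of $retry_attempts), retrying in ${retry_delay}s" >&2
    sleep "$retry_delay"
    retry_n=$((retry_n + 1))
  done
}

# Hypothetical wrapper around the package steps seen in the error trace.
# Attempt count and delay are assumptions, not tuned values.
install_prereqs() {
  export DEBIAN_FRONTEND=noninteractive
  retry 5 15 apt-get -q -y update
  retry 5 15 apt-get -q -y install software-properties-common
  retry 5 15 apt-get -q -y install python3-pip
}
```

    Calling install_prereqs before the rest of the install steps means that, if python3-pip still cannot be installed after the last attempt, the script exits there with a non-zero status and the final ERROR line shows up in the extension's stderr instead of a later, unrelated failure.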