Run script on Azure VM Scale Set when deprovisioning

I have a VMSS in Azure that manually scales up and down based on certain commands in DevOps Pipelines. I already have a script that executes when it is provisioned and that works fine. I want to also execute a script when the VM is deprovisioned, shut down, or deleted.

If it were based on shutdown, I could probably do it with a scheduled task, but it seems that when the Scale Set scales down it essentially pulls the plug on the machine. It doesn't look like it would wait for a scheduled task to complete, and I can't find any documentation on this.

Since I do manually control the steps to scale down, I can execute commands beforehand. Ideally I would use something like Run Command (I'm not even sure if that is possible with scale set VMs), but the VMs in the scale set are not publicly accessible.

Solution

The documentation at https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-terminate-notification suggests the following (assuming that you are standing up the scale set in the portal):

Go to Virtual machine scale sets.

Select + Add to create a new scale set.

Go to the Management tab.

Locate the Instance termination section.

For Instance termination notification, select On.

For Termination delay (minutes), set the desired default timeout.

When you are done creating the new scale set, select the Review + create button.

Once enabled, a Scheduled Events notification API is available at http://169.254.169.254/metadata/scheduledevents?api-version=2019-01-01 (provided your VM is VNet-enabled, which it should be). Requests to it must include the Metadata: true header. Poll it at an interval shorter than the termination delay, with enough margin left over to spin down the agent. When a termination is pending, the API returns a result like the following to tell you that you need to spin down:

{
    "DocumentIncarnation": {IncarnationID},
    "Events": [
        {
            "EventId": {eventID},
            "EventType": "Terminate",
            "ResourceType": "VirtualMachine",
            "Resources": [{resourceName}],
            "EventStatus": "Scheduled",
            "NotBefore": {timeInUTC}
        }
    ]
}
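As a rough illustration of the polling described above, here is a minimal Python sketch using only the standard library. The function names (find_terminate_events, build_approval, watch_for_termination), the 30-second poll interval, and the cleanup callback are all hypothetical and chosen for this example. The sketch assumes the endpoint is queried with the Metadata: true header, and that posting a StartRequests body back to the same endpoint tells Azure it may delete the VM without waiting out the full delay. Treat it as a starting point rather than a tested agent-shutdown routine.

```python
import json
import time
import urllib.request

# Endpoint from the answer above; the Instance Metadata Service expects "Metadata: true".
EVENTS_URL = "http://169.254.169.254/metadata/scheduledevents?api-version=2019-01-01"
HEADERS = {"Metadata": "true", "Content-Type": "application/json"}


def find_terminate_events(document, resource_name=None):
    """Return the IDs of Terminate events, optionally only those listing resource_name."""
    event_ids = []
    for event in document.get("Events", []):
        if event.get("EventType") != "Terminate":
            continue
        if resource_name is None or resource_name in event.get("Resources", []):
            event_ids.append(event["EventId"])
    return event_ids


def build_approval(event_ids):
    """Build the body that approves the given events (assumed StartRequests format)."""
    return {"StartRequests": [{"EventId": event_id} for event_id in event_ids]}


def fetch_events(timeout=5):
    """GET the current Scheduled Events document from inside the VM."""
    request = urllib.request.Request(EVENTS_URL, headers=HEADERS)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def approve_events(event_ids, timeout=5):
    """POST the approval so the delete can proceed before the delay expires."""
    body = json.dumps(build_approval(event_ids)).encode("utf-8")
    request = urllib.request.Request(EVENTS_URL, data=body, headers=HEADERS, method="POST")
    with urllib.request.urlopen(request, timeout=timeout):
        pass


def watch_for_termination(cleanup, resource_name=None, poll_seconds=30):
    """Poll until a Terminate event appears, run cleanup(), then approve the event."""
    while True:
        event_ids = find_terminate_events(fetch_events(), resource_name)
        if event_ids:
            cleanup()  # hypothetical callback, e.g. unregister the pipeline agent
            approve_events(event_ids)
            return event_ids
        time.sleep(poll_seconds)
```

You would run something like watch_for_termination(my_cleanup) as a background service on each instance, with poll_seconds comfortably below the termination delay configured on the scale set. Passing resource_name restricts the match to events that list this instance; the exact name format that appears in Resources is something to confirm against a real event before relying on it.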

Now, with ALL of that in mind, proper termination of an Azure Pipelines agent SHOULD be baked in to the agent itself. Do you have the setting enabled on the scale set agent pool to retain a problematic agent? If so, that could leave VMs around. If not, it may be worth opening a ticket with Microsoft.

I don't think your approach is wrong; rather, I don't think you should HAVE to resort to this.