I want to automate these 3 steps:
A monitoring process, started as a daemon before the main process, will track whether the main process completes successfully or exits unexpectedly. If it exits unexpectedly, the monitor can trigger a restart as a failsafe.
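A minimal sketch of such a watchdog, assuming the monitor launches the main process itself and that exit code 0 means success. The names `run_with_restart`, `MAX_RESTARTS`, and `RESTART_DELAY` are hypothetical and not part of any Azure tooling.

```shell
#!/usr/bin/env bash
# Hypothetical watchdog: rerun a command until it exits with code 0
# or the restart budget is used up.
MAX_RESTARTS="${MAX_RESTARTS:-3}"    # assumed limit on restarts
RESTART_DELAY="${RESTART_DELAY:-5}"  # assumed pause (seconds) between restarts

run_with_restart() {
  local attempt=0 status=0
  while :; do
    "$@"            # run the main process in the foreground
    status=$?
    if [ "$status" -eq 0 ]; then
      echo "completed successfully"
      return 0
    fi
    if [ "$attempt" -ge "$MAX_RESTARTS" ]; then
      echo "giving up after $attempt restarts (last exit code $status)" >&2
      return "$status"
    fi
    attempt=$((attempt + 1))
    echo "unexpected exit (code $status), restart $attempt of $MAX_RESTARTS" >&2
    sleep "$RESTART_DELAY"
  done
}
```

It could be invoked as `run_with_restart ./main_task.sh`, where `./main_task.sh` stands in for the real workload. A service manager such as systemd could play the same role if the monitor needs to survive reboots.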
I'm using Azure cloud to run my process.
There are several options to choose from.
As for deprovisioning, you can run the Azure CLI inside the VM after your task is done to delete the VM. The easiest way to do this is with a system-assigned managed identity.
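As a rough sketch, the self-deletion step might look like the following. It assumes the Azure CLI is installed on the VM and that the VM's system-assigned identity has been granted a role that allows deleting the VM. `ExampleResourceGroup`, `ExampleVM`, the `run` helper, and the `DRY_RUN` switch are placeholders for illustration.

```shell
#!/usr/bin/env bash
# Placeholder names; replace with the real resource group and VM name.
RESOURCE_GROUP="${RESOURCE_GROUP:-ExampleResourceGroup}"
VM_NAME="${VM_NAME:-ExampleVM}"

# Hypothetical helper: with DRY_RUN=1 the command is printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

delete_self() {
  # Sign in using the VM's managed identity (no stored credentials needed).
  run az login --identity &&
  # --yes skips the confirmation prompt; --no-wait returns without waiting
  # for the delete operation to finish.
  run az vm delete --resource-group "$RESOURCE_GROUP" --name "$VM_NAME" --yes --no-wait
}
```

Calling `delete_self` as the last line of the task script would remove the VM once the work is finished. Running it first with `DRY_RUN=1` prints the commands without touching any resources.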
By default, when you delete a VM it only deletes the VM resource, not the networking and disk resources. You can change this default behavior when you create a VM, or update an existing VM, to delete specific resources along with the VM.
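One way this might be set at creation time is with the delete-option flags of `az vm create`, sketched below. The resource names and image alias are placeholders, and the `run` helper with its `DRY_RUN` switch is hypothetical. Flag availability depends on the installed CLI version, so check `az vm create --help` first.

```shell
#!/usr/bin/env bash
# Placeholder names; replace with real values.
RESOURCE_GROUP="${RESOURCE_GROUP:-ExampleResourceGroup}"
VM_NAME="${VM_NAME:-ExampleVM}"

# Hypothetical helper: with DRY_RUN=1 the command is printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

create_vm() {
  # Ask Azure to delete the OS disk and the NIC together with the VM.
  run az vm create \
    --resource-group "$RESOURCE_GROUP" \
    --name "$VM_NAME" \
    --image Ubuntu2204 \
    --os-disk-delete-option Delete \
    --nic-delete-option Delete
}
```

If data disks are attached, `--data-disk-delete-option Delete` can be added in the same way.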
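The cleanest method is to create the VM and all of its associated components in a single resource group (RG). That way, when your script is complete, you can delete the RG and everything in it goes with it. When running unattended, add --yes to skip the confirmation prompt and --no-wait to return without waiting for the deletion to finish.
az group delete --name ExampleResourceGroup --yes --no-wait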