powershell, azure-devops, azure-devops-extensions

Child process of an Azure DevOps task is killed when the job stops


I have a custom PowerShell Azure DevOps task that creates a child process (for running another PowerShell script). I want the child process to run for a while - it's a background polling job. However, the child process is stopped when the job finishes.

This is all in the context of a self-hosted agent, where the agent machine's persistence and uptime are more or less guaranteed. With Microsoft-hosted Azure DevOps agents in the cloud, I would not recommend this.

Anyway, I've isolated that behavior to a bare minimum example. The main task script goes:

$PollScriptPath = Join-Path $PSScriptRoot "poll.ps1"
# -File plus quoting keeps the command intact if the task folder path contains spaces
Start-Process -FilePath "cmd.exe" -ArgumentList "/c powershell.exe -File `"$PollScriptPath`" >PollLog.txt 2>&1"

poll.ps1 does nothing in the example; it just sits there for 5 minutes:

Write-Host Hello
Start-Sleep -Seconds 300

The task executor is the legacy PowerShell host; the relevant part of the task manifest reads: {"execution":{"PowerShell":{...}}}

To check, I've added a 30-second sleep after the Start-Process line; with that in place, I can see the child powershell.exe process running under the agent account for as long as the PS host process is running. When the latter quits, the former quits too.

EDIT: moved the investigation details to the blog.


Solution

  • The other answer is valid in its own right, but here is, finally, the right answer to the question as posed.

    As of version 2.210.1, the Azure DevOps agent runs an instance of the process Agent.Worker.exe for every job. Its job shutdown logic contains a provision for locating and killing all processes that were spawned by the job. In order to identify those, the worker generates a GUID at startup and places a variable called VSTS_PROCESS_LOOKUP_ID with a value vsts_{GUID} into its environment block. Every time a process is started from the job, it inherits the parent environment.

    Come job shutdown time, the worker retrieves the list of processes and checks which ones have the environment variable VSTS_PROCESS_LOOKUP_ID with the right value, and kills those.
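
    The inheritance that the worker relies on is easy to observe outside the agent. The sketch below is only an illustration, not the agent's code: DEMO_LOOKUP_ID is a made-up stand-in for VSTS_PROCESS_LOOKUP_ID, and the child is simply another instance of whatever PowerShell executable happens to be running the script.

    ```powershell
    # Path of the PowerShell executable running this script
    # (works for both Windows PowerShell and PowerShell 7+).
    $hostExe = (Get-Process -Id $PID).Path

    # DEMO_LOOKUP_ID is a hypothetical stand-in for VSTS_PROCESS_LOOKUP_ID.
    $marker = "vsts_$([guid]::NewGuid())"
    $env:DEMO_LOOKUP_ID = $marker

    # The child prints the value it finds in its own environment block.
    $inherited = & $hostExe -NoProfile -Command '$env:DEMO_LOOKUP_ID'

    # Erase the variable in this process, then spawn a second child.
    Remove-Item Env:DEMO_LOOKUP_ID
    $afterClear = & $hostExe -NoProfile -Command '$env:DEMO_LOOKUP_ID'

    "Child started with the variable set saw:     '$inherited'"
    "Child started after the variable was erased: '$afterClear'"
    ```

    The first child reports the marker and the second reports nothing, which is why a process started after the variable is erased should not match the worker's lookup.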

    So the trick to deliberately orphaning a process is to spawn it while the VSTS_PROCESS_LOOKUP_ID of the spawning process has been reset or erased. For example, the following script in the Command Line task (v2+, assuming Windows) will run a PowerShell script that will outlive the job:

    cd /d c:\
    set VSTS_PROCESS_LOOKUP_ID=
    start powershell.exe c:\Path\MyScript.ps1
    

    The cd line is necessary; otherwise, the job's work folder becomes the current directory of the background process, and that interferes with the agent's subsequent work: the agent will try to delete and recreate the work folder and fail, because Windows will not delete a directory that is some process's current directory. The start command is necessary so that PowerShell runs in a separate window; simply executing powershell.exe would make cmd.exe wait until the PowerShell process quits, which defeats the purpose.
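
    Since the question is about a PowerShell task, the same idea can probably be expressed with Start-Process. What follows is an untested sketch under that assumption, not something verified against a real agent. Start-DetachedFromJob is a hypothetical helper name; it erases the variable only for the duration of the launch and then restores it, so that anything else the task spawns later is still cleaned up by the worker.

    ```powershell
    # Hypothetical helper: launch a process that does not carry the agent's lookup ID.
    function Start-DetachedFromJob {
        param(
            [Parameter(Mandatory)] [string] $FilePath,
            [string[]] $ArgumentList = @(),
            # Default to a folder outside the job's work folder (same role as the cd line).
            [string] $WorkingDirectory = [System.IO.Path]::GetTempPath()
        )

        $saved = $env:VSTS_PROCESS_LOOKUP_ID
        try {
            # Children started from here on inherit an environment without the variable.
            Remove-Item Env:VSTS_PROCESS_LOOKUP_ID -ErrorAction SilentlyContinue

            $params = @{
                FilePath         = $FilePath
                WorkingDirectory = $WorkingDirectory
                PassThru         = $true
            }
            # Start-Process rejects an empty -ArgumentList, so pass it only when needed.
            if ($ArgumentList.Count -gt 0) { $params.ArgumentList = $ArgumentList }

            # Start-Process returns immediately unless -Wait is given.
            Start-Process @params
        }
        finally {
            # Put the lookup ID back for the rest of the task.
            if ($null -ne $saved) { $env:VSTS_PROCESS_LOOKUP_ID = $saved }
        }
    }
    ```

    A call mirroring the batch example would then be Start-DetachedFromJob -FilePath 'powershell.exe' -ArgumentList '-NoProfile', '-File', 'c:\Path\MyScript.ps1' -WorkingDirectory 'c:\'.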

    Now, all this depends on implementation details of the agent and is therefore brittle. However, the logic seems to be shared between the Windows and Linux/macOS builds of the agent, so the approach should be generally applicable there as well.


    One slightly less brittle approach might involve modifying the background process' access control list to deny access for the Worker process. I didn't investigate this to completion.