Tags: azure, azure-pipelines, timeout, connectivity

Is Azure Pipelines designed to support long-running jobs (running for days, potentially weeks)?


I have a long-running Java/Gradle process and an Azure Pipelines job to run it.

It's perfectly fine and expected for the process to run for several days, potentially over a week. The Azure Pipelines job runs on a self-hosted agent (to rule out hosted-agent timeout limits), and its timeout is set to 0, which in theory means the job can run forever.
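For context, the job is configured roughly like this (a minimal sketch; the job name and `MySelfHostedPool` are placeholders):

```yaml
jobs:
- job: long_running_build
  pool:
    name: MySelfHostedPool   # self-hosted agent pool (placeholder name)
  timeoutInMinutes: 0        # 0 = maximum; on self-hosted agents the job may run forever
  steps:
  - script: ./gradlew run    # the long-running Java/Gradle process
```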

Sometimes the Azure Pipelines job fails after a day or two with an error message that says "We stopped hearing from agent". Even when this happens, the underlying process may still be running, as is evident when SSH-ing into the machine that hosts the agent.

When I discuss investigating these failures with DevOps, I often hear that Azure Pipelines is a CI tool that is not designed for long-running jobs. Is there any evidence to support this claim? Does Microsoft commit to supporting jobs only within a certain duration limit?

Based on the troubleshooting guide and timeout documentation page referenced above, there's a duration limit applicable to Microsoft-hosted agents, but I fail to see anything similar for self-hosted agents.


Solution

  • Agree with @Daniel Mann.

    It's not common to run such long jobs, but per the documentation, it should be supported.

    "We stopped hearing from agent" can be caused by a network problem on the agent, or by an agent issue such as high CPU, storage, or RAM usage. You can check the agent diagnostic logs to troubleshoot (see the sketch below).

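    A minimal sketch of gathering more diagnostics, assuming a YAML pipeline (`system.debug` is a standard Azure Pipelines variable; the `_diag` path below is the agent's default log location):

    ```yaml
    # Enable verbose step-level logs in the pipeline timeline to capture
    # more detail around the moment the server loses contact with the agent.
    # Host-level agent logs (Agent_*.log, Worker_*.log) are written to the
    # _diag folder under the agent install directory, e.g. ~/myagent/_diag.
    variables:
      system.debug: true
    ```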