I have an interesting issue with the latest version of on-premises TFS (2018, Version 16.122.27102.1). I have a release process that includes a "Deploy TestAgent on localhost" step. It looks like this:
It normally works great, and it worked great when I was using TFS 2012, but we recently upgraded to 2018, and now when this process runs on a certain build agent (Agent-19 only), I occasionally get a strange failure:
Operating system is shutting down for computer 'XXX_TESTING'
The agent: Agent-19 lost communication with the server. Verify the machine is running and has a healthy network connection. For more information, see: https://go.microsoft.com/fwlink/?linkid=846610
Strangely, the restart seems to be initiated by the same service account that the TFS build agent uses:
There's not a whole lot of information there, and the TFS build worker log doesn't have much information either:
[2018-03-01 00:46:35Z INFO ProcessInvoker] Starting process:
[2018-03-01 00:46:35Z INFO ProcessInvoker] File name: 'C:\TFS Agent\externals\vstshost\LegacyVSTSPowerShellHost.exe'
[2018-03-01 00:46:35Z INFO ProcessInvoker] Arguments: ''
[2018-03-01 00:46:35Z INFO ProcessInvoker] Working directory: 'C:\TFS Agent\_work\_tasks\DeployVisualStudioTestAgent_52a38a6a-1517-41d7-96cc-73ee0c60d2b6\1.0.42'
[2018-03-01 00:46:35Z INFO ProcessInvoker] Require exit code zero: 'False'
[2018-03-01 00:46:35Z INFO ProcessInvoker] Encoding web name: ; code page: ''
[2018-03-01 00:46:35Z INFO ProcessInvoker] Force kill process on cancellation: 'False'
[2018-03-01 00:46:35Z INFO ProcessInvoker] Process started with process id 14620, waiting for process exit.
[2018-03-01 00:46:35Z INFO JobServerQueue] Try to upload 1 log files or attachments, success rate: 1/1.
[2018-03-01 00:48:11Z INFO Worker] Cancellation/Shutdown message received.
[2018-03-01 00:48:11Z INFO HostContext] Agent will be shutdown for OperatingSystemShutdown
[2018-03-01 00:48:11Z INFO StepsRunner] Cancel current running step.
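Even this sparse log shows which process the worker had launched when the shutdown message arrived. A minimal sketch, assuming the `[timestamp LEVEL Component] message` line format seen above, that extracts the shutdown reason and the last process the worker started (`find_shutdown` and `ShutdownInfo` are hypothetical helper names, not part of the agent):

```python
import re
from typing import NamedTuple, Optional

# Matches worker log lines such as:
# [2018-03-01 00:46:35Z INFO ProcessInvoker] Starting process:
LOG_LINE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})Z "
    r"(?P<level>\w+) (?P<component>\w+)\] (?P<message>.*)$"
)
FILE_NAME = re.compile(r"File name: '(?P<path>[^']*)'")
STARTED = re.compile(r"Process started with process id (?P<pid>\d+)")
SHUTDOWN = re.compile(r"Agent will be shutdown for (?P<reason>\w+)")


class ShutdownInfo(NamedTuple):
    shutdown_time: str
    reason: str
    last_process: Optional[str]
    last_pid: Optional[int]


def find_shutdown(lines):
    """Return the first shutdown in the log and the process running when it arrived."""
    last_path = None      # most recent "File name:" seen
    last_process = None   # path of the most recently started process
    last_pid = None
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m:
            continue  # skip lines that are not in the assumed format
        msg = m.group("message")
        if (f := FILE_NAME.search(msg)):
            last_path = f.group("path")
        elif (s := STARTED.search(msg)):
            last_process, last_pid = last_path, int(s.group("pid"))
        elif (d := SHUTDOWN.search(msg)):
            return ShutdownInfo(m.group("ts"), d.group("reason"), last_process, last_pid)
    return None
```

Fed the worker log lines (for example, `find_shutdown(open(path, encoding="utf-8"))`), this would report `OperatingSystemShutdown` with `LegacyVSTSPowerShellHost.exe` (PID 14620) as the running process, which at least ties the restart to the Deploy Test Agent task's PowerShell host.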
So the system shuts down, the agent stops, and the tests don't run, but why? No idea... So I re-imaged the entire server with a copy of one of my other build servers and reinstalled the build agent, but the issue persists. It only occurs on that build server, only on that step, and only "sometimes" (I haven't identified a pattern, but it's generally during the nightly run at 6:30 PM CST).
How do I diagnose this? Is there a place that would tell me why a system restarted? This doesn't really give me a whole lot of information... I searched around, and I don't see anyone else with an issue of this nature.
First of all, the Deploy Test Agent step is deprecated; it has been replaced by the new agent infrastructure and the VS Test 2.0 runners. See:
The Deploy Test Agent step was meant to install the test agent on a different server/VM, not on the agent machine itself.
The build/release agent is expected to stay alive so it can monitor the test agent while that machine restarts and comes back up to run the tests. Reasons why the agent may trigger a reboot can be found here:
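To see exactly who requested a restart, check the Windows System event log: event ID 1074 (source User32) is written when a process initiates a shutdown or restart, and its message names the process and the user account. A minimal sketch that queries the newest 1074 events with `wevtutil` and parses the message; the regex assumes the English wording of that message, and `build_query_command` and `parse_1074` are hypothetical helper names:

```python
import platform
import re
import subprocess

# XPath filter for the System log: event ID 1074 = a process requested shutdown/restart.
QUERY = "*[System[(EventID=1074)]]"


def build_query_command(count=5):
    """wevtutil arguments: newest `count` 1074 events, rendered as text."""
    return ["wevtutil", "qe", "System", f"/q:{QUERY}", f"/c:{count}", "/rd:true", "/f:text"]


# Assumed (English) wording of the 1074 description; localized systems will differ.
MESSAGE = re.compile(
    r"The process (?P<process>.+?) \((?P<origin>[^)]*)\) has initiated the "
    r"(?P<action>.+?) of computer (?P<computer>\S+) on behalf of user "
    r"(?P<user>.+?) for the following reason: (?P<reason>.*)"
)


def parse_1074(text):
    """Extract the requesting process, user, action and reason from each event."""
    return [m.groupdict() for m in MESSAGE.finditer(text)]


if __name__ == "__main__" and platform.system() == "Windows":
    out = subprocess.run(build_query_command(), capture_output=True,
                         text=True, check=True).stdout
    for event in parse_1074(out):
        print(event["user"], "|", event["process"], "|", event["action"], "|", event["reason"])
```

If the requesting process turns out to be the Deploy Test Agent task's PowerShell host running under the agent's service account, that would be consistent with the task rebooting the machine it was told to deploy to, which here is the agent machine itself.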