We have a TFS 2017 build agent executing a Visual Studio Test task to execute our unit tests. This has worked fine for several years, but all of a sudden - without any code changes - the task gets stuck.
All the tests have finished running, we see summary information, and it will sit at what appears to be the place where it would normally publish the results... but then nothing happens. We've waited 12+ hours for it to finish. This step normally takes about 90 minutes.
I've confirmed that the TRX file is being created. It's about 4MB in size. We're running a bit over 3000 unit tests.
I've also tried disabling code coverage and attachments upload inside the test task, but it doesn't appear to make a difference.
Below is a screen cap of the log output when the step is stuck.
Lastly, we have lots of other projects on this server whose tests run / publish fine, as well as TFS Releases for this same build that also run tests (integration/system tests) which work without issue.
UPDATE: We ran this build on a different build server, and it published tests correctly. So this means there is something wrong with this specific build server...
UPDATE 2: So I'm not longer sure what is happening here. The original build server we were having issues on is now working fine with no changes whatsoever. Just started working again. The other build server was working, and then stopped. Same issue. I broke up the 3000+ tests into two steps, roughly 50/50, and that worked a couple of times, but now does not. So this does not appear to be server specific, nor does it appear to be related to the quantity of tests. Debug logging offers nothing useful, as everything seems fine right up until it just stops doing anything after generating the TRX file.
UPDATE 3: Well, it's happening again. I'm not sure how to proceed. I even tried Fiddler on the build box to see if I could catch funky looking traffic, but most of the traffic I'd expect to see I don't. It's like a good chunk of the work isn't being captured (such as source downloads, reporting progress, or test result publishing) by Fiddler. Is it not over HTTP/HTTPS?
This was difficult to figure out due to the quantity of tests we're running, but I was able to narrow it down to a test that launched ping.exe:
[ExpectedException(typeof(TimeoutException))]
[TestMethod]
public void ProcessWillTimeout()
{
const string command = "cmd";
const string args = "/C ping 127.0.0.1 /t";
var externalProcessService = new ExternalProcessService();
externalProcessService.Execute(command, args, TimeSpan.FromMilliseconds(500));
}
For whatever reason, this test was leaving both conhost.exe and ping.exe "orphaned". The fact these processes were not terminating was, for an unknown reason, preventing the tests from publishing their results back to TFS. There is probably something somewhere that waits for the a process to finish and that was never happening.
Indeed, we would see a bunch of conhost.exe and ping.exe processes in both Task Manager and Process Explorer:
You'll notice the tool tip there... "[Error opening process]". I couldn't even use Process Explorer to kill these processes - although Task Manager could. Sure enough, when I killed them, the TFS build task would immediately resume and finish publishing results.
So there is clearly some kind of bug in that ExternalProcessService code we were testing (despite carefully having a finally block that terminated the process), but we are at least able to have our build tests run again without issue.