Flaky tests passing locally and on TeamCity only when remote desktop window is open

I have some UI tests (written using FlaUI) that pass locally but fail on TeamCity. When I tried watching the tests run on the remote build agent through remote desktop to see the error, I realized that the tests pass if the remote desktop app is open and not minimized. Only a small subset of my tests fail this way; most of them pass under any conditions. What could be causing this?

Solution

Typically, when a test passes locally, but fails on CI, there are two potential causes:

There might be a test that leaks state into the global environment and sometimes runs before the failing test. That invalidates the failing test's assumptions about the global environment and leads to the failure. Or...
You could be running into a parallel execution race condition between two tests that both access the same shared global state (like a cache or a key/value store).

I give a more detailed answer here.

In your specific case, my guess is that the Remote Desktop window is a coincidence, or maybe having it open forces the tests to execute in a specific order, one in which the leaky test(s) run after the test that would otherwise fail.

To debug the situation, you want to figure out what order causes the flaky test to fail. Then you want to start removing tests from the test run until you find the one that causes the failure. Some test tools (like RSpec, with its --bisect option) have a feature that allows you to bisect your test suite automatically. Leverage that if possible. Otherwise you'll have to do the binary search yourself.

To do that yourself, check the build logs of the failing run to see which tests ran before the failing test. Then see if you can reproduce the failure locally by running the tests in that same order. If you cannot, then you likely have the second problem I described above. But if you can, then run only the first half of the preceding tests before the failing test, and keep cutting the set in half for as long as the test continues to fail. If you run half the tests and the test doesn't fail, then run the other half instead. (Hopefully the leak that results in the test failure is caused by a single test. If not, debugging this manually gets really hard.)
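
The manual binary search can be sketched roughly as follows. This is only an illustration under the single-culprit assumption mentioned above. The function name `find_leaky_test` and the `fails` callback are hypothetical: `fails` stands for "run these tests in this order with your test runner and report whether the last one failed", which you would have to implement for your own setup.

```python
from typing import Callable, List, Optional


def find_leaky_test(
    preceding: List[str],
    victim: str,
    fails: Callable[[List[str]], bool],
) -> Optional[str]:
    """Bisect `preceding` to find the one test that makes `victim` fail.

    `fails(tests)` is a caller-supplied (hypothetical) function that runs
    `tests` in order and returns True if the last test, the victim, failed.
    Returns None if the failure cannot be pinned on a single test.
    """
    # First confirm the failure reproduces with the full recorded order.
    if not fails(preceding + [victim]):
        return None

    candidates = preceding
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        if fails(first_half + [victim]):
            # The culprit is somewhere in the first half.
            candidates = first_half
        else:
            # Otherwise, assuming a single culprit, it is in the second half.
            candidates = candidates[mid:]

    # Verify the remaining candidate really reproduces the failure by itself.
    if candidates and fails(candidates + [victim]):
        return candidates[0]
    return None


if __name__ == "__main__":
    # Stand-in for a real test run: pretend test "d" leaks state.
    def fake_fails(tests: List[str]) -> bool:
        return "d" in tests[:-1]

    print(find_leaky_test(["a", "b", "c", "d", "e"], "victim", fake_fails))
```

If the function returns None even though the full order fails, that suggests more than one test is involved, or that the victim fails by itself, and the simple halving strategy no longer applies.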