Hunt for root cause of flaky test in Python

There is a flaky test, and we have no clue what the root cause could be.

with pytest.raises(foo.geocoder.GeocodingFailed):
    foo.geocoder.geocode('not a valid place')

Sometimes the exception does not happen.

I looked at the docs how to handle failures, but this did not help.

How to trace geocode() so that if the exception does not happen, I see a trace?

I looked at the trace module of the standard library. But there seems to be no easy way to get the trace as string.

With "a trace" I mean: A trace of all lines which got executed during gecode(). I would like to see method calls and returns statements with indentation. I want to ignore lines from the Python standard library.

AFAIK a debugger like pdb does not help here, since the test only fails when it gets run in CI, and only once or twice per month.

Solution

The trace module gives programmatic access via trace.Trace class.
On test failures the Trace class' console output is visible.
And it has coverage report to write on chosen path.

I exclude sys.base_exec_prefix, sys.base_prefix to not trace Python library modules.

import pytest


def f(x):
    import random

    choice = random.choice((True, False))

    if choice:
        raise ValueError
    else:
        return x * 2


def trace(f, *args, **kwargs):
    import trace
    import sys

    tracer = trace.Trace(
        ignoredirs=(sys.base_exec_prefix, sys.base_prefix), timing=True,
    )
    ret = tracer.runfunc(f, *args, **kwargs)
    r = tracer.results()
    r.write_results(coverdir="/tmp/xx-trace")
    return ret


def test_f():
    with pytest.raises(ValueError):
        trace(f, 3)

Coverage report;
Lines marked with >>>>>> were not executed i.e. not traced and the numbers with colon are execution counts.

>>>>>> import pytest
       
       
>>>>>> def f(x):
    1:     import random
       
    1:     choice = random.choice((True, False))
       
    1:     if choice:
>>>>>>         raise ValueError
           else:
    1:         return x * 2
       
       
>>>>>> def trace(f, *args, **kwargs):
>>>>>>     import trace
>>>>>>     import sys
       
>>>>>>     tracer = trace.Trace(
>>>>>>         ignoredirs=(sys.base_exec_prefix, sys.base_prefix), timing=True,
           )
>>>>>>     tracer.runfunc(f, *args, **kwargs)
>>>>>>     r = tracer.results()
>>>>>>     r.write_results(coverdir="/tmp/xx-trace")
       
       
>>>>>> def test_f():
>>>>>>     with pytest.raises(ValueError):
>>>>>>         trace(f, 3)