Tags: python, garbage-collection, generator, reference-counting

Are generators with context managers an anti-pattern?


I'm wondering about code like this:

def all_lines(filename):
    with open(filename) as infile:
        yield from infile

The point of a context manager is to have explicit control over the lifetime of some form of state, e.g. a file handle. A generator, on the other hand, keeps its state until it is exhausted or deleted.

I do know that both cases work in practice. But I'm worried about whether it is a good idea. Consider for example this:

def all_first_lines(filenames):
    return [next(all_lines(filename), None) for filename in filenames]

I never exhaust the generators. Instead, their state is destroyed when the generator object is deleted. This works fine in reference-counted implementations like CPython, but what about garbage-collected implementations? I'm practically relying on the reference counter for managing state, something that context managers were explicitly designed to avoid!

And even in CPython it shouldn't be too hard to construct cases where a generator is part of a reference cycle and needs the garbage collector to be destroyed.

To summarize: Would you consider it prudent to avoid context managers in generators, for example by refactoring the above code into something like this?

def all_lines(filename):
    with open(filename) as infile:
        return infile.readlines()

def first_line(filename):
    with open(filename) as infile:
        return next(infile, None)

def all_first_lines(filenames):
    return [first_line(filename) for filename in filenames]

Solution

  • Opening the file inside the generator does extend the lifetime of the file handle until the generator is exhausted, closed, or destroyed, but it can also make the generator clearer to work with.

    Consider the alternative: create the generators under an outer with and pass the file in as an argument instead of letting them open it. Once that with block exits, the file is closed, yet the generators still look usable and only fail when something iterates them.

    If it matters how long the file handles are held, you can close the generators explicitly with their close() method once you are done with them. That runs the cleanup of the suspended with block immediately instead of leaving it to the garbage collector.

    This is similar to the problem that trio addresses with its nurseries for asynchronous tasks: the nursery context manager waits for every task spawned from that nursery to exit before proceeding, as the tutorial example illustrates. A blog post by trio's author explains the reasoning behind this design and makes for interesting, somewhat related reading.
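As a rough sketch of the two options discussed above (not the only way to do it): the first helper keeps the generator from the question but closes it deterministically with contextlib.closing, which calls the generator's close() method on exit. The second lets the caller own the file. The names first_line_outer and lines_of are made up for this illustration.

```python
import contextlib


def all_lines(filename):
    # Generator from the question: the file stays open while it is suspended.
    with open(filename) as infile:
        yield from infile


def first_line(filename):
    # closing() calls gen.close() when the block exits. That raises
    # GeneratorExit at the suspended yield, so the generator's own
    # `with open(...)` block runs its cleanup right away, without
    # relying on reference counting or the garbage collector.
    with contextlib.closing(all_lines(filename)) as lines:
        return next(lines, None)


def lines_of(infile):
    # Hypothetical variant: the generator only reads, the caller owns the file.
    yield from infile


def first_line_outer(filename):
    # The outer `with` controls the file's lifetime. A generator that
    # outlives this block would fail when iterated on the closed file.
    with open(filename) as infile:
        return next(lines_of(infile), None)


def all_first_lines(filenames):
    return [first_line(filename) for filename in filenames]
```

Both helpers return None for an empty file, matching the next(..., None) call in the question. Calling close() on a generator that has already finished does nothing, so the closing() wrapper is safe in either case.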