python, python-2.7, python-itertools

Pythonic way: iterator chaining and resource management


I find it difficult to combine chained iterators with resource management smoothly in Python.

A concrete example will probably make this clearer:

I have a small program that works on a number of similar, yet different, CSV files. Because they are shared with co-workers, I need to open and close them frequently. I also need to transform and filter their content, so I have many functions like this one:

import csv
from itertools import imap, ifilter

def doSomething(fpath):
    with open(fpath) as fh:
        r = csv.reader(fh, delimiter=';')
        s = imap(fn, r)        # transform each row
        t = ifilter(test, s)   # keep only rows that pass the test
        for row in t:
            doTheThing(row)

That's nice and readable, but as I said, I have a lot of these, and I end up copy-pasting far more than I'd like. Of course, I can't simply refactor the common code into a function that returns an iterator:

def iteratorOver(fpath):
    with open(fpath) as fh:
        r=csv.reader(fh, delimiter=';')
        return r #oops! fh is closed by the time I use it
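To see why this fails, here is a minimal Python 3 sketch (the name iterator_over and the temporary file are illustrative assumptions; newline='' is the open() argument the Python 3 csv documentation recommends). Returning from inside the with block exits it, so the file is already closed when the caller first asks the reader for a row, and the read raises ValueError.

```python
import csv
import os
import tempfile


def iterator_over(fpath):
    with open(fpath, newline='') as fh:
        # returning here leaves the with block, which closes fh immediately
        return csv.reader(fh, delimiter=';')


# write a small semicolon-separated file to read back
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write('a;1\nb;2\n')
    path = tmp.name

reader = iterator_over(path)
try:
    next(reader)
    error = None
except ValueError as exc:
    # the reader tries to pull a line from a file that is already closed
    error = str(exc)
finally:
    os.remove(path)

print(error)
```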

A first step toward refactoring the code would be to write another context manager ('with'-enabled) class:

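import csv

def openCsv(fpath):
    class CsvManager(object):
        def __init__(self, fpath):
            self.fh = open(fpath)
        def __enter__(self):
            return csv.reader(self.fh, delimiter=';')
        def __exit__(self, exc_type, exc_value, traceback):
            self.fh.close()  # runs even if the with body raises
    return CsvManager(fpath)  # hand the manager to the with statement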

and then:

with openCsv('a_path') as r:
    s = imap(fn, r)
    t = ifilter(test, s)
    for row in t:
        doTheThing(row)
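For comparison, the standard library's contextlib.contextmanager decorator can replace the hand-written class: the code before yield plays the role of __enter__, and the finally block plays the role of __exit__. This is a swapped-in technique rather than the asker's code, written as a Python 3 sketch where the built-in map and filter are lazy like imap and ifilter; the name open_csv and the sample transform and filter lambdas are hypothetical.

```python
import csv
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def open_csv(fpath, delimiter=';'):
    """Yield a csv.reader over fpath and close the file afterwards."""
    fh = open(fpath, newline='')
    try:
        yield csv.reader(fh, delimiter=delimiter)
    finally:
        # runs when the with block ends, normally or via an exception
        fh.close()


with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write('a;1\nb;2\nc;3\n')
    path = tmp.name

try:
    with open_csv(path) as r:
        upper = map(lambda row: [row[0].upper(), int(row[1])], r)  # lazy transform
        kept = list(filter(lambda row: row[1] > 1, upper))         # consumed inside the with
finally:
    os.remove(path)

print(kept)  # [['B', 2], ['C', 3]]
```

This shortens the manager itself, but each caller still repeats the with block and the map/filter chain.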

But this only removes one step of boilerplate from each function.

So what is the Pythonic way to refactor code like this? I think my C++ background is getting in the way.


Solution

  • You can use generators; these produce an iterator that you can pass on to other functions such as imap or ifilter. For example, here is a generator yielding all the rows in a CSV file:

    def iteratorOver(fpath):
        with open(fpath) as fh:
            r = csv.reader(fh, delimiter=';')
            for row in r:
                yield row
    

    Because a generator function pauses at each yield until the next value is requested, its body doesn't finish until the loop is complete, so the with statement doesn't close the file while you are still reading from it.

    You can now use that generator in a filter:

    rows = iteratorOver('some path')
    filtered = ifilter(test, rows)
    

    etc.
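Putting the pieces together, each stage can be its own small function, and a single helper can own the whole pipeline so that every former doSomething becomes one call. This is a minimal Python 3 sketch (built-in map and filter stand in for imap and ifilter); the names iter_csv, process, transform and keep are hypothetical. One caveat: if a consumer stops before the generator is exhausted, the file stays open until the generator is closed or garbage-collected, so the helper wraps it in contextlib.closing, and the last part shows closing a paused generator by hand.

```python
import csv
import os
import tempfile
from contextlib import closing


def iter_csv(fpath, delimiter=';'):
    """Yield rows lazily; the file stays open until the loop finishes."""
    with open(fpath, newline='') as fh:
        for row in csv.reader(fh, delimiter=delimiter):
            yield row


def process(fpath, transform, keep, action):
    """Hypothetical helper: read, transform, filter, then act on each row."""
    # closing() calls rows.close() on exit, which finishes the generator's
    # with block (closing the file) even if action raises an exception
    with closing(iter_csv(fpath)) as rows:
        for row in filter(keep, map(transform, rows)):
            action(row)


def transform(row):
    return (row[0], int(row[1]))


def keep(row):
    return row[1] % 2 == 0


with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write('a;1\nb;2\nc;3\nd;4\n')
    path = tmp.name

collected = []
try:
    process(path, transform, keep, collected.append)

    # stopping early: the generator is paused inside its with block
    rows = iter_csv(path)
    first = next(rows)
    rows.close()  # raises GeneratorExit at the yield, so the with block closes the file
finally:
    os.remove(path)

print(collected)  # [('b', 2), ('d', 4)]
print(first)      # ['a', '1']
```

Keeping the file handling inside one generator means the transform and filter steps never see the file at all, which is where most of the copy-pasted boilerplate came from.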