Search code examples
pythonloopspeewee

Better way to control flow around database query


In my program I need to read a very large table (it exceeds memory storage) and have myself writing the following construct to read from the table and do some work. While I know it's very possible to re-write the select into an iterator style it still has the basic structure that is follows:

    found = True
    start = 0
    batch = 2500
    while found:
        found = False
        for obj in Channel.select().limit(batch).offset(start):
            found = True

            # do some big work...

        start += batch

What I would like to do is have something that don't carry around as many klunky state variables. Ideas of how to clean up this bit of mess?

FYI - I've tried this as well, not sure I like it any better:

@staticmethod
def qiter(q, start=0, batch=25000):
    obj = True
    while obj:
        for obj in q.limit(batch).offset(start):
            yield obj
        start += batch

Solution

  • The shortest thing I found is the following:

    for start in itertools.count(0, 2500):
        objs = Channel.select().limit(2500).offset(start)
        if not objs:
            break
        for obj in objs:
            # do some big work...
    

    Basically it's a combination of two things:

    • the count iterator (from the itertools package of the standard library) reduces the batch counting to a minimum, and
    • using a separate test and the break statement to get out of it.

    In detail:

    The count iterator is pretty simple: it yield the infinite series 0, 2500, 5000, 7500, ... . As this loop would never end, we need to break out of it somewhere. This is where the if-statement comes into play. If objs is an empty list, the break exists the outer loop.