
Fastest and most memory efficient way to select() data with Peewee?


Peewee's documentation on iterating over lots of rows currently describes two available optimizations for doing so.

The first, select.iterator(), seems to be a memory optimization.

The second suggests that calling dicts(), namedtuples(), objects() or tuples() on the select query is a speed optimization, because it saves Peewee from having to reconstruct model graphs.

My questions:

  • Are these methods mutually exclusive? For example, it would seem that calling dicts() on a large result set returns a huge array of dictionaries, which wouldn't be memory efficient.
  • How do I combine both of these optimizations in order to iterate over a result set both quickly and with minimal memory?

Is that lonely little code sample at the bottom of the documentation my answer?

for stat in stats.objects().iterator():
    serializer.serialize_object(stat)

Solution

  • They are not mutually exclusive and can be combined. The example from the docs combines objects() (which returns flat model instances and is fast if you have multiple models to reconstruct) with iterator() (which helps cut down memory use).

    You could just as easily write:

    for stat in stats.tuples().iterator():
        ...  # each stat is a plain tuple; do whatever you need with it
    

    If you just want to use the plain-old database cursor, you can write:

    stats = Stat.select()  # plus whatever where()/order_by() clauses you need
    cursor = database.execute(stats)
    for row_tuple in cursor:
        ...  # do whatever
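
To make the memory difference concrete, here is a minimal sketch of the plain-cursor idea. It deliberately uses the standard-library sqlite3 module rather than Peewee, so it runs with no dependencies, and it assumes the cursor you get back behaves like an ordinary DB-API cursor. The `stat` table, its columns, and the helper names are hypothetical, made up for illustration.

```python
import sqlite3


def make_demo_db(n_rows=1000):
    """Build an in-memory database with a hypothetical `stat` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stat (id INTEGER PRIMARY KEY, url TEXT, hits INTEGER)"
    )
    conn.executemany(
        "INSERT INTO stat (url, hits) VALUES (?, ?)",
        ((f"/page/{i}", i) for i in range(n_rows)),
    )
    conn.commit()
    return conn


def total_hits_streaming(conn):
    """Sum `hits` by iterating the cursor, one row tuple at a time."""
    cursor = conn.execute("SELECT id, url, hits FROM stat")
    total = 0
    for _id, _url, hits in cursor:  # no list of all rows is ever built
        total += hits
    return total


def total_hits_materialized(conn):
    """Same result, but fetchall() builds the full list of rows first."""
    rows = conn.execute("SELECT id, url, hits FROM stat").fetchall()
    return sum(hits for _id, _url, hits in rows)


if __name__ == "__main__":
    conn = make_demo_db()
    print(total_hits_streaming(conn))
    print(total_hits_materialized(conn))
```

Both functions return the same total. The difference is that the streaming version never holds more than the current row in your own code, which is the same trade-off iterator() is aiming for, while plain tuples keep the per-row cost low in the way tuples() does.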