Tags: python, postgresql, pygresql

Using Python/PyGreSQL, how can I efficiently handle a large result set?


I have a query result set of ~9 million rows.

I need to do some processing for each row, and the code currently does this:

query = conn.query(sql)
results = query.getresult()

for row in results: 
    # blah

I'm not sure, but I imagine that getresult() is pulling down the entire result set. Is that the case? I'd expect there's a way to pull only chunks of the result set across the wire as needed, but I didn't immediately see anything like that in the pg module docs.

Is it possible to do this with the pgdb module instead, or with some other approach?

My concerns are for memory on the application machine - I'd rather not load millions of rows into memory all at once if I can help it.

Is this even worth worrying about?


Solution

  • If it's following the Python Database API spec, you could use a cursor:

    curs = conn.cursor()
    curs.execute('select * from bigtable')
    

    then use curs.fetchone() or curs.fetchmany(chunksize) to pull rows in batches instead of materializing the whole result set at once.
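
    A minimal sketch of that pattern using PyGreSQL's DB-API module, pgdb (the connection parameters, table name, and chunk size below are placeholders):

    import pgdb  # PyGreSQL's DB-API 2.0 interface

    # Placeholder connection details -- adjust for your environment.
    conn = pgdb.connect(database='mydb', host='localhost', user='me')

    curs = conn.cursor()
    curs.execute('select * from bigtable')

    chunksize = 10000  # how many rows to hand back per fetchmany() call
    while True:
        rows = curs.fetchmany(chunksize)
        if not rows:
            break  # result set exhausted
        for row in rows:
            pass  # per-row processing goes here

    curs.close()
    conn.close()

    fetchmany() returns an empty sequence once there are no rows left, so the loop ends cleanly without your code ever building a single Python list of all 9 million rows.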