I have a query result set of ~9 million rows.
I need to do some processing for each row, and the code currently does this:
query = conn.query(sql)
results = query.getresult()
for row in results:
    # process each row here
I'm not sure, but I imagine that getresult() is pulling down the entire result set. Is that the case? I imagine there's a way to pull only chunks of the result set across the wire as needed, but I didn't immediately see anything like that in the pg module docs.
Is it possible to do this with the pgdb module instead, or with some other approach?
My concern is memory on the application machine: I'd rather not load millions of rows into memory all at once if I can help it.
Is this even worth worrying about?
If you're using a module that follows the Python Database API spec (pgdb does), you can use a cursor:
curs = conn.cursor()
curs.execute('select * from bigtable')
then use curs.fetchone() or curs.fetchmany(chunksize) to retrieve rows a batch at a time instead of materializing the whole result set at once.
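
For example, here is a minimal sketch of a fetchmany() loop, assuming the pgdb module; the connection parameters, table name, and chunk size are placeholders you'd replace with your own:

import pgdb

conn = pgdb.connect(database='mydb')   # placeholder connection parameters
curs = conn.cursor()
curs.execute('select * from bigtable')

chunksize = 10000                      # arbitrary batch size
while True:
    rows = curs.fetchmany(chunksize)
    if not rows:                       # fetchmany() returns an empty sequence when no rows remain
        break
    for row in rows:
        pass                           # process each row here

curs.close()
conn.close()

Whether the driver actually streams rows from the server in chunks is a separate question, but fetchmany() at least bounds how many rows your loop holds at any one time.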