So i'm trying to do a massive query from a postgresql database that I have locally.
The 'processrecords' function, returns a listofmalwareobjects, i'm assuming i'll need to add a new list to a master list every time the server side cursor is run.
How would I do this properly as i'm a little confused.
I think I need to use server side cursors as otherwise the program runs out of memory with Psycopg2. But i've heard good things about server side cursors.
with connection:
cursor = connection.cursor()
with cursor:
cursor.itersize = 20000
cursor.execute("SELECT malware_id, malwarehashmd5, malwarehashsha1, malwarehashsha256g FROM malwarehashesandstrings")
listoffetchedmalware = cursor.fetchall()
listofmalwareobjects = processrecords(listoffetchedmalware)
For the documentation:
Psycopg wraps the database server side cursor in named cursors. A named cursor is created using the cursor() method specifying the name parameter.
Try:
with connection:
cursor = connection.cursor('my_cursor')
...
However, fetchall()
will still return all rows at once. If you want to process the data in buckets, use fetchmany()
in a loop, e.g.
with connection.cursor(name="my_cursor") as cursor:
cursor.itersize = 20000
cursor.execute(the_query)
listoffetchedmalware = cursor.fetchmany(cursor.itersize)
while len(listoffetchedmalware) > 0:
listofmalwareobjects = processrecords(listoffetchedmalware)
listoffetchedmalware = cursor.fetchmany(cursor.itersize)