I am processing a huge postgresql database for which I have created a "fetch" function.
def fetch(cursor, batch_size=1e3):
"""An iterator that uses fetchmany to keep memory usage down"""
while True:
records = cursor.fetchmany(int(batch_size))
if not records:
break
for record in records:
yield record
For each item I am doing some processing, but right now I have a problem where the last item in some cases will be omitted as I am doing some comparison between the items. And as soon as that comparison doesn't yield on the last item nothing will be done.
connection = psycopg2.connect(<url>)
cursor = connection.cursor()
cursor.execute(<some query>)
temp_today = 0
for row in fetch(cursor):
item = extract_variables(row)
date = item['datetime']
today = date.date()
if temp_today is 0:
# do something with first row
temp_today = date
# -----------------------------------------
# I feel like I am missing a statement here
# something like:
# if row == rows[-1]:
# do something with last row..
# -----------------------------------------
elif temp_today.date() == today:
# do something with every row where
# the date is the same
else:
# do something with every row where
# the dates ain't the same
How do I do something with the last item when I am using a yield?
It is extremely important for me to use yield as I am handling a VERY huge dataset and I will run out of memory if I don't.
Thanks to @Peter Smit from the comments I used the following solution:
connection = psycopg2.connect(<url>)
cursor = connection.cursor()
cursor.execute(<some query>)
temp_today = 0
parsed_count = 0
cursor_count = cursor.rowcount
for row in fetch(cursor):
item = extract_variables(row)
date = item['datetime']
today = date.date()
if temp_today is 0:
# do something with first row
temp_today = date
elif parsed_count == cursor_count:
# do something with the last row
elif temp_today.date() == today:
# do something with every row where
# the date is the same
else:
# do something with every row where
# the dates ain't the same