I am executing database queries which might result in thousands of rows.
The most simple way would to iterate over fetchall()
:
import pymysql
sql_command = "SELECT * FROM table WHERE some_condition;"
connection = pymysql.connect(...)
cursor = connection.cursor()
nof_affected_rows = cursor.execute(sql_command)
for row in cursor.fetchall():
# process row
However, I am afraid of memory exhaustion here. Thus, another naive approach would be to iterate over fetchone
, one element at a time.
import pymysql
sql_command = "SELECT * FROM table WHERE some_condition;"
connection = pymysql.connect(...)
cursor = connection.cursor()
nof_affected_rows = cursor.execute(sql_command)
while True:
row = cursor.fetchone()
if not row:
break
# process row
Now a colleague argued I should use fetchmany()
because it would retrieve multiple elements at once and thus would avoid roundtrips. At the same time I would stay in control of memory consumption.
Is that true? Is fetchone()
buffered or not. If it is buffered, to which extend?
PyMySQL eagerly downloads all elements when executing the query.
Unfortunately, I was unable to find definite statements in the documentation. Therefore I conducted a small experiment.
I started IPython and executed the following code snippet:
import pymysql
sql_command = "SELECT * FROM table WHERE some_condition;"
connection = pymysql.connect(...)
cursor = connection.cursor()
nof_affected_rows = cursor.execute(sql_command)
This resulted in approx. 450000 rows to load.
In a separate terminal, I monitored the memory consumption of IPython using htop
. The memory consumption went up from 0.1% to 7.3% which corresponds to approx. 2.2GiB of memory.
After I disconnected from the network, I was still able to retrieve all elements from the Cursor.
Therefore I conclude that all rows are retrieved at cursor.execute()
. Thus, in PyMySQL the question of whether to use fetchone
, fetchmany
or fetchall
boils down to whatever is more convenient.