I am trying to query a large data (10 million rows) and try to prevent out of memory, but not familiar with Python and confused with different opinions regarding the execute(), cursor iterator and fetchone()
Am I right to assume that cursor.execute() does not load all data into memory and only when I call fetchone() then it will load 1 row of data
from mysql.connector import MySQLConnection
def query():
conn = MySQLConnection(host=conf['host'],
conf['port'],
conf['user'],
conf['password'],
conf['database'])
cursor = conn.cursor(buffered=True)
cursor.execute('SELECT * FROM TABLE') # 10 million rows
does this cursor iterator does the same with fetchone() ?
for row in cursor:
print(row)
is my code snippet is safe to handle 10 million rows of data? if not, how can I safely iterate the data without out of memory?
Taken from MySQL documentation:
The
fetchone()
method is used byfetchall()
andfetchmany()
. It is also used when a cursor is used as an iterator.
The following example shows two equivalent ways to process a query result. The first uses fetchone()
in a while loop, the second uses the cursor as an iterator:
# Using a while loop
cursor.execute("SELECT * FROM employees")
row = cursor.fetchone()
while row is not None:
print(row)
row = cursor.fetchone()
# Using the cursor as iterator
cursor.execute("SELECT * FROM employees")
for row in cursor:
print(row)
It also stated that:
You must fetch all rows for the current query before executing new statements using the same connection.
If you are worried about performance issues you should use fetchmany(n)
in a while loop until you fetch all of the results like so:
'An iterator that uses fetchmany to keep memory usage down'
while True:
results = cursor.fetchmany(arraysize)
if not results:
break
for result in results:
yield result
This behavior adheres to PEP249, which describes how and which methods db connectors should implement. A partial answer is given in this thread.
Basically the implementation of fetchall
vs fetchmany
vs fetchone
would be up to the developers of the library depending on the database capabilities, but it would make sense, in the case of fetchmany
and fetchone
, that the unfetched/remaining results would be kept server side, until requested by another call or destruction of cursor object.
So in conclusion I think it is safe to assume calling execute method does not, in this case(mysqldb), dump all the data from the query to memory.