Tags: python, mysql, mysql-connector-python

Python: does cursor.execute() load all the data?


I am trying to query a large table (10 million rows) without running out of memory, but I am not familiar with Python and am confused by the differing opinions about execute(), the cursor iterator, and fetchone().

Am I right to assume that cursor.execute() does not load all the data into memory, and that only when I call fetchone() will it load one row of data?

from mysql.connector import MySQLConnection


def query():
    conn = MySQLConnection(host=conf['host'],
                           port=conf['port'],
                           user=conf['user'],
                           password=conf['password'],
                           database=conf['database'])
    cursor = conn.cursor(buffered=True)
    cursor.execute('SELECT * FROM TABLE') # 10 million rows

Does iterating over the cursor do the same thing as fetchone()?

for row in cursor:
    print(row)

Is my code snippet safe for handling 10 million rows of data? If not, how can I iterate over the data without running out of memory?


Solution

  • Taken from the MySQL documentation:

    The fetchone() method is used by fetchall() and fetchmany(). It is also used when a cursor is used as an iterator.

    The following example shows two equivalent ways to process a query result. The first uses fetchone() in a while loop, the second uses the cursor as an iterator:

    # Using a while loop
    cursor.execute("SELECT * FROM employees")
    row = cursor.fetchone()
    while row is not None:
      print(row)
      row = cursor.fetchone()
    
    # Using the cursor as iterator
    cursor.execute("SELECT * FROM employees")
    for row in cursor:
      print(row)
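The equivalence of the two styles can be sketched with a minimal stand-in cursor. (`StubCursor` here is a hypothetical class for illustration only, not part of any real connector.)

```python
class StubCursor:
    """Hypothetical stand-in for a DB-API cursor (illustration only)."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._pos = 0

    def fetchone(self):
        # DB-API convention: return the next row, or None when exhausted
        if self._pos >= len(self._rows):
            return None
        row = self._rows[self._pos]
        self._pos += 1
        return row

    def __iter__(self):
        # Iterating a cursor is just repeated fetchone() until None
        return iter(self.fetchone, None)


# Style 1: while loop over fetchone()
c1 = StubCursor([(1,), (2,), (3,)])
rows_while = []
row = c1.fetchone()
while row is not None:
    rows_while.append(row)
    row = c1.fetchone()

# Style 2: cursor as an iterator
c2 = StubCursor([(1,), (2,), (3,)])
rows_iter = list(c2)

print(rows_while == rows_iter)  # True
```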
    

    It also states that:

    You must fetch all rows for the current query before executing new statements using the same connection.

    If you are worried about memory or performance, you can use fetchmany(n) in a loop until all of the results have been fetched, like so:

    def iter_rows(cursor, arraysize=1000):
        """An iterator that uses fetchmany to keep memory usage down."""
        while True:
            results = cursor.fetchmany(arraysize)
            if not results:
                break
            for result in results:
                yield result
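As a runnable sketch of this fetchmany pattern, here it is exercised against the standard library's sqlite3 module (also a PEP 249 driver), so no MySQL server is needed; the generator is repeated under an illustrative name to keep the example self-contained:

```python
import sqlite3


def iter_rows(cursor, arraysize=1000):
    """An iterator that uses fetchmany to keep memory usage down."""
    while True:
        results = cursor.fetchmany(arraysize)
        if not results:
            break
        for result in results:
            yield result


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?)",
                 [(i,) for i in range(10_000)])

cursor = conn.cursor()
cursor.execute("SELECT id FROM employees")

# At most 500 rows are held in the client-side batch at any one time
total = sum(1 for _ in iter_rows(cursor, arraysize=500))
print(total)  # 10000
conn.close()
```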
    

    This behavior adheres to PEP 249, which specifies which methods database connectors should implement and how they should behave. A partial answer is given in this thread.

    Basically, the implementation of fetchall vs. fetchmany vs. fetchone is left to the developers of the library, depending on the database's capabilities, but it would make sense, in the case of fetchmany and fetchone, for the unfetched/remaining results to be kept server side until they are requested by another call or the cursor object is destroyed.

    So, in conclusion, I think it is safe to assume that calling the execute method does not, in this case (MySQLdb), dump all of the query's data into memory. Note, however, that your snippet uses mysql-connector-python with buffered=True, and a buffered cursor does fetch the entire result set into client memory right after execute(); for 10 million rows you should use the default unbuffered cursor instead.