I'm trying to read data from an Oracle DB. I have to read, in Python, the results of a simple select that returns a million rows. I use the fetchall() function, changing the arraysize property of the cursor.
import time

import cx_Oracle
import db_functions

select_qry = db_functions.read_sql_file('src/data/scripts/03_perimetro_select.sql')
dsn_tns = cx_Oracle.makedsn(ip, port, sid)
con = cx_Oracle.connect(user, pwd, dsn_tns)

start = time.time()
cur = con.cursor()
cur.arraysize = 1000
cur.execute('select * from bigtable where rownum < 10000')
res = cur.fetchall()
# print(res)  # uncomment to display the query results
elapsed = time.time() - start
print(elapsed, "seconds")
cur.close()
con.close()
If I remove the where rownum < 10000 condition, the Python environment freezes and the fetchall() call never returns.
After some trials I found a limit for this particular select: it works up to 50k rows but fails at 60k. What is causing this problem? Do I have to find another way to fetch this amount of data, or is the problem the ODBC connection? How can I test it?
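One way to test whether the limit is client-side memory or the connection itself is to stream the rows with cursor.fetchmany() instead of materializing everything with fetchall(); if progress keeps printing well past 60k rows, the connection is fine and fetchall() is simply exhausting memory. A minimal sketch, assuming the same connection variables (ip, port, sid, user, pwd) and bigtable as above:

import time

import cx_Oracle

dsn_tns = cx_Oracle.makedsn(ip, port, sid)
con = cx_Oracle.connect(user, pwd, dsn_tns)
cur = con.cursor()
cur.arraysize = 1000
cur.execute('select * from bigtable')

start = time.time()
row_count = 0
while True:
    rows = cur.fetchmany()  # fetches cur.arraysize rows per call
    if not rows:
        break
    row_count += len(rows)
    if row_count % 100000 == 0:
        print(row_count, "rows in", time.time() - start, "seconds")

print("total:", row_count, "rows in", time.time() - start, "seconds")
cur.close()
con.close()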
Consider running in batches using Oracle's ROWNUM. To combine the batches back into a single object, append each one to a growing list. The code below assumes the total row count for the table is 1 million; adjust as needed:
table_row_count = 1000000
batch_size = 10000

# PREPARED STATEMENT
# Note: Oracle does not allow AS in table aliases, cx_Oracle statements
# must not end with a semicolon, and ROWNUM numbering starts at 1.
sql = """SELECT t.*
         FROM (SELECT sub_t.*, ROWNUM AS row_num
               FROM (SELECT * FROM bigtable ORDER BY primary_id) sub_t) t
         WHERE t.row_num BETWEEN :LOWER_BOUND AND :UPPER_BOUND"""

data = []
for lower_bound in range(1, table_row_count + 1, batch_size):
    # BIND PARAMS WITH BOUND LIMITS
    cur.execute(sql, {'LOWER_BOUND': lower_bound,
                      'UPPER_BOUND': lower_bound + batch_size - 1})
    for row in cur.fetchall():
        data.append(row)
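On Oracle 12c and later, the same batching can be expressed with the OFFSET ... FETCH clause instead of the nested ROWNUM subqueries. A sketch under the same assumptions (bigtable, primary_id, and an open cursor cur):

sql_12c = """SELECT * FROM bigtable
             ORDER BY primary_id
             OFFSET :OFFSET_ROWS ROWS FETCH NEXT :BATCH_SIZE ROWS ONLY"""

data = []
for offset in range(0, table_row_count, batch_size):
    cur.execute(sql_12c, {'OFFSET_ROWS': offset, 'BATCH_SIZE': batch_size})
    data.extend(cur.fetchall())

Either way, the deterministic ORDER BY (here on primary_id) is what keeps the batches non-overlapping across executions.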