python-3.x, database, postgresql, psycopg2, server-side

How does psycopg2 server side cursor operate when itersize is less than data size and fetch number is less than itersize?


I have read the documentation and several articles, posts, and threads, but I am not sure I understand this clearly. Let's suppose this scenario:

1. I have a server side cursor.
2. I set the itersize to 1000.
3. I execute a SELECT query which would normally return 10000 records.
4. I use fetchmany to fetch 100 records at a time.

My question is: how is this done behind the scenes? My understanding is that the query is executed, but only 1000 of the records are read by the server-side cursor. The cursor refrains from reading the next 1000 until it has scrolled past the last record of the 1000 currently read. Furthermore, the server-side cursor holds those 1000 in the server's memory and scrolls over them 100 at a time, sending them to the client. I'm also curious what the RAM consumption would look like. By my understanding, if executing the full query takes 10000 KB of memory, the server-side cursor would consume only 1000 KB on the server, because it reads only 1000 records at a time, and the client-side cursor would use 100 KB. Is my understanding correct?

UPDATE: Per the documentation and the discussion we had in the responses, I would expect this code to print a list of 10 items at a time:

from psycopg2 import connect, extras as psg_extras

with connect(host="db_url", port="db_port", dbname="db_name",
             user="db_user", password="db_password") as db_connection:
    with db_connection.cursor(name="data_operator",
                              cursor_factory=psg_extras.DictCursor) as db_cursor:
        db_cursor.itersize = 10
        db_cursor.execute("SELECT rec_pos FROM schm.test_data;")
        for i in db_cursor:
            print(i)
            print(">>>>>>>>>>>>>>>>>>>")

However, in each iteration it prints just one record. The only way I get 10 records at a time is if I use fetchmany:

from psycopg2 import connect, extras as psg_extras

with connect(host="db_url", port="db_port", dbname="db_name",
             user="db_user", password="db_password") as db_connection:
    with db_connection.cursor(name="data_operator",
                              cursor_factory=psg_extras.DictCursor) as db_cursor:
        db_cursor.execute("SELECT rec_pos FROM schm.test_data;")
        records = db_cursor.fetchmany(10)
        while len(records) > 0:
            print(records)
            print(">>>>>>>>>>>>>>>>>>>")
            records = db_cursor.fetchmany(10)

Based on these two code snippets, what I'm guessing happens in the scenario mentioned before is that, given the code below...

from psycopg2 import connect, extras as psg_extras

with connect(host="db_url", port="db_port", dbname="db_name",
             user="db_user", password="db_password") as db_connection:
    with db_connection.cursor(name="data_operator",
                              cursor_factory=psg_extras.DictCursor) as db_cursor:
        db_cursor.itersize = 1000
        db_cursor.execute("SELECT rec_pos FROM schm.test_data;")
        records = db_cursor.fetchmany(100)
        while len(records) > 0:
            print(records)
            print(">>>>>>>>>>>>>>>>>>>")
            records = db_cursor.fetchmany(100)

... itersize is a server-side thing: when the query runs, it limits the server-side cursor to loading only 1000 records at a time from the database. fetchmany, on the other hand, is a client-side thing: it gets 100 of those 1000 from the server, and each time fetchmany runs, the next 100 are fetched. Once all 1000 on the server side have been scrolled over, the next 1000 are read from the DB on the server side. But I'm rather confused, because that does not seem to be what the docs imply. Then again... the code seems to imply it.
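
To make my guess concrete, here is the same loop again with comments marking where I believe each step happens (server vs. client); the connection details and table are the same placeholders as above:

from psycopg2 import connect, extras as psg_extras

with connect(host="db_url", port="db_port", dbname="db_name",
             user="db_user", password="db_password") as db_connection:
    # naming the cursor is what makes it a server-side cursor
    with db_connection.cursor(name="data_operator",
                              cursor_factory=psg_extras.DictCursor) as db_cursor:
        db_cursor.itersize = 1000  # my guess: the server reads 1000 records per batch
        db_cursor.execute("SELECT rec_pos FROM schm.test_data;")
        records = db_cursor.fetchmany(100)  # my guess: 100 of those are sent to the client
        while len(records) > 0:
            print(records)  # the client works on 100 records at a time
            records = db_cursor.fetchmany(100)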


Solution

  • I would spend some time here: Server side cursor.

    What you will find is that itersize only applies when you are iterating over a cursor:

    for record in cur:
        print(record)
    

    Since you are using fetchmany(size=100) you will only be working with 100 rows at a time. The server will not be holding 1000 rows in memory. I was wrong, sort of: if a named cursor is not used, the cursor returns all the rows to the client in memory, and fetchmany() then pulls rows from that local set in the batch size specified. If a named cursor is used, it fetches from the server in that batch size.

    UPDATE: showing how itersize and fetchmany() work.

    Using itersize and fetchmany() with named cursor:

    cur = con.cursor(name='cp')
    cur.itersize = 10
    cur.execute("select * from cell_per")
    for rs in cur:     
       print(rs) 
    cur.close()
    
    #Log
    statement: DECLARE "cp" CURSOR WITHOUT HOLD FOR select * from cell_per
    statement: FETCH FORWARD 10 FROM "cp"
    statement: FETCH FORWARD 10 FROM "cp"
    statement: FETCH FORWARD 10 FROM "cp"
    statement: FETCH FORWARD 10 FROM "cp"
    statement: FETCH FORWARD 10 FROM "cp"
    statement: FETCH FORWARD 10 FROM "cp"
    statement: FETCH FORWARD 10 FROM "cp"
    statement: FETCH FORWARD 10 FROM "cp"
    statement: CLOSE "cp"
    
    cur = con.cursor(name='cp') 
    cur.execute("select * from cell_per")
    cur.fetchmany(size=10) 
    
    #Log
    statement: DECLARE "cp" CURSOR WITHOUT HOLD FOR select * from cell_per
    statement: FETCH FORWARD 10 FROM "cp"
    
    

    Using fetchmany with unnamed cursor:

    cur = con.cursor()
    cur.execute("select * from cell_per")
    rs = cur.fetchmany(size=10)
    len(rs)                                                                                                                                                                   
    10
    
    #Log
    statement: select * from cell_per
    

    So a named cursor fetches the rows from the server in batches set by itersize when iterated over, or by size when using fetchmany(size=n), whereas a non-named cursor pulls all the rows into client memory and then fetches them from there according to the size set in fetchmany(size=n).
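
    What the named cursor does can also be written out by hand on a plain (unnamed) cursor; a rough sketch of the equivalent SQL round trips, matching the statements in the log above:

    cur = con.cursor()
    # DECLARE ... WITHOUT HOLD needs a transaction; psycopg2's default
    # (autocommit off) provides one.
    cur.execute('DECLARE "cp" CURSOR WITHOUT HOLD FOR select * from cell_per')
    cur.execute('FETCH FORWARD 10 FROM "cp"')
    rows = cur.fetchall()    # only these 10 rows cross the wire
    cur.execute('FETCH FORWARD 10 FROM "cp"')
    rows = cur.fetchall()    # the next 10
    cur.execute('CLOSE "cp"')
    cur.close()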

    Further Update.

    itersize only applies when you are iterating over the cursor object itself:

    cur = con.cursor(name="cp")
    cur.itersize = 10 
    cur.execute("select * from cell_per")
    for r in cur: 
        print(r) 
    cur.close()
    
    #Postgres log:
    statement: DECLARE "cp" CURSOR WITHOUT HOLD FOR select * from cell_per
    statement: FETCH FORWARD 10 FROM "cp"
    statement: FETCH FORWARD 10 FROM "cp"
    statement: FETCH FORWARD 10 FROM "cp"
    statement: FETCH FORWARD 10 FROM "cp"
    statement: FETCH FORWARD 10 FROM "cp"
    statement: FETCH FORWARD 10 FROM "cp"
    statement: FETCH FORWARD 10 FROM "cp"
    statement: FETCH FORWARD 10 FROM "cp"
    statement: CLOSE "cp"
    

    In the above, r is a single row, taken from each batch of 10 rows that the server-side (named) cursor returns; that batch size is itersize. So when you iterate over the named cursor object itself, all the rows the query specifies are returned by the iterator, just fetched from the server in batches of itersize.
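
    As a quick check, counting rows while iterating would show the entire result set coming back one row at a time, even though the server is still fetching in batches of itersize behind the scenes (a small sketch, same table and cursor name as above):

    cur = con.cursor(name="cp")
    cur.itersize = 10
    cur.execute("select * from cell_per")
    total = sum(1 for _ in cur)   # one row per iteration; a FETCH FORWARD 10 runs every 10 rows
    print(total)                  # the full row count of cell_per, despite the batched FETCHes
    cur.close()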

    Not iterating over the named cursor object, but using fetchmany(size=n) instead:

    cur = con.cursor(name="cp") 
    cur.itersize = 10
    cur.execute("select * from cell_per") 
    cur.fetchmany(size=20)
    cur.fetchmany(size=20)
    cur.close()
    
    #Postgres log:
    statement: DECLARE "cp" CURSOR WITHOUT HOLD FOR select * from cell_per
    statement: FETCH FORWARD 20 FROM "cp"
    statement: FETCH FORWARD 20 FROM "cp"
    statement: CLOSE "cp"
    

    The itersize was set, but it has no effect because the named cursor object is not being iterated over. Instead, fetchmany(size=20) has the server-side cursor send a batch of 20 records each time it is called.
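
    So if the goal is to work with fixed-size batches from a server-side cursor, a loop around fetchmany(size=n) on a named cursor is the pattern to use. A small helper sketch (the function and cursor names are just illustrative):

    def batches(con, query, size=100):
        """Yield lists of `size` rows; each list is one FETCH FORWARD from the server."""
        cur = con.cursor(name="cp_batches")
        cur.execute(query)
        try:
            while True:
                rows = cur.fetchmany(size=size)
                if not rows:
                    break
                yield rows
        finally:
            cur.close()

    for batch in batches(con, "select * from cell_per", size=20):
        print(len(batch))   # 20 rows per batch (the last batch may be smaller)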