Search code examples
python-2.7sql-server-2005rabbitmqpyodbcpython-pika

Select nth to nth row while table still have values unselected with python and pyodbc


I have a table with 10,000 rows and I want to select the first 1000 rows and then select again and this time, the next set of rows, which is 1001-2001.

I am using the BETWEEN clause in order to select the range of values. I can also increment the values. Here is my code:

count = cursor.execute("select count(*) from casa4").fetchone()[0]    
ctr = 1
ctr1 = 1000
str1 = ''
while ctr1 <= count:
    sql = "SELECT AccountNo FROM ( \
        SELECT AccountNo, ROW_NUMBER() OVER (ORDER BY Accountno) rownum \
        FROM  casa4 ) seq \
        WHERE seq.rownum BETWEEN " + str(ctr) + " AND " + str(ctr1) + ""
    ctr = ctr1 + 1
    ctr1 = ctr1 + 1000
    cursor.execute(sql)
    sleep(2) #interval in printing of the rows.

for row in cursor:
    str1 = str1 + '|'.join(map(str,row)) + '\n'
print "Records:" + str1 #var in storing the fetched rows from database.
print sql #prints the sql statement(str) and I can see that the var, ctr and ctr1 have incremented correctly. The way I want it.

What I want to achieve is using a messaging queue, RabbitMQ, I will send this rows to another database and I want to speed up the process. Selecting all and sending it to the queue returns an error.

The output of the code is that it returns 1-1000 rows correctly on the 1st but, on the 2nd loop, instead of 1001-2001 rows, it returns 1-2001 rows, 1-3001 and so on.. It always starts on 1.


Solution

  • I was able to recreate your issue with both pyodbc and pypyodbc. I also tried using

    WITH seq (AccountNo, rownum) AS
    (
        SELECT AccountNo, ROW_NUMBER() OVER (ORDER BY Accountno) rownum
        FROM casa4
    )
    SELECT AccountNo FROM seq
    WHERE rownum BETWEEN 11 AND 20
    

    When I run that in SSMS I just get rows 11 through 20, but when I run it from Python I get all the rows (starting from 1).

    The following code does work using pyodbc. It uses a temporary table named #numbered, and might be helpful in your situation since your process looks like it would do all of its work using the same database connection:

    import pyodbc
    cnxn = pyodbc.connect("DSN=myDb_SQLEXPRESS")
    crsr = cnxn.cursor()
    sql = """\
    CREATE TABLE #numbered (rownum INT PRIMARY KEY, AccountNo VARCHAR(10))
    """
    crsr.execute(sql)
    cnxn.commit()
    sql = """\
    INSERT INTO #numbered (rownum, AccountNo)
    SELECT
        ROW_NUMBER() OVER (ORDER BY Accountno) AS rownum,
        AccountNo
    FROM casa4
    """
    crsr.execute(sql)
    cnxn.commit()
    sql = "SELECT AccountNo FROM #numbered WHERE rownum BETWEEN ? AND ? ORDER BY rownum"
    batchsize = 1000
    ctr = 1
    while True:
        crsr.execute(sql, [ctr, ctr + batchsize - 1])
        rows = crsr.fetchall()
        if len(rows) == 0:
            break
        print("-----")
        for row in rows:
            print(row)
        ctr += batchsize
    cnxn.close()