Python, SQLite3: cursor returns duplicates when a commit intervenes


This Python code creates a table, inserts three rows into it and iterates through the rows, with intervening commits before the cursor has been fully exhausted. Why does it return five rows instead of three? If the intervening commit is removed, the number of returned rows is three as expected. Or is it expected that a commit (which doesn't even touch the table in question) invalidates a cursor?

Edit: Added a forgotten commit (which makes the issue disappear) and an insert to an unrelated table (which makes the issue appear again).

#!/usr/bin/env python3

import sqlite3 as sq

db = sq.connect(':memory:')

db.execute('CREATE TABLE tbl (col INTEGER)')
db.execute('CREATE TABLE tbl2 (col INTEGER)')
db.executemany('INSERT INTO tbl (col) VALUES (?)', [(0,), (1,), (2,)])
db.commit()

print('count=' + str(db.execute('SELECT count(*) FROM tbl').fetchone()[0]))

# Read and print the values just inserted into tbl
for col in db.execute('SELECT col FROM tbl'):
    print(col)
    db.execute('INSERT INTO tbl2 VALUES (?)', col)
    db.commit()

print('count=' + str(db.execute('SELECT count(*) FROM tbl').fetchone()[0]))

The output is:

count=3
(0,)
(1,)
(0,)
(1,)
(2,)
count=3

Generally, with N rows inserted, N+2 rows are returned by the iterator, apparently always with the first two duplicated.


Solution

  • Your follow-up comment disturbed me (particularly because it was clear you were right), so I spent some time studying the source code of Python's _sqlite module (https://svn.python.org/projects/python/trunk/Modules/_sqlite/).

    I think the problem is how the sqlite Connection object handles cursors. Internally, a Connection object maintains a list of cursors AND a list of prepared statements. The nested db.execute('INSERT ...') call resets the list of prepared statements associated with the Connection object.

    The solution is to not rely on the shortcut execute() method's automatic cursor management, and to explicitly hold a reference to the running Cursor. Cursors maintain their own prepared statement lists which are separate from Connection objects.

    You can either explicitly create a cursor OR invoke fetchall() on the result of the db.execute() call. Example of the latter:

    import sqlite3 as sq
    
    db = sq.connect(':memory:')
    
    db.execute('CREATE TABLE tbl (col INTEGER)')
    db.execute('CREATE TABLE tbl2 (col INTEGER)')
    db.executemany('INSERT INTO tbl (col) VALUES (?)', [(0,), (1,), (2,)])
    db.commit()
    
    print('count=' + str(db.execute('SELECT count(*) FROM tbl').fetchone()[0]))
    
    # Read and print the values just inserted into tbl
    for col in db.execute('SELECT col FROM tbl').fetchall():
        print(col)
        db.execute('INSERT INTO tbl2 VALUES (?)', col)
        db.commit()
    
    print('count=' + str(db.execute('SELECT count(*) FROM tbl').fetchone()[0]))
    

    The output is as expected:

    count=3
    (0,)
    (1,)
    (2,)
    count=3
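
The other option mentioned above, explicitly creating a cursor, could be sketched roughly as follows. The names read_cur and write_cur are made up for illustration. Whether this variant also avoids the duplicated rows may depend on the version of the sqlite3 module in use, so treat it as a starting point rather than a confirmed fix; the fetchall() variant above is the one with the demonstrated output.

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE tbl (col INTEGER)')
db.execute('CREATE TABLE tbl2 (col INTEGER)')
db.executemany('INSERT INTO tbl (col) VALUES (?)', [(0,), (1,), (2,)])
db.commit()

# Hypothetical names: one cursor dedicated to reading, one to writing,
# both held explicitly instead of relying on db.execute()'s temporary cursor.
read_cur = db.cursor()
write_cur = db.cursor()

rows = []
read_cur.execute('SELECT col FROM tbl')
for row in read_cur:
    rows.append(row)
    # row is a 1-tuple, so it can be passed directly as the parameter tuple
    write_cur.execute('INSERT INTO tbl2 VALUES (?)', row)
    db.commit()

print(rows)
```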
    

    If the fetchall() approach uses too much memory, you may need to fall back on the isolation between two database connections (https://www.sqlite.org/isolation.html). Example:

    import sqlite3 as sq
    
    db1 = sq.connect('temp.db')
    
    db1.execute('CREATE TABLE tbl (col INTEGER)')
    db1.execute('CREATE TABLE tbl2 (col INTEGER)')
    db1.executemany('INSERT INTO tbl (col) VALUES (?)', [(0,), (1,), (2,)])
    db1.commit()
    
    print('count=' + str(db1.execute('SELECT count(*) FROM tbl').fetchone()[0]))
    
    db2 = sq.connect('temp.db')
    
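    # Read the values from tbl through db1 and write them to tbl2 through db2.
    # Note: fetchall() still loads every row into memory here. Without it, db1
    # would keep a read lock open while db2 commits, which in SQLite's default
    # rollback-journal mode may fail with "database is locked".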
    for col in db1.execute('SELECT col FROM tbl').fetchall():
        print(col)
        db2.execute('INSERT INTO tbl2 VALUES (?)', col)
        db2.commit()
    
    print('count=' + str(db1.execute('SELECT count(*) FROM tbl').fetchone()[0]))
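
If memory is the real constraint, another option (not part of the approach above, just a sketch of a different technique) is to read the table in bounded batches keyed on rowid, so that each SELECT is fully consumed by fetchall() before any commit happens. The helper name iter_in_batches and the batch_size parameter are invented for this example, and it assumes tbl is an ordinary rowid table with automatically assigned (positive) rowids.

```python
import sqlite3


def iter_in_batches(conn, batch_size=1000):
    """Yield (rowid, col) rows from tbl in rowid order, one bounded batch at a time.

    Hypothetical helper: each batch is fetched completely before any row is
    yielded, so no SELECT is left half-read while the caller commits.
    """
    last_rowid = 0  # assumes auto-assigned rowids, which start at 1
    while True:
        batch = conn.execute(
            'SELECT rowid, col FROM tbl WHERE rowid > ? ORDER BY rowid LIMIT ?',
            (last_rowid, batch_size),
        ).fetchall()
        if not batch:
            return
        yield from batch
        last_rowid = batch[-1][0]


db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE tbl (col INTEGER)')
db.execute('CREATE TABLE tbl2 (col INTEGER)')
db.executemany('INSERT INTO tbl (col) VALUES (?)', [(0,), (1,), (2,)])
db.commit()

seen = []
# batch_size=2 is deliberately tiny so the three rows span two batches
for rowid, col in iter_in_batches(db, batch_size=2):
    seen.append(col)
    db.execute('INSERT INTO tbl2 VALUES (?)', (col,))
    db.commit()

print(seen)
```

At most batch_size rows are held in memory at once, at the cost of one extra query per batch.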