Search code examples
pandassqlalchemy

How to specify auto commit to false explicitly in pandas


In a small program, I'm using the Pandas to_sql method to push data to the database. But in my scenario, I need to set auto-commit to false, but Pandas commits nonetheless. Is there a workaround for this?

# Example code
df = pd.read_csv(file)

# In here i need to turn auto commit off
with conn.connect() as c:
    df.to_sql('table_name', c)

    # Some critical workload
    # updating another table
    # Then commit after updating the table
    c.commit()

Solution

  • With SQLAlchemy 2.0, with engine.connect() as c: and with engine.begin() as c: normally differ in that the former will not automatically commit when the context manager exits, while the latter will automatically commit (unless an error occurs). So,

    engine.echo = True
    with engine.connect() as c:
        c.exec_driver_sql("INSERT INTO table_name (id) VALUES (1)")
    print(">>> Done.")
    

    results in a rollback

    … INSERT INTO table_name (id) VALUES (1)
    … [raw sql] ()
    … ROLLBACK
    >>> Done.
    

    However, to_sql() does automatically commit

    df = pd.DataFrame([(1,), (2,)], columns=["id"])
    engine.echo = True
    with engine.connect() as c:
        df.to_sql("table_name", c, if_exists="append", index=False)
    print(">>> Done.")
    
    … INSERT INTO table_name (id) VALUES (?)
    … [generated in 0.00087s] [(1,), (2,)]
    … COMMIT
    >>> Done.
    

    and the commit happens immediately after the .to_sql() completes, as shown by adding a breakpoint within the context manager

    df = pd.DataFrame([(1,), (2,)], columns=["id"])
    engine.echo = True
    with engine.connect() as c:
        df.to_sql("table_name", c, if_exists="append", index=False)
        breakpoint()
        print("Exiting context manager.")
    print(">>> Done.")
    
    2023-06-28 07:09:13,887 INFO sqlalchemy.engine.Engine INSERT INTO table_name (id) VALUES (?)
    2023-06-28 07:09:13,887 INFO sqlalchemy.engine.Engine [generated in 0.00066s] [(1,), (2,)]
    2023-06-28 07:09:13,912 INFO sqlalchemy.engine.Engine COMMIT
    > /home/gord/PycharmProjects/sqlalchemy_demo/so76566226.py(14)<module>()
    -> print("Exiting context manager.")
    (Pdb)
    

    But, .to_sql() only seems to immediately automatically commit if a transaction is not active when it is invoked, so if we begin the transaction ourselves then the .to_sql() won't be committed until we commit the transaction ourselves

    df = pd.DataFrame([(1,), (2,)], columns=["id"])
    engine.echo = True
    with engine.connect() as c, c.begin():
        df.to_sql("table_name", c, if_exists="append", index=False)
        breakpoint()
        c.commit()
    print(">>> Done.")
    
    2023-06-28 07:15:57,699 INFO sqlalchemy.engine.Engine INSERT INTO table_name (id) VALUES (?)
    2023-06-28 07:15:57,699 INFO sqlalchemy.engine.Engine [generated in 0.00052s] [(1,), (2,)]
    > /home/gord/PycharmProjects/sqlalchemy_demo/so76566226.py(14)<module>()
    -> c.commit()
    (Pdb) next
    2023-06-28 07:16:13,131 INFO sqlalchemy.engine.Engine COMMIT
    > /home/gord/PycharmProjects/sqlalchemy_demo/so76566226.py(15)<module>()
    -> print(">>> Done.")
    (Pdb)