sql-server python-3.x pandas sqlalchemy pandas-to-sql

pandas.DataFrame.to_sql inserts data, but doesn't commit the transaction


I have a pandas dataframe I'm trying to insert into MS SQL EXPRESS as per below:

import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine("mssql+pyodbc://user:password@testodbc")
connection = engine.connect()

data = {'Host': ['HOST1','HOST2','HOST3','HOST4'],
    'Product': ['Apache HTTP 2.2','RedHat 6.9','OpenShift 2','JRE 1.3'],
    'ITBS': ['Infrastructure','Accounting','Operations','Accounting'],
    'Remediation': ['Upgrade','No plan','Decommission','Decommission'],
    'TargetDate': ['2018-12-31','NULL','2019-03-31','2019-06-30']}

df = pd.DataFrame(data)

When I call:

df.to_sql(name='TLMPlans', con=connection, index=False, if_exists='replace')

and then:

print(engine.execute("SELECT * FROM TLMPlans").fetchall())

I can see the data alright, but it actually doesn't commit any transaction:

D:\APPS\Python\python.exe C:/APPS/DashProjects/dbConnectors/venv/Scripts/readDataFromExcel.py
[('HOST1', 'Apache HTTP 2.2', 'Infrastructure', 'Upgrade', '2018-12-31'), ('HOST2', 'RedHat 6.9', 'Accounting', 'No plan', 'NULL'), ('HOST3', 'OpenShift 2', 'Operations', 'Decommission', '2019-03-31'), ('HOST4', 'JRE 1.3', 'Accounting', 'Decommission', '2019-06-30')]

Process finished with exit code 0


It says here I don't have to commit as SQLAlchemy does it:

Does the Pandas DataFrame.to_sql() function require a subsequent commit()?

and the below suggestions don't work:

Pandas to_sql doesn't insert any data in my table

I spent a good 3 hours looking for clues all over the Internet, but I'm not finding any relevant answers, or I don't know how to ask the question.

Any guidance on what to look for would be highly appreciated.

UPDATE

I'm able to commit changes using a pyodbc connection and a full INSERT statement; however, pandas.DataFrame.to_sql() with a SQLAlchemy engine doesn't work. It sends the data to memory instead of the actual database, regardless of whether a schema is specified.

I would really appreciate help with this one. Or is it possibly a pandas issue I need to report?


Solution

  • I had the same issue. I realised you need to tell pyodbc which database you want to use. For me the default was master, so my data ended up there.

    There are two ways you can do this, either:

    connection.execute("USE <dbname>")
    

    Or define the schema in the df.to_sql():

    df.to_sql(name=<TABLENAME>, con=connection, schema='<dbname>.dbo')
    

    In my case the schema was <dbname>.dbo. I think dbo is the default, so it could be something else if you define an alternative schema.

    This was referenced in this answer; it took me a bit longer to realise what the schema name should be.
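A third option, not part of the original answer, is to name the database in the connection itself so that nothing falls back to master. The sketch below is only an illustration under assumptions: build_mssql_url and write_frame are hypothetical helper names, and the driver name, server, and database values are placeholders to replace with your own. It uses SQLAlchemy's odbc_connect URL form, writes inside engine.begin() so the transaction is committed on success, and returns the result of SELECT DB_NAME() so you can confirm where the table landed.

```python
from urllib.parse import quote_plus


def build_mssql_url(server, database, user, password,
                    driver="ODBC Driver 17 for SQL Server"):
    """Return a SQLAlchemy URL that names the target database explicitly.

    The driver default is an assumption; use whichever ODBC driver
    is actually installed on your machine.
    """
    odbc = (
        f"DRIVER={{{driver}}};"
        f"SERVER={server};"
        f"DATABASE={database};"
        f"UID={user};"
        f"PWD={password}"
    )
    # The pyodbc dialect accepts a URL-quoted raw ODBC connection string.
    return "mssql+pyodbc:///?odbc_connect=" + quote_plus(odbc)


def write_frame(df, url, table):
    """Write df to `table` and return the name of the database that received it."""
    import sqlalchemy  # imported lazily so the URL helper works without it

    engine = sqlalchemy.create_engine(url)
    # engine.begin() commits when the block succeeds and rolls back on error.
    with engine.begin() as conn:
        df.to_sql(name=table, con=conn, index=False, if_exists="replace")
        return conn.execute(sqlalchemy.text("SELECT DB_NAME()")).scalar()
```

For example, write_frame(df, build_mssql_url(r"localhost\SQLEXPRESS", "<dbname>", "user", "password"), "TLMPlans") should return your database name rather than master if the connection is pointed at the right place.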