Tags: python, sql-server, pandas, sqlalchemy, upsert

How to upsert pandas DataFrame to Microsoft SQL Server table?


I would like to upsert my pandas DataFrame into a SQL Server table. This question has a workable solution for PostgreSQL, but T-SQL does not have an ON CONFLICT variant of INSERT. How can I accomplish the same thing for SQL Server?


Solution

  • Update, July 2022: You can save some typing by using this function to build the MERGE statement and perform the upsert for you.


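    SQL Server offers the MERGE statement, which updates matching rows and inserts the missing ones in a single statement. Upload the DataFrame to a temporary table, then MERGE it into the target: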

    import pandas as pd
    import sqlalchemy as sa
    
    connection_string = (
        "Driver=ODBC Driver 17 for SQL Server;"
        "Server=192.168.0.199;"
        "UID=scott;PWD=tiger^5HHH;"
        "Database=test;"
        "UseFMTONLY=Yes;"
    )
    connection_url = sa.engine.URL.create(
        "mssql+pyodbc",
        query={"odbc_connect": connection_string}
    )
    
    engine = sa.create_engine(connection_url, fast_executemany=True)
    
    with engine.begin() as conn:
        # step 0.0 - create test environment
        conn.exec_driver_sql("DROP TABLE IF EXISTS main_table")
        conn.exec_driver_sql(
            "CREATE TABLE main_table (id int primary key, txt varchar(50))"
        )
        conn.exec_driver_sql(
            "INSERT INTO main_table (id, txt) VALUES (1, 'row 1 old text')"
        )
        # step 0.1 - create DataFrame to UPSERT
        df = pd.DataFrame(
            [(2, "new row 2 text"), (1, "row 1 new text")], columns=["id", "txt"]
        )
    
        # step 1 - upload DataFrame to temporary table
        df.to_sql("#temp_table", conn, index=False, if_exists="replace")
    
        # step 2 - merge temp_table into main_table
        conn.exec_driver_sql(
            """\
            MERGE main_table WITH (HOLDLOCK) AS main
            USING (SELECT id, txt FROM #temp_table) AS temp
            ON (main.id = temp.id)
            WHEN MATCHED THEN
                UPDATE SET txt = temp.txt
            WHEN NOT MATCHED THEN
                INSERT (id, txt) VALUES (temp.id, temp.txt);
            """
        )
    
        # step 3 - confirm results
        result = conn.exec_driver_sql(
            "SELECT * FROM main_table ORDER BY id"
        ).fetchall()
        print(result)
        # [(1, 'row 1 new text'), (2, 'new row 2 text')]
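
The update note above mentions a function that builds the MERGE statement for you; that function is not reproduced here. As a rough sketch of the idea only, the helper below assembles the same kind of statement from lists of column names. The name `build_merge_sql` and its parameters are hypothetical, chosen for illustration. It assumes the table names come from trusted code (they are inserted as-is), and it bracket-quotes column names.

```python
def build_merge_sql(target, source, key_cols, data_cols):
    """Return a T-SQL MERGE statement that upserts `source` into `target`.

    Hypothetical helper: `target` and `source` are inserted verbatim,
    so they must come from trusted code, never from user input.
    """

    def q(name):
        # Bracket-quote a column name, doubling any closing bracket.
        return "[" + name.replace("]", "]]") + "]"

    all_cols = list(key_cols) + list(data_cols)
    on_clause = " AND ".join(f"main.{q(c)} = temp.{q(c)}" for c in key_cols)
    set_clause = ", ".join(f"{q(c)} = temp.{q(c)}" for c in data_cols)
    col_list = ", ".join(q(c) for c in all_cols)
    val_list = ", ".join(f"temp.{q(c)}" for c in all_cols)

    lines = [
        f"MERGE {target} WITH (HOLDLOCK) AS main",
        f"USING (SELECT {col_list} FROM {source}) AS temp",
        f"ON ({on_clause})",
    ]
    if data_cols:
        # Skip the UPDATE branch when every column is part of the key.
        lines += ["WHEN MATCHED THEN", f"    UPDATE SET {set_clause}"]
    lines += [
        "WHEN NOT MATCHED THEN",
        f"    INSERT ({col_list}) VALUES ({val_list});",
    ]
    return "\n".join(lines)


# Build the statement for the example tables used above.
merge_sql = build_merge_sql("main_table", "#temp_table", ["id"], ["txt"])
print(merge_sql)
```

The resulting string could be passed to `conn.exec_driver_sql(merge_sql)` in place of the hand-written statement in step 2. With a DataFrame, one option is to pass `list(df.columns)` minus the key columns as `data_cols`.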