Search code examples
python-3.xsql-serverpandasdataframesqlalchemy

Upsert/Append to SQL database using SQL Alchemy/Pandas


I'm using Python 3.11 and SSMS 19.

Once again, I'm running into a ton of fun with my project. I've gotten it working reasonably well, but I realize now that it isn't updating, just adding the same lines over and over into my database. Meaning it rejects all of the entries if a primary key matches, and if I remove the PK constraint, it doesn't update, it just adds a new line.

Edit: Just for clarification, I want to do this by column, not by row or entry. If I have to add each update individually, I might as well not even write the program, as I would be writing update lines for ~300 rows every 3 days. I need the code to go line by line by itself, and either update, merge, or ignore depending on the column.

Here's what's working right now, I have the engine/connection created through SQLAlchemy.

fulldf.to_sql('Shortage',engine, if_exists='append', index=False, chunksize=200, dtype={'Priority': BigInteger,
                                                               'PN':String(255),
                                                               'Part_Type': Text,
                                                               'MPN': Text,
                                                               'Proposed_MPN': Text,
                                                               'Distributor': String(100),
                                                               'Reason_for_Add': Text,
                                                               'Line_Down': Date,
                                                               'Engineering-Samples_Available': Text,
                                                               'Build_Samples-Available': Text,
                                                               'Comments': Text,                                                                                       
                                                               'Supplier_Comments': Text,
                                                               'Circuit_Location': Text,
                                                               'Last_Updated': Date 

This works awesome for the initial table, and if I didn't have any Primary Key repeats. Unfortunately I do and I will, so it's not great for that.

My primary key (constrained in SQL, not here for ease of use) is 'PN', of which all are unique. Some columns like 'Part_Type','MPN','Distributor', etc I just want to be overwritten/ignored. Those values shouldn't change and if they do, that will be a case-by-case basis.

For the Date columns, I want the old date removed and the other added in, which I believe can be accomplished with if_exists='update' or perhaps if_exists='replace'? Either way, that wont be too much of a hurdle.

As an afterthought- I really don't care about SQL injection here. This program is going to be used by roughly 5 trusted people on the local intranet, and if someone really wants to blow up a simple (unnecessary for production) database, they're more than welcome to. I'd rather not complicate something when I don't need to.

I've tried a few things, namely this upsert method:

https://blog.alexparunov.com/upserting-update-and-insert-with-pandas

Which does not work for me as my laptop is locked down harder than Fort Knox, so PostgreSQL and psycopg2 can't initialize. And yes, I've tried installed psycopg2-binary.

I'm just needing some guidance and haven't found much on here.

Edit: Here's an example for clarity of what I'm looking to do:

I have this entry in the dataframe:

P/N:        Supplier_Comments:      Comments:
1-100116    "6-27-23 Review with Holly  "24000 reserved

and then I have the part that's already in the database:

P/N:        Supplier_Comments:      Comments:
1-100116    "6-29-23: Shipping 24k Oct" "Next Dock Date 7/31"

I need it to become:

P/N:        Supplier_Comments:      Comments:
1-100116    "6-27-23 Review with Holly  "24000 reserved
        6-29-23: Shipping 24k Oct"  "Next Dock Date 7/31"

NOT

1-100116    "6-27-23 Review with Holly  "24000 reserved 
1-100116    "6-29-23: Shipping 24k Oct" "Next Dock Date 7/31"

I also don't want to combine different primary keys, either. I just want primary keys that already exist to be joined with the SQL entry.

I want this to be done automatically, without me having to go through the rows to find which primary keys already exist in the database.


Solution

  • You can use .to_sql() to upload the updates to a temporary table and then use MERGE to update the main table.

    import pandas as pd
    import sqlalchemy as sa
    
    engine = sa.create_engine("mssql+pyodbc://scott:tiger^5HHH@mssql_199")
    
    # set up environment for this example
    with engine.begin() as conn:
        conn.exec_driver_sql("DROP TABLE IF EXISTS Shortage")
        conn.exec_driver_sql(
            """\
        CREATE TABLE Shortage (
        PN varchar(255) primary key,
        Part_Type varchar(50),
        Comments varchar(max)
        )
        """
        )
        conn.exec_driver_sql(
            """\
        INSERT INTO Shortage (PN, Part_Type, Comments)
        VALUES ('Part_A', 'original type', 'A comment.')
        """
        )
    
    # update the table
    fulldf = pd.DataFrame(
        [
            ("Part_A", "updated type", "Another comment."),
            ("Part_B", "some other type", "This is a new part."),
        ],
        columns=["PN", "Part_Type", "Comments"],
    )
    with engine.begin() as conn:
        fulldf.to_sql("#temp_table", conn, if_exists="replace", index=False)
        conn.exec_driver_sql(
            """\
        MERGE Shortage WITH (HOLDLOCK) AS main
        USING (SELECT PN, Part_Type, Comments FROM #temp_table) AS temp
        ON (main.PN = temp.PN)
        WHEN MATCHED THEN
          UPDATE SET 
            Part_Type = temp.Part_Type, 
            Comments = main.Comments + ' ' + temp.Comments
        WHEN NOT MATCHED THEN
          INSERT (PN, Part_Type, Comments) VALUES (temp.PN, temp.Part_Type, temp.Comments);
        """
        )
    
    # check results
    print(pd.read_sql_table("Shortage", engine))
    """
           PN        Part_Type                     Comments
    0  Part_A     updated type  A comment. Another comment.
    1  Part_B  some other type          This is a new part.
    """