Search code examples
pythonsqlpandaspandas-to-sql

Pandas to_sql() to update unique values in DB?


How can I use the df.to_sql(if_exists = 'append') to append ONLY the unique values between the dataframe and the database. In other words, I would like to evaluate the duplicates between the DF and the DB and drop those duplicates before writing to the database.

Is there a parameter for this?

I understand that the parameters if_exists = 'append' and if_exists = 'replace'is for the entire table - not the unique entries.

I am using: 
sqlalchemy

pandas dataframe with the following datatypes: 
    index: datetime.datetime <-- Primary Key
    float
    float
    float
    float
    integer
    string <---  Primary Key
    string<----  Primary Key

I'm stuck on this so your help is much appreciated. -Thanks


Solution

  • In pandas, there is no convenient argument in to_sql to append only non-duplicates to a final table. Consider using a staging temp table that pandas always replaces and then run a final append query to migrate temp table records to final table accounting only for unique PK's using the NOT EXISTS clause.

    engine = sqlalchemy.create_engine(...)
    
    df.to_sql(name='myTempTable', con=engine, if_exists='replace')
    
    with engine.begin() as cn:
       sql = """INSERT INTO myFinalTable (Col1, Col2, Col3, ...)
                SELECT t.Col1, t.Col2, t.Col3, ...
                FROM myTempTable t
                WHERE NOT EXISTS 
                    (SELECT 1 FROM myFinalTable f
                     WHERE t.MatchColumn1 = f.MatchColumn1
                     AND t.MatchColumn2 = f.MatchColumn2)"""
    
       cn.execute(sql)
    

    This would be an ANSI SQL solution and not restricted to vendor-specific methods like UPSERT and so is compliant in practically all SQL-integrated relational databases.