python · pandas · pyodbc

How to update SQL table column with a list of new values from a pandas DataFrame?


Is there a way to update a column in an SQL table with new data using pandas? Maybe with a list of values? Essentially what I would like to do is...

  1. Connect to database
  2. Grab a table from the database -> convert to a DataFrame
  3. Run a script to update the values of a column of that DataFrame
  4. Update the database table with new values / new DataFrame

The tables I am working with are large (> 100,000 rows).

I can do steps 1, 2, & 3, but I don't know how to do step 4 and put the updated values back into the database table...

Example script:

import pandas as pd
import pyodbc as odbc

sql_conn = odbc.connect(<connection_stuff>)

query = "SELECT * FROM myTable"
df = pd.read_sql(query, sql_conn)

myNewValueList = [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, …]  # long list of new values to update with
df['newColumnValues'] = myNewValueList

# this is the part I can't get to work:
sql = "UPDATE myTable SET myColumn = %s"
val = df['newColumnValues']

mycursor.execute(sql_conn, val)

Solution

  • import pandas as pd
    import pyodbc as odbc
    
    # Connect to the database
    sql_conn = odbc.connect(<connection_stuff>)
    
    # Read the data
    query = "SELECT * FROM myTable"
    df = pd.read_sql(query, sql_conn)
    
    # Update the DataFrame
    myNewValueList = [11, 12, 13, 14, ...]  # Your new values
    df['myColumn'] = myNewValueList
    
    # Update statement
    sql = "UPDATE myTable SET myColumn = ? WHERE <PrimaryKeyColumn> = ?"
    
    # Update the database
    cursor = sql_conn.cursor()
    for index, row in df.iterrows():
        cursor.execute(sql, (row['myColumn'], row['<PrimaryKeyColumn>']))
    
    # Commit the changes and close the connection
    sql_conn.commit()
    cursor.close()
    sql_conn.close()
    

    In the above code snippet, replace <connection_stuff>, <PrimaryKeyColumn>, and myColumn with your actual connection details, primary key column, and column to be updated. The primary key is used to uniquely identify each row for the update.

    This method executes an SQL update for each row, which might not be very efficient for large datasets. For more efficient bulk updates, you may want to consider pandas' to_sql method. It allows you to write the DataFrame to the SQL table directly, which can be faster and more straightforward than updating each row individually as in the above approach.
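    If you want to keep plain UPDATE statements but avoid one round trip per row, pyodbc also supports `executemany` with `fast_executemany = True`, which binds all parameter sets and sends them to the server in a single batch. Below is a minimal sketch under the same assumptions as above (a table `myTable`, a value column `myColumn`, and an `id` primary key column; the helper function names are my own, not from the original question):

```python
import pandas as pd


def build_params(df, value_col, key_col):
    # Pair each new value with its key: [(value, key), ...] for executemany
    return list(df[[value_col, key_col]].itertuples(index=False, name=None))


def bulk_update(conn, df, table, value_col, key_col):
    # conn is an open pyodbc connection; table/column names are assumptions
    cursor = conn.cursor()
    cursor.fast_executemany = True  # pyodbc sends all parameter sets in one batch
    sql = f"UPDATE {table} SET {value_col} = ? WHERE {key_col} = ?"
    cursor.executemany(sql, build_params(df, value_col, key_col))
    conn.commit()
    cursor.close()
```

    With `fast_executemany` enabled, the whole parameter list is transmitted in one operation instead of one `execute` per row, which tends to matter on tables of the size mentioned in the question (>100,000 rows).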

    However, to_sql has its nuances. It is used for inserting new rows into a database, and handling updates requires considering the if_exists parameter. So, what you can do is write your modified DataFrame to a temporary table in your database, perform an update from the temporary table to the actual target table, and, finally, drop the temporary table. Following is an implementation using this approach:

    import pandas as pd
    from sqlalchemy import create_engine, text
    
    # Connect via SQLAlchemy (needed for to_sql)
    # Replace <connection_stuff> with your database details
    engine = create_engine('mssql+pyodbc://<connection_stuff>')
    
    # Read and modify the DataFrame (the engine also works for read_sql,
    # so a separate pyodbc connection is not needed)
    query = "SELECT * FROM myTable"
    df = pd.read_sql(query, engine)
    myNewValueList = [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
    df['newColumnValues'] = myNewValueList
    
    # Write the modified DataFrame to a temporary table
    temp_table_name = 'temp_myTable'
    df.to_sql(temp_table_name, engine, if_exists='replace', index=False)
    
    # Update the original table from the temporary table, then drop it.
    # engine.begin() commits both statements as one transaction, and text()
    # is required to execute raw SQL strings in SQLAlchemy 1.4+.
    with engine.begin() as conn:
        update_query = text(f"""
        UPDATE myTable
        SET myColumn = temp.newColumnValues
        FROM myTable
        INNER JOIN {temp_table_name} AS temp
        ON myTable.id = temp.id  -- assuming 'id' is the unique identifier
        """)
        conn.execute(update_query)
    
        # Drop the temporary table
        conn.execute(text(f"DROP TABLE {temp_table_name}"))
    
    

    Please note that this method creates a table in your database, which may require additional permissions depending on your database setup.