Search code examples
pythonmysqlpandaspandasql

Need to edit subset of rows from MySQL table using Pandas Dataframe


I'm in the process of trying to alter a table in my database. However I am finding it difficult using the to_sql method provided by Pandas. My price_data Dataframe looks something like this:

Initial Dataframe (as rows in the database):

enter image description here

Code used to alter data:

with con:
    price_data.to_sql(con=con, name='clean_prices2', if_exists='append', index=False, flavor='mysql')

The ultimate goal here is to modify the initial dataframe (converting zero values into Nan's, then interpolate them), and saving it back in the database. The results should look like this (except with the same id):

Desired Output:

enter image description here

If you look specifically at the close_price column you can see the 0 value was assigned 90.7350

My current solution is appending the datarows, which results in duplicate enteries like this:

Actual Output:

enter image description here

Finally, I would have to perform another query to remove the duplicate rows (based on price_date)

I know I could change the if_exists parameter to replace but that would remove the rest of my database table. Basically I want to perform this query multiple times on different symbol_id's

Is there anyway to modify a subset (in the case, just the 3 rows) without removing the rest of the data in my table? The solution can either modify the existing rows (keeping the same id) or delete the old rows, and create the new ones without zeroes. I am just trying to accomplish this without the additional delete duplicate query.


Solution

  • Consider a temp table with exact structure as final table but regularly replaced and will then be used to update existing final table. Try using sqlalchemy engine for both operations.

    Specifically, for the latter SQL you would use an UPDATE JOIN query between temp and final table. Below assumes you use pymysql module (adjust as needed):

    import pymysql
    from sqlalchemy import create_engine
    ...
    
    engine = create_engine("mysql+pymysql://user:password@hostname:port/database")
    
    # PANDAS UPLOAD
    price_data.to_sql(name='clean_prices_temp', con=engine, if_exists='replace', index=False)
    
    # SQL UPDATE (USING TRANSACTION)
    with engine.begin() as conn:     
        conn.execute("UPDATE clean_prices_final f" +
                     " INNER JOIN clean_prices_temp t" +
                     " ON f.symbol_id = t.symbol_id" +
                     " AND f.price_date = t.price_date" +
                     " SET f.open_price = t.open_price," +
                     "     f.high_price = t.high_price," +
                     "     f.low_price = t.low_price," +
                     "     f.close_price = t.close_price," +
                     "     f.adj_close_price = t.adj_close_price;")
    
    engine.dispose()