Search code examples
pythonpandassqlitedataframefinance

Looping trough dataframe and checking rows before appending to database


Question How do I append my dataframe to database so that it checks if stock_ticker exists , only to append the rows where stock_ticker does not exist?

This is the process that I did

  1. Import CSV file to pandas dataframe
  2. Assign column names to be same as in database
  3. Sending the dataframe to database using the code below but getting

sqlite3.IntegrityError: UNIQUE constraint failed: stocks.stock_ticker

conn = sqlite3.connect('stockmarket.db')
c = conn.cursor()

df.to_sql(name='stocks', con=conn, if_exists='append', index=False)

conn.commit()

I looked at other Integrity Error cases but can't seem to find one that works with appending dataframes? I found and tried this but all it does is just not append anything.

try:
    conn = sqlite3.connect('stockmarket.db')
    c = conn.cursor()
    df.to_sql(name='stocks', con=conn, if_exists='append', index=False)
    conn.commit()
except sqlite3.IntegrityError:
    print("Already in database")

I am not sure I am understanding the iterating thing correctly

How to iterate over rows in a DataFrame in Pandas

So I tried this, but it just prints out already in database for each of them. Even tough there is 4 new stock tickers.

for index, row in df.iterrows():
    try:
        conn = sqlite3.connect('stockmarket.db')
        c = conn.cursor()
        df.to_sql(name='stocks', con=conn, if_exists='append', index=False)
        conn.commit()
    except sqlite3.IntegrityError:
        print("Already in database")

enter image description here

The database looks like this enter image description here

any insight much appreciated :)


Solution

  • It looks like this happens because Pandas doesn't allow for declaring a proper ON CONFLICT policy, in case you try to append data to a table that has the same (unique) primary key or violates some other UNIQUEness constraint. if_exists only refers to the whole table itself, not each individual row.

    I think you already came up with a pretty good answer, and maybe with a small modification it would work for you:

    # After connecting
    for i in range(len(df)):
        try:
            df[df.index == i].to_sql(name='stocks', con=conn, if_exists='append', index=False)
            conn.commit()
        except sqlite3.IntegrityError:
            pass
    

    Now, this might be a problem if you want to actually replace the value if a newer one appears in your Pandas data and let's say you want to replace the old one that you have in the database. In that case, you might want to use the raw SQL command as a string, and pass the Pandas values iteratively. For example:

    insert_statement = """
    INSERT INTO stocks (stock_id,
                        stock_ticker,
                        {other columns})
    VALUES (%s, %s, {as many %s as columns})
    ON CONFLICT (stock_id) DO UPDATE
        SET {Define which values you will update on conflict}"""
    

    And then you could run

    for i in range(len(df)):
        values = tuple(df.iloc[i])
        cursor.execute(insert_statement, values)