Looping trough dataframe and checking rows before appending to database

Question How do I append my dataframe to database so that it checks if stock_ticker exists , only to append the rows where stock_ticker does not exist?

This is the process that I did

Import CSV file to pandas dataframe
Assign column names to be same as in database
Sending the dataframe to database using the code below but getting

sqlite3.IntegrityError: UNIQUE constraint failed: stocks.stock_ticker

conn = sqlite3.connect('stockmarket.db')
c = conn.cursor()

df.to_sql(name='stocks', con=conn, if_exists='append', index=False)

conn.commit()

I looked at other Integrity Error cases but can't seem to find one that works with appending dataframes? I found and tried this but all it does is just not append anything.

try:
    conn = sqlite3.connect('stockmarket.db')
    c = conn.cursor()
    df.to_sql(name='stocks', con=conn, if_exists='append', index=False)
    conn.commit()
except sqlite3.IntegrityError:
    print("Already in database")

I am not sure I am understanding the iterating thing correctly

How to iterate over rows in a DataFrame in Pandas

So I tried this, but it just prints out already in database for each of them. Even tough there is 4 new stock tickers.

for index, row in df.iterrows():
    try:
        conn = sqlite3.connect('stockmarket.db')
        c = conn.cursor()
        df.to_sql(name='stocks', con=conn, if_exists='append', index=False)
        conn.commit()
    except sqlite3.IntegrityError:
        print("Already in database")

The database looks like this

any insight much appreciated :)

Solution

It looks like this happens because Pandas doesn't allow for declaring a proper ON CONFLICT policy, in case you try to append data to a table that has the same (unique) primary key or violates some other UNIQUEness constraint. if_exists only refers to the whole table itself, not each individual row.

I think you already came up with a pretty good answer, and maybe with a small modification it would work for you:

# After connecting
for i in range(len(df)):
    try:
        df[df.index == i].to_sql(name='stocks', con=conn, if_exists='append', index=False)
        conn.commit()
    except sqlite3.IntegrityError:
        pass

Now, this might be a problem if you want to actually replace the value if a newer one appears in your Pandas data and let's say you want to replace the old one that you have in the database. In that case, you might want to use the raw SQL command as a string, and pass the Pandas values iteratively. For example:

insert_statement = """
INSERT INTO stocks (stock_id,
                    stock_ticker,
                    {other columns})
VALUES (%s, %s, {as many %s as columns})
ON CONFLICT (stock_id) DO UPDATE
    SET {Define which values you will update on conflict}"""

And then you could run

for i in range(len(df)):
    values = tuple(df.iloc[i])
    cursor.execute(insert_statement, values)