Question How do I append my dataframe to database so that it checks if stock_ticker exists , only to append the rows where stock_ticker does not exist?
This is the process that I did
sqlite3.IntegrityError: UNIQUE constraint failed: stocks.stock_ticker
conn = sqlite3.connect('stockmarket.db')
c = conn.cursor()
df.to_sql(name='stocks', con=conn, if_exists='append', index=False)
conn.commit()
I looked at other Integrity Error cases but can't seem to find one that works with appending dataframes? I found and tried this but all it does is just not append anything.
try:
conn = sqlite3.connect('stockmarket.db')
c = conn.cursor()
df.to_sql(name='stocks', con=conn, if_exists='append', index=False)
conn.commit()
except sqlite3.IntegrityError:
print("Already in database")
I am not sure I am understanding the iterating thing correctly
How to iterate over rows in a DataFrame in Pandas
So I tried this, but it just prints out already in database for each of them. Even tough there is 4 new stock tickers.
for index, row in df.iterrows():
try:
conn = sqlite3.connect('stockmarket.db')
c = conn.cursor()
df.to_sql(name='stocks', con=conn, if_exists='append', index=False)
conn.commit()
except sqlite3.IntegrityError:
print("Already in database")
any insight much appreciated :)
It looks like this happens because Pandas doesn't allow for declaring a proper ON CONFLICT
policy, in case you try to append data to a table that has the same (unique) primary key or violates some other UNIQUEness constraint. if_exists
only refers to the whole table itself, not each individual row.
I think you already came up with a pretty good answer, and maybe with a small modification it would work for you:
# After connecting
for i in range(len(df)):
try:
df[df.index == i].to_sql(name='stocks', con=conn, if_exists='append', index=False)
conn.commit()
except sqlite3.IntegrityError:
pass
Now, this might be a problem if you want to actually replace the value if a newer one appears in your Pandas data and let's say you want to replace the old one that you have in the database. In that case, you might want to use the raw SQL command as a string, and pass the Pandas values iteratively. For example:
insert_statement = """
INSERT INTO stocks (stock_id,
stock_ticker,
{other columns})
VALUES (%s, %s, {as many %s as columns})
ON CONFLICT (stock_id) DO UPDATE
SET {Define which values you will update on conflict}"""
And then you could run
for i in range(len(df)):
values = tuple(df.iloc[i])
cursor.execute(insert_statement, values)