I am having trouble ignoring a duplicate 'date' values when I am importing data
import pymssql
import datetime as dt
server = 'xx'
user = 'xx'
password = 'xx'
db = 'xx'
conn = pymssql.connect(server, user, password, db, as_dict=True)
cur = conn.cursor()
Table_create = 'CREATE TABLE table1 (t_date date, price FLOAT)'
cur.execute(Table_create)
today = str(dt.datetime.now().date())
yesterday = str(dt.datetime.now().date() - dt.timedelta(days=1))
nextday = str(dt.datetime.now().date() + dt.timedelta(days=1))
cur.executemany("INSERT INTO table1(t_date, price) VALUES(%s,%d)", [(today, 100), (yesterday, 200)])
conn.commit()
I am getting
In[78]: x
Out[78]:
[{u'price': 100.0, u't_date': u'2016-09-25'},
{u'price': 200.0, u't_date': u'2016-09-24'}]
I want to ignore a same date record if I try to insert it again such has:
cur.executemany("INSERT INTO table1(t_date, price) VALUES(%s,%d)", [(today, 100), (nextday, 200)])
conn.commit()
but I get duplicate such has:
In[80]: x
Out[80]:
[{u'price': 100.0, u't_date': u'2016-09-25'},
{u'price': 200.0, u't_date': u'2016-09-24'},
{u'price': 100.0, u't_date': u'2016-09-25'},
{u'price': 200.0, u't_date': u'2016-09-26'}]
Thanks for help!
You have a few options, you can make date your primary key, then on any duplicate entries just update the price using ON DUPLICATE KEY UPDATE:
Table_create = '''CREATE TABLE table1 (t_date date PRIMARY KEY, price FLOAT )'''
cur.execute(Table_create)
today = str(dt.datetime.now().date())
yesterday = str(dt.datetime.now().date() - dt.timedelta(days=1))
nextday = str(dt.datetime.now().date() + dt.timedelta(days=1))
cur.executemany("INSERT INTO table1(t_date, price) VALUES(%s, %s) on DUPLICATE KEY UPDATE price=price",([(today, 100), (yesterday, 200)]))
That would only update the price if you happened to get a different/new price for an existing row or do an insert for a new date
You could also just IGNORE any duplicates:
"INSERT INSERT IGNORE INTO table1(t_date, price) VALUES(%s, %s)"
But IGNORE will ignore more than an IntegrityError
based on the duplicate entry so I would personally prefer the former.
Another option if you wanted to have an auto_increment id and to use both t_date and price or just the t_date would be to create a UNIQUE CONSTRAINT
Table_create = '''CREATE TABLE table1 (
ID int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
t_date date,
price FLOAT,
CONSTRAINT no_dupes UNIQUE (t_date, price))''' # CONSTRAINT no_dupes UNIQUE (t_date) for just the t_date
cur.execute(Table_create)
The same logic would apply when inserting.