Search code examples
sqlpython-2.7pymssql

how to INSERT but IGNORE same date tag with pymssql python


I am having trouble ignoring a duplicate 'date' values when I am importing data

import pymssql
import datetime as dt

server = 'xx'
user = 'xx'
password = 'xx'
db = 'xx'
conn = pymssql.connect(server, user, password, db, as_dict=True)

cur = conn.cursor()
Table_create = 'CREATE TABLE table1 (t_date date, price FLOAT)'
cur.execute(Table_create)

today = str(dt.datetime.now().date())
yesterday = str(dt.datetime.now().date() - dt.timedelta(days=1))
nextday = str(dt.datetime.now().date() + dt.timedelta(days=1))
cur.executemany("INSERT  INTO table1(t_date, price) VALUES(%s,%d)", [(today, 100), (yesterday, 200)])
conn.commit()

I am getting

In[78]: x
Out[78]: 
[{u'price': 100.0, u't_date': u'2016-09-25'},
 {u'price': 200.0, u't_date': u'2016-09-24'}]

I want to ignore a same date record if I try to insert it again such has:

cur.executemany("INSERT  INTO table1(t_date, price) VALUES(%s,%d)", [(today, 100), (nextday, 200)])
conn.commit()

but I get duplicate such has:

In[80]: x
Out[80]: 
[{u'price': 100.0, u't_date': u'2016-09-25'},
 {u'price': 200.0, u't_date': u'2016-09-24'},
 {u'price': 100.0, u't_date': u'2016-09-25'},
 {u'price': 200.0, u't_date': u'2016-09-26'}]

Thanks for help!


Solution

  • You have a few options, you can make date your primary key, then on any duplicate entries just update the price using ON DUPLICATE KEY UPDATE:

    Table_create = '''CREATE TABLE table1 (t_date date PRIMARY KEY, price FLOAT )'''
    cur.execute(Table_create)
    
    today = str(dt.datetime.now().date())
    yesterday = str(dt.datetime.now().date() - dt.timedelta(days=1))
    nextday = str(dt.datetime.now().date() + dt.timedelta(days=1))
    cur.executemany("INSERT  INTO table1(t_date, price) VALUES(%s, %s) on DUPLICATE KEY UPDATE price=price",([(today, 100), (yesterday, 200)]))
    

    That would only update the price if you happened to get a different/new price for an existing row or do an insert for a new date

    You could also just IGNORE any duplicates:

    "INSERT INSERT IGNORE INTO table1(t_date, price) VALUES(%s, %s)"
    

    But IGNORE will ignore more than an IntegrityError based on the duplicate entry so I would personally prefer the former.

    Another option if you wanted to have an auto_increment id and to use both t_date and price or just the t_date would be to create a UNIQUE CONSTRAINT

      Table_create = '''CREATE TABLE table1 (
                                ID int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
                                t_date date,
                                price FLOAT,
                                CONSTRAINT no_dupes UNIQUE  (t_date, price))''' # CONSTRAINT no_dupes UNIQUE (t_date) for just the t_date
    cur.execute(Table_create)
    

    The same logic would apply when inserting.