I have a set of records that I need to insert into a SQL Server database using pymssql. These records hold the probability scores from a predictive model along with a number of other columns. For testing purposes, I have created an Excel file with the values I want to insert into the SQL Server table.
Some of the records have null values in these columns, which causes problems when I try to insert them.
Below is my code:
import pymssql
import pandas as pd

conn = pymssql.connect(server='server name', user='uid', password='pwd',
                       database='dbname')
cursor = conn.cursor()

# Read scores from Excel
df = pd.read_excel("Sample_Score_Records.xlsx")
for index, row in df.iterrows():
    cursor.execute(
        "INSERT INTO dbo.ANLY_SCORE([id],[scoredatetime],[score1],[score2],[model_name],[score_id],[updated_revenue_ind]) "
        "values(%s,%d,%d,%d,%s,%s,%s)",
        (row['id'], row['scoredatetime'], row['score1'], row['score2'],
         row['model_name'], row['score_id'], row['updated_revenue_ind']))
conn.commit()
conn.close()
The code works fine when I leave the columns containing null values out of the INSERT statement entirely, but it returns the error below when I include them.
Error:
ProgrammingError: (207, "Invalid column name 'nan'.DB-Lib error message 20018, severity 16:\nGeneral SQL Server error: Check messages from the SQL Server\n")
Below is how my data looks
ID           scoredatetime           Score1     Score2    Model Name  Score_id      updated_revenue_ind
0015CF3EE2   10/17/2018 11:22:04 AM  0.2564859  0.365465  Model1      sana87y47164  y
0015CF3EE3   10/17/2018 11:22:04 AM  0.215485   0.458962  Model1      sana87y47165  n
0015CF3EE4   10/17/2018 11:22:04 AM  0.3458963  0.145893  Model1      sana87y47166  NULL
0015CF3EE5   10/17/2018 11:22:04 AM  0.265489   0.987513  Model1      sana87y47167  y
0015CF3EE6   10/17/2018 11:22:04 AM  0.324896   0.154787  Model1      sana87y47168  y
0015CF3EE7   10/17/2018 11:22:04 AM  0.386492   0.482172  Model1      sana87y47169  y
0015CF3EE8   10/17/2018 11:22:04 AM  0.2489714  0.897562  Model1      sana87y47170  y
0015CF3EE9   10/17/2018 11:22:04 AM  0.402365   0.147851  Model1      sana87y47171  n
0015CF3EE10  10/17/2018 11:22:04 AM  0.380236   0.278968  Model1      sana87y47172  NULL
0015CF3EE11  10/17/2018 11:22:04 AM  0.361245   0.478938  Model1      sana87y47173  NULL
Any help is greatly appreciated!
Invalid column name 'nan'
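The problem is that pandas does not store null values as Python None; it stores them as NaN. Judging by the error message, pymssql substitutes that NaN into the statement as the bare token nan, which SQL Server then tries to resolve as a column name.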
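If you would rather keep the cursor.execute loop, one option is to turn each NaN into None before handing the values to pymssql. The sketch below is only an illustration: nan_to_none and to_params are hypothetical helper names, and the dictionary stands in for one row from df.iterrows().

```python
import math


def nan_to_none(value):
    """Return None for a float NaN, otherwise return the value unchanged."""
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


def to_params(row, columns):
    """Build a parameter tuple for cursor.execute from a mapping-like row."""
    return tuple(nan_to_none(row[col]) for col in columns)


# Stand-in for one row from df.iterrows(); an empty Excel cell arrives as NaN.
columns = ["id", "score1", "updated_revenue_ind"]
sample_row = {
    "id": "0015CF3EE4",
    "score1": 0.3458963,
    "updated_revenue_ind": float("nan"),
}

params = to_params(sample_row, columns)
print(params)  # ('0015CF3EE4', 0.3458963, None)
```

In the original loop, the tuple returned by to_params(row, [...]) would take the place of the hand-written tuple of row[...] values, so that None reaches the driver and is sent as NULL. This only covers float NaN; an empty datetime cell (NaT) would need its own check.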
You should probably just install SQLAlchemy and then use df.to_sql to insert the rows:
from sqlalchemy import create_engine
# ...
#Read Scores from Excel
df = pd.read_excel("Sample_Score_Records.xlsx")
# rename one DataFrame column to match table column name
df.rename({'Model Name': 'model_name'}, axis='columns', inplace=True)
# insert rows
engine = create_engine('mssql+pymssql://uid:pwd@servername/dbname')
df.to_sql('ANLY_SCORE', engine, schema='dbo', if_exists='append', index=False)
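To check that to_sql really writes NULL for the missing values, here is a small sketch you can run without a SQL Server instance. It assumes an in-memory SQLite database as a stand-in for the real server, and the three-row DataFrame is made-up sample data modelled on the question.

```python
import sqlite3

import numpy as np
import pandas as pd

# Small stand-in for the Excel data; the last row has a missing indicator.
df = pd.DataFrame({
    "id": ["0015CF3EE2", "0015CF3EE3", "0015CF3EE4"],
    "score1": [0.2564859, 0.215485, 0.3458963],
    "updated_revenue_ind": ["y", "n", np.nan],
})

# In-memory SQLite database used only for this check (not SQL Server).
conn = sqlite3.connect(":memory:")
df.to_sql("ANLY_SCORE", conn, if_exists="append", index=False)

# Count the rows where the indicator was stored as NULL.
null_count = conn.execute(
    "SELECT COUNT(*) FROM ANLY_SCORE WHERE updated_revenue_ind IS NULL"
).fetchone()[0]
print(null_count)  # 1
conn.close()
```

The same to_sql call against the SQLAlchemy engine should behave the same way, but that part is not exercised by this sketch.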