Tags: python, sql-server, pandas, pymssql

Error while inserting records with Null values into SQL Server using Pymssql


I have a set of records that I need to insert into a SQL Server database using pymssql. These records contain the probability scores from a predictive model along with several other columns. For testing purposes, I have created an Excel file with the values I want to insert into the SQL Server table.

Some of the records have null values in these columns, which cause problems when I try to insert them.

Below is my code:

import pymssql
import pandas as pd

conn = pymssql.connect(server='server name', user='uid', password='pwd',
                       database='dbname')
cursor = conn.cursor()

#Read Scores from Excel
df = pd.read_excel("Sample_Score_Records.xlsx")

for index, row in df.iterrows():
    cursor.execute(
        "INSERT INTO dbo.ANLY_SCORE([id],[scoredatetime],[score1],[score2],[model_name],[score_id],[updated_revenue_ind]) "
        "values(%s,%d,%d,%d,%s,%s,%s)",
        (row['id'], row['scoredatetime'], row['score1'], row['score2'],
         row['model_name'], row['score_id'], row['updated_revenue_ind']))
    conn.commit()
conn.close()

The code works fine when I remove the columns with null values from the INSERT statement entirely, but it returns the error below when I include them.

Error:

ProgrammingError: (207, "Invalid column name 'nan'.DB-Lib error message 20018, severity 16:\nGeneral SQL Server error: Check messages from the SQL Server\n")

Below is what my data looks like:

ID           scoredatetime           Score1     Score2    Model Name  Score_id      updated_revenue_ind
0015CF3EE2   10/17/2018 11:22:04 AM  0.2564859  0.365465  Model1      sana87y47164  y
0015CF3EE3   10/17/2018 11:22:04 AM  0.215485   0.458962  Model1      sana87y47165  n
0015CF3EE4   10/17/2018 11:22:04 AM  0.3458963  0.145893  Model1      sana87y47166  NULL
0015CF3EE5   10/17/2018 11:22:04 AM  0.265489   0.987513  Model1      sana87y47167  y
0015CF3EE6   10/17/2018 11:22:04 AM  0.324896   0.154787  Model1      sana87y47168  y
0015CF3EE7   10/17/2018 11:22:04 AM  0.386492   0.482172  Model1      sana87y47169  y
0015CF3EE8   10/17/2018 11:22:04 AM  0.2489714  0.897562  Model1      sana87y47170  y
0015CF3EE9   10/17/2018 11:22:04 AM  0.402365   0.147851  Model1      sana87y47171  n
0015CF3EE10  10/17/2018 11:22:04 AM  0.380236   0.278968  Model1      sana87y47172  NULL
0015CF3EE11  10/17/2018 11:22:04 AM  0.361245   0.478938  Model1      sana87y47173  NULL

Any help is greatly appreciated!


Solution

  • Invalid column name 'nan'

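    The problem is that pandas does not store null values as Python None; it stores them as NaN, which is a float. pymssql translates None into SQL NULL, but it appears to pass NaN through as the bare token nan, which SQL Server then tries to interpret as a column name.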

    You probably should just install SQLAlchemy and then use df.to_sql to insert the rows:

    from sqlalchemy import create_engine
    # ...
    
    #Read Scores from Excel
    df = pd.read_excel("Sample_Score_Records.xlsx")
    
    # rename one DataFrame column to match table column name
    df.rename({'Model Name': 'model_name'}, axis='columns', inplace=True)
    
    # insert rows
    engine = create_engine('mssql+pymssql://uid:pwd@servername/dbname')
    df.to_sql('ANLY_SCORE', engine, schema='dbo', if_exists='append', index=False)
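
If you would rather keep plain pymssql, another option is to replace the missing values with None before they reach the cursor. The following is only a minimal sketch, not a tested drop-in: the helper names (nan_to_none, to_rows, insert_scores) are hypothetical, the small DataFrame stands in for the result of pd.read_excel, and it assumes the DataFrame columns are already named and ordered like the table columns.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_excel("Sample_Score_Records.xlsx"):
# two rows shaped like the question's data, one with a missing indicator.
df = pd.DataFrame({
    "id": ["0015CF3EE3", "0015CF3EE4"],
    "scoredatetime": [pd.Timestamp("2018-10-17 11:22:04")] * 2,
    "score1": [0.215485, 0.3458963],
    "score2": [0.458962, 0.145893],
    "model_name": ["Model1", "Model1"],
    "score_id": ["sana87y47165", "sana87y47166"],
    "updated_revenue_ind": ["n", np.nan],
})

# Column order here must match the column order of the DataFrame.
INSERT_SQL = (
    "INSERT INTO dbo.ANLY_SCORE([id],[scoredatetime],[score1],[score2],"
    "[model_name],[score_id],[updated_revenue_ind]) "
    "VALUES (%s,%s,%s,%s,%s,%s,%s)"
)


def nan_to_none(value):
    """Map a pandas missing value (NaN, NaT) to None; pass others through."""
    return None if pd.isna(value) else value


def to_rows(frame):
    """Convert a DataFrame into a list of tuples with None for missing cells."""
    return [
        tuple(nan_to_none(v) for v in record)
        for record in frame.itertuples(index=False, name=None)
    ]


def insert_scores(conn, frame):
    """Insert every row of frame; conn is assumed to be an open pymssql connection."""
    cursor = conn.cursor()
    cursor.executemany(INSERT_SQL, to_rows(frame))
    conn.commit()  # commit once, after all rows have been sent


# No database is needed to check the conversion itself.
rows = to_rows(df)
```

With a real connection you would call insert_scores(conn, df) after reading the Excel file. Committing once after the loop, rather than after every row as in the question, also avoids a round trip per record.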