I am running python 3.10, pandas 1.5.2, sqlalchemy 1.4.46, and pyodbc 4.0.35 on an Ubuntu subsystem on Windows (WSL2).
I think the issue I am having was a bug in prior pandas versions, but everyone is saying it has been fixed, so I am not sure why I am having this issue.
I have a dataframe like this:

| int_col_1 | int_col_2 | string_col |
|---|---|---|
| 1 | 10 | 'val1' |
| 2 | 20 | None |
| 3 | None | 'val3' |
Then I push this to an MSSQL database using this code:
from sqlalchemy import create_engine
from sqlalchemy.engine import URL

connection_string = 'DRIVER={ODBC Driver 18 for SQL Server};SERVER=' + server + ';DATABASE=' + database + ';UID=' + username + ';PWD=' + password + ';Encrypt=no'
connection_url = URL.create("mssql+pyodbc", query={"odbc_connect": connection_string})
engine = create_engine(connection_url)
df.to_sql('table', engine, if_exists='replace', index=False)
Or with explicit dtypes like this:
from sqlalchemy.types import Integer, String
df.to_sql('table', engine, if_exists='replace', index=False, dtype={'int_col_1': Integer(), 'int_col_2': Integer(), 'string_col': String()})
Either way, this is what is returned if I pull the table back from SQL:

| int_col_1 | int_col_2 | string_col |
|---|---|---|
| 1 | 10.0 | 'val1' |
| 2 | 20.0 | None |
| 3 | NaN | 'val3' |
You'll notice that int_col_1 is fine (int64), string_col is fine (object with NoneType), but int_col_2 has been turned into float64.
I know there are methods to correct the data after pulling it from SQL, but I am looking for a way to get the data to be correct in SQL when pushed from pandas, and I have no idea how to do that. Any help would be greatly appreciated.
You can cast the column to the nullable `Int64` dtype (note the capital "I"), pandas' integer extension type that supports missing values. A plain `int64` column cannot hold NaN, which is why pandas silently upcasts to float64; `Int64` represents missing values as `<NA>` instead.
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [1, 2, np.nan]})
df['a'] = df['a'].astype('Int64')
df
a
0 1
1 2
2 <NA>
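Applying this to your case, cast `int_col_2` before calling `to_sql` so the column stays integer-typed all the way through. A minimal sketch, using hypothetical data mirroring the question (the `to_sql` call is commented out since it needs your `engine`):

```python
import pandas as pd

# Frame mirroring the question's data; int_col_2 contains a None,
# so pandas infers float64 for it by default
df = pd.DataFrame({
    'int_col_1': [1, 2, 3],
    'int_col_2': [10, 20, None],
    'string_col': ['val1', None, 'val3'],
})

# Cast to the nullable integer extension dtype before pushing;
# the missing value becomes <NA> rather than NaN
df['int_col_2'] = df['int_col_2'].astype('Int64')

print(df.dtypes)
# df.to_sql('table', engine, if_exists='replace', index=False)  # engine as in the question
```

With the column already `Int64`, `to_sql` maps it to an integer SQL column (BIGINT) with NULL for the missing value, so the round trip no longer produces float64.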