I am running python 3.10, pandas 1.5.2, sqlalchemy 1.4.46, and pyodbc 4.0.35 on an Ubuntu subsystem on Windows (WSL2).
I think the issue I am having was a bug in prior pandas versions, but everyone is saying it has been fixed, so I am not sure why I am having this issue.
I have a dataframe like this:

| int_col_1 | int_col_2 | string_col |
|---|---|---|
| 1 | 10 | 'val1' |
| 2 | 20 | None |
| 3 | None | 'val3' |
Then I push this to an MSSQL database using this code:
from sqlalchemy import create_engine
from sqlalchemy.engine import URL

connection_string = 'DRIVER={ODBC Driver 18 for SQL Server};SERVER=' + server + ';DATABASE=' + database + ';UID=' + username + ';PWD=' + password + ';Encrypt=no'
connection_url = URL.create("mssql+pyodbc", query={"odbc_connect": connection_string})
engine = create_engine(connection_url)
df.to_sql('table', engine, if_exists='replace', index=False)
Or with explicit dtypes like this:
from sqlalchemy.types import Integer, String
df.to_sql('table', engine, if_exists='replace', index=False, dtype={'int_col_1': Integer(), 'int_col_2': Integer(), 'string_col': String()})
Either way, this is what is returned if I pull the table back from SQL:

| int_col_1 | int_col_2 | string_col |
|---|---|---|
| 1 | 10.0 | 'val1' |
| 2 | 20.0 | None |
| 3 | NaN | 'val3' |
You'll notice that int_col_1 is fine (int64), string_col is fine (object with NoneType), but int_col_2 has been turned into float64.
I know there are methods to correct the data after pulling it from SQL, but I am looking for a way to get the data to be correct in SQL when pushed from pandas, and I have no idea how to do that. Any help would be greatly appreciated.
You can cast the column to the nullable `Int64` dtype (note the capital "I"), pandas' integer extension type that supports missing values. A plain `int64` column cannot hold NaN, which is why pandas silently upcasts to float64; `Int64` represents missing values as `<NA>` instead.
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [1, 2, np.nan]})
df['a'] = df['a'].astype('Int64')
df
a
0 1
1 2
2 <NA>
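Applying this to your case, cast `int_col_2` before calling `to_sql` so the column stays integer-typed all the way through. A minimal sketch, using hypothetical data mirroring the question (the `to_sql` call is commented out since it needs your `engine`):

```python
import pandas as pd

# Frame mirroring the question's data; int_col_2 contains a None,
# so pandas infers float64 for it by default
df = pd.DataFrame({
    'int_col_1': [1, 2, 3],
    'int_col_2': [10, 20, None],
    'string_col': ['val1', None, 'val3'],
})

# Cast to the nullable integer extension dtype before pushing;
# the missing value becomes <NA> rather than NaN
df['int_col_2'] = df['int_col_2'].astype('Int64')

print(df.dtypes)
# df.to_sql('table', engine, if_exists='replace', index=False)  # engine as in the question
```

With the column already `Int64`, `to_sql` maps it to an integer SQL column (BIGINT) with NULL for the missing value, so the round trip no longer produces float64.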