I am using Pandas Dataframes. I have a column from a CSV which is integers mixed in with nulls.
I am trying to convert this and insert it into Spanner in as generalizable a way as possible(so I can use the same code for future jobs), which reduces my ability to use sentinel variables. However, DFs cannot handle NaN
s in a pure int column so you have to use Int64
. When I try to insert this into Spanner I get an error that it is not an int64
type, whereas pure Python int
s do work. Is there an automatic way to convert Int64
Pandas values to int
values during the insert? Converting the column before inserting doesn't work, again, because of the null values. Is there another path around this?
Trying to convert from a Series goes like so:
>>>s2=pd.Series([3.0,5.0])
>>>s2
0 3.0
1 5.0
dtype: float64
>>>s1=pd.Series([3.0,None])
>>>s1
0 3.0
1 NaN
dtype: float64
>>>df = pd.DataFrame(data=[s1,s2], dtype=np.int64)
>>>df
0 1
0 3 NaN
1 3 5.0
>>>df = pd.DataFrame(data={"nullable": s1, "nonnullable": s2}, dtype=np.int64)
this last command produces the error ValueError: Cannot convert non-finite values (NA or inf) to integer
My solution was to leave it as NaN
(it turns out NaN == 'nan'
). Then, at the very end, as I went to insert into the Spanner DB, I replaced all NaN
with None
in the DF. I used code from another SO answer: df.replace({pd.np.nan: None})
. Spanner was looking at the NaN
as a 'nan'
string and rejecting that for insertion into an Int64 column. None
is treated as NULL
and can get inserted into Spanner with no issue.