How can I force a pandas DataFrame to retain None values, even when using astype()?
Since the pd.DataFrame constructor offers no compound dtype parameter, I fix the types (required for to_parquet()) with the following function:
import numpy as np
import pandas as pd

def _typed_dataframe(data: list) -> pd.DataFrame:
    # Target dtype for each column
    typing = {
        'name': str,
        'value': np.float64,
        'info': str,
        'scale': np.int8,
    }
    result = pd.DataFrame(data)
    # Cast one column at a time
    for label in result.keys():
        result[label] = result[label].astype(typing[label])
    return result
Unfortunately, result['info'] = result['info'].astype(str) transforms all None values in info to "None" strings. How can I prevent this, i.e. retain the None values?
To be more precise: None values in data become np.nan in the result DataFrame, which become "nan" by astype(str), which become "None" when extracted from result.
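To see at which step the missing value is lost, it can help to check isna() after construction and again after the cast. The following is only a diagnostic sketch with hypothetical sample rows; what the cast step reports may differ between pandas versions, so no particular output is claimed for it.

```python
import pandas as pd

# Hypothetical sample rows; only the 'info' column matters here.
data = [
    {'name': 'a', 'info': 'first'},
    {'name': 'b', 'info': None},
]
frame = pd.DataFrame(data)

# Step 1: is the entry still recognised as missing after construction?
missing_after_construction = frame['info'].isna().tolist()

# Step 2: is it still recognised as missing after the cast, or has it
# been turned into ordinary text? repr() makes such text visible.
cast = frame['info'].astype(str)
missing_after_cast = cast.isna().tolist()

print(missing_after_construction)
print(missing_after_cast)
print([repr(v) for v in cast])
```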
Following @frosty's comment, we can use the alternative

typing = {
    'name': str,
    'value': np.float64,
    'info': pd.StringDtype(),
    'scale': np.int8,
}
However, this requires pandas >= 1.0.0, the release that introduced pd.StringDtype.
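As a quick check of the pd.StringDtype() approach, here is a minimal sketch on a single column. The sample values are made up for illustration, and it assumes pandas 1.0.0 or later.

```python
import pandas as pd

# Hypothetical sample column with one missing entry.
info = pd.Series(['first', None, 'third'], name='info')

# Cast to the nullable string dtype instead of plain str.
typed_info = info.astype(pd.StringDtype())

# The missing entry is still reported as missing after the cast.
print(typed_info.dtype)
print(typed_info.isna().tolist())
```

Because the entry stays missing rather than becoming text, isna() and dropna() keep working on the column after the cast.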
As a better solution, you can replace

for label in result.keys():
    result[label] = result[label].astype(typing[label])

by

result = result.astype(typing)
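Note that calling result.astype(typing) on its own has no visible effect: astype() does accept a column-to-dtype mapping, but it returns a new DataFrame instead of modifying result in place, so the return value has to be assigned.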
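Putting the pieces together, the whole-frame cast could look like the sketch below. It assumes that DataFrame.astype() is given a column-to-dtype mapping and that its return value is used; the function name typed_dataframe and the sample rows are hypothetical.

```python
import numpy as np
import pandas as pd

# Column-to-dtype mapping, using the nullable string dtype for 'info'.
typing = {
    'name': str,
    'value': np.float64,
    'info': pd.StringDtype(),
    'scale': np.int8,
}

def typed_dataframe(data: list) -> pd.DataFrame:
    # astype() returns a new DataFrame, so return (or assign) its result.
    return pd.DataFrame(data).astype(typing)

# Hypothetical sample rows with one missing 'info' entry.
data = [
    {'name': 'a', 'value': 1, 'info': 'first', 'scale': 1},
    {'name': 'b', 'value': 2, 'info': None, 'scale': 2},
]
result = typed_dataframe(data)

print(result.dtypes)
print(result['info'].isna().tolist())
```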