I have noticed a behaviour that I don't quite understand.
I am doing a conversion of a list of dataclass items into a dataframe. When all values are not-None, everything works as expected:
from dataclasses import dataclass
from dataclasses import asdict
from pandas import json_normalize
@dataclass
class TestItem:
name: str = None
id: int = None
test_item_list = [
TestItem(name='teapot', id=11),
TestItem(name='kettle', id=12),
TestItem(name='boiler', id=13)
]
df = json_normalize(asdict(item) for item in test_item_list)
print(df)
result would be this. This is working as intended:
name id
0 teapot 11
1 kettle 12
2 boiler 13
but if we change test_item_list
like this:
test_item_list = [
TestItem(name='teapot', id=11),
TestItem(name='kettle', id=12),
TestItem(name='boiler')
]
the output will have 'id' column as float values instead of int:
name id
0 teapot 11.0
1 kettle 12.0
2 boiler NaN
df.dtypes
will also show that id is now float64 column:
name object
id float64
dtype: object
How to solve this issue? In the real working example I have several more complicated item classes, can't manually explicitly convert each of them into desired column type.
You can cast the id
column as nullable integer data type.
>>> df['id'] = df['id'].astype('Int64') # note the capital "I"
>>> print(df.dtypes)
name object
id Int64
dtype: object
Output:
name id
0 teapot 11
1 kettle 12
2 boiler <NA>