Search code examples
pythonpandasjson-normalize

Pandas json_normalize converts column of int values to float whan one of values is NaN


I have noticed a behaviour that I don't quite understand.

I am doing a conversion of a list of dataclass items into a dataframe. When all values are not-None, everything works as expected:

from dataclasses import dataclass
from dataclasses import asdict
from pandas import json_normalize

@dataclass
class TestItem:
    name: str = None
    id: int = None


test_item_list = [
    TestItem(name='teapot', id=11),
    TestItem(name='kettle', id=12),
    TestItem(name='boiler', id=13)
]

df = json_normalize(asdict(item) for item in test_item_list)
print(df)

result would be this. This is working as intended:

     name  id
0  teapot  11
1  kettle  12
2  boiler  13

but if we change test_item_list like this:

test_item_list = [
    TestItem(name='teapot', id=11),
    TestItem(name='kettle', id=12),
    TestItem(name='boiler')
]

the output will have 'id' column as float values instead of int:

     name    id
0  teapot  11.0
1  kettle  12.0
2  boiler   NaN

df.dtypes will also show that id is now float64 column:

name     object
id      float64
dtype: object

How to solve this issue? In the real working example I have several more complicated item classes, can't manually explicitly convert each of them into desired column type.


Solution

  • You can cast the id column as nullable integer data type.

    >>> df['id'] = df['id'].astype('Int64') # note the capital "I"
    
    >>> print(df.dtypes)
    name    object
    id       Int64
    dtype: object
    

    Output:

         name    id
    0  teapot    11
    1  kettle    12
    2  boiler  <NA>