Search code examples
jsonpandasjson-deserializationjson-serializationjson-normalize

How to use "pandas json normalize" to convert values as it is (as string)?


with some data,

df = pd.DataFrame({
    "data": [
        {"a":1},
        {"b":2.1}
    ]
 })
       data
0  {'a': 1}
1  {'b': 2.1}

expanding the dictionary in the column using .json_normalize:

df2 = pd.json_normalize(df['data'])

     a    b
0  1.0  NaN
1  NaN  2.1

How could those values be represented the same as the original data like below:

     a    b
0    1  NaN
1  NaN  2.1

One dummy solution appears to be to add a string like

df = pd.DataFrame({
    "data": [
        {"a":1},
        {"b":2.1},
        {"a": "dummy"}
    ]
 })

then

pd.json_normalize(df['data'])
       a    b
0      1  NaN
1    NaN  2.1
2  dummy  NaN

what would be some smarter solutions?


Solution

  • You could try this:

    pd.json_normalize(
        df["data"].apply(lambda x: {key: str(value) for key, value in x.items()})
    )
    
    # Output in Jupyter cell
         a    b
    0    1  NaN
    1  NaN  2.1