Search code examples
pythonjsondataframejson-normalize

Convert JSON data in data frame Python


I am beginner of programming language, so it would be appreciated you help and support.

Here is DataFrame and one column' data is JSON type? of data.

ID, Name, Information
1234, xxxx, '{'age': 25, 'gender': 'male'}'
2234, yyyy, '{'age': 34, 'gender': 'female'}'
3234, zzzz, '{'age': 55, 'gender': 'male'}'

I would like to covert this DataFrame as below.

ID, Name, age, gender
1234, xxxx, 25, male
2234, yyyy, 34, female
3234, zzzz, 55, male

I found that ast.literal_eval() can convert str to dict type, but I have no idea how to write code of this issue.

Would you please give some example of a code which can solve this issue?


Solution

  • Given test.csv

    ID,Name,Information
    1234,xxxx,"{'age': 25, 'gender': 'male'}"
    2234,yyyy,"{'age': 34, 'gender': 'female'}"
    3234,zzzz,"{'age': 55, 'gender': 'male'}"
    
    • Read the file in with pd.read_csv and use the converters parameter with ast.literal_eval, which will convert the data in the Information column from a str type to dict type.
    • Use pd.json_normalize to unpack the dict with keys as column headers and values in the rows
    • .join the normalized columns with df
    • .drop the Information column
    import pandas as pd
    from ast import literal_eval
    
    df = pd.read_csv('test.csv', converters={'Information': literal_eval})
    
    df = df.join(pd.json_normalize(df.Information))
    
    df.drop(columns=['Information'], inplace=True)
    
    # display(df)
         ID  Name  age  gender
    0  1234  xxxx   25    male
    1  2234  yyyy   34  female
    2  3234  zzzz   55    male
    

    If the data is not from a csv file

    import pandas as pd
    from ast import literal_eval
    
    data = {'ID': [1234, 2234, 3234],
            'Name': ['xxxx', 'yyyy', 'zzzz'],
            'Information': ["{'age': 25, 'gender': 'male'}", "{'age': 34, 'gender': 'female'}", "{'age': 55, 'gender': 'male'}"]}
    
    df = pd.DataFrame(data)
    
    # apply literal_eval to Information
    df.Information = df.Information.apply(literal_eval)
    
    # normalize the Information column and join to df
    df = df.join(pd.json_normalize(df.Information))
    
    # drop the Information column
    df.drop(columns=['Information'], inplace=True)