Search code examples
pythonjsonpandasnormalize

Flatten nested JSON and concatenate to dataframe using pandas


I have searched for a lot of similar topics online, but I have not found the solution yet.

My pandas dataframe looks like this:

index    FOR
0        [{'id': '2766', 'name': '0803 Computer Softwar...
1        [{'id': '2766', 'name': '0803 Computer Softwar...
2        [{'id': '2766', 'name': '0803 Computer Softwar...
3        [{'id': '2766', 'name': '0803 Computer Softwar...
4        [{'id': '2766', 'name': '0803 Computer Softwar...

And I would like to flatten all 4 rows to become like the following dataframe while below is just the result for the first row:

index   id      name
0       2766    0803 Computer Software

I found a similar solution here. Unfortunately, I got a "TypeError" as the following: TypeError: the JSON object must be str, bytes or bytearray, not 'list'

My code was:

dfs = []
for i in test['FOR']:
    data = json.loads(i)
    dfx = pd.json_normalize(data)
    dfs.append(dfx)   

df = pd.concat(dfs).reset_index(inplace = True)
print(df)

Would anyone can help me here? Thank you very much!


Solution

  • After a few weeks not touching related works, I encountered another similar case and I think I have got the solution so far for this case. Please feel free to correct me or provide any other ideas. I really appreciated all the helps and all the generous support!

    chuck = []
    
    for i in range(len(test)):
        chuck.append(json_normalize(test.iloc[i,:]['FOR']))
    
    test_df = pd.concat(chuck)
    

    And then drop duplicated columns for the test_df