
How to flatten a list of lists of dictionaries into a DataFrame?


I have a list of strings, each holding a JSON list of dictionaries, like this:

['[{"date_update":"31-03-2022","diemquatrinh":"6.0"}]',  

'[{"date_update":"28-04-2022","diemquatrinh":"6.5"}]', 

'[{"date_update":"25-12-2021","diemquatrinh":"6.0"}, {"date_update":"28-04-2022","diemquatrinh":"6.25"},{"date_update":"28-07-2022","diemquatrinh":"6.5"}]',

'[{"date_update":null,"diemquatrinh":null}]']

    

I don't know how to turn them into a DataFrame with two columns like this. I'm looking forward to your help. Thank you!

date_update diemquatrinh
31-03-2022  6.0
28-04-2022  6.5
25-12-2021  6.0
28-04-2022  6.25
28-07-2022  6.5
null        null

Solution

  • First, parse each JSON string into a list of dictionaries.

    import pandas as pd
    import json
    
    example_data=['[{"date_update":"31-03-2022","diemquatrinh":"6.0"}]',  
    
    '[{"date_update":"28-04-2022","diemquatrinh":"6.5"}]', 
    
    '[{"date_update":"25-12-2021","diemquatrinh":"6.0"}, {"date_update":"28-04-2022","diemquatrinh":"6.25"},{"date_update":"28-07-2022","diemquatrinh":"6.5"}]',
    
    '[{"date_update":null,"diemquatrinh":null}]']
    
    # Parse each JSON string; every element of listt is a list of dictionaries.
    listt = [json.loads(i) for i in example_data]
    

    When I examine the data, every dictionary has the same keys, so all of the dictionaries can be collected into one flat list.

    
    main_list = [item for sublist in listt for item in sublist]
    print(main_list)
    '''
    [{'date_update': '31-03-2022', 'diemquatrinh': '6.0'}, {'date_update': '28-04-2022', 'diemquatrinh': '6.5'}, {'date_update': '25-12-2021', 'diemquatrinh': '6.0'}, {'date_update': '28-04-2022', 'diemquatrinh': '6.25'}, {'date_update': '28-07-2022', 'diemquatrinh': '6.5'}, {'date_update': None, 'diemquatrinh': None}]
    '''
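
As an alternative to the nested comprehension, the parsing and flattening steps can be combined with `itertools.chain.from_iterable` from the standard library. This is only a sketch: `raw` and `flat` are illustrative names, and `raw` is a shortened stand-in that is assumed to have the same shape as the data in the question.

```python
import json
from itertools import chain

# Shortened stand-in for the question's data (assumed to have the same shape).
raw = [
    '[{"date_update":"31-03-2022","diemquatrinh":"6.0"}]',
    '[{"date_update":"25-12-2021","diemquatrinh":"6.0"}, {"date_update":"28-04-2022","diemquatrinh":"6.25"}]',
    '[{"date_update":null,"diemquatrinh":null}]',
]

# Parse each string into a list of dicts, then chain those lists into one flat list.
flat = list(chain.from_iterable(json.loads(s) for s in raw))
print(len(flat))
```

Both forms produce the same flat list, so the choice is mostly a matter of readability.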
    

    All that's left is to convert the list to a dataframe:

    df=pd.DataFrame(main_list)
    print(df)
    '''
        date_update diemquatrinh
    0   31-03-2022  6.0
    1   28-04-2022  6.5
    2   25-12-2021  6.0
    3   28-04-2022  6.25
    4   28-07-2022  6.5
    5   None        None
    
    '''
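
Note that both columns in the result hold strings (or None), because the JSON values are quoted. If real dates and numbers are needed, a possible follow-up is sketched below. It assumes the dates are always day-month-year, and `rows` and `typed` are illustrative names for a shortened sample in the same shape as `main_list`.

```python
import pandas as pd

# Shortened sample in the same shape as main_list (illustrative data).
rows = [
    {"date_update": "31-03-2022", "diemquatrinh": "6.0"},
    {"date_update": "28-04-2022", "diemquatrinh": "6.25"},
    {"date_update": None, "diemquatrinh": None},
]
typed = pd.DataFrame(rows)

# Assumption: dates are day-month-year; missing values become NaT.
typed["date_update"] = pd.to_datetime(typed["date_update"], format="%d-%m-%Y")
# The scores arrive as strings; missing values become NaN.
typed["diemquatrinh"] = pd.to_numeric(typed["diemquatrinh"])
print(typed.dtypes)
```

After the conversion, the columns can be sorted, filtered by date range, or averaged like any other datetime and float columns.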