Search code examples
pythondata-cleaningfeature-extractiondata-extractionfeature-engineering

Convert a column of list of dictionaries to a column list such that the values are derived from the key "name" under each dictionary in the list


The input column has a variable number of dictionary lists, it is not fixed.

INPUT column:

Facilities
[{'name': 'Work from home', 'icon': 'WFH.svg'}]
[{'name': 'Gymnasium', 'icon': 'Gym.svg'}, {'name': 'Cafeteria', 'icon': 'Cafeteria.svg'}, {'name': 'Work from home', 'icon': 'WFH.svg'}]
[{'name': 'Free food', 'icon': 'FreeFood.svg'}, {'name': 'Team outings', 'icon': 'TeamOuting.svg'}, {'name': 'Education assistance', 'icon': 'Education.svg'}]
[{'name': 'Soft skill training', 'icon': 'SoftSkillsTraining.svg'}, {'name': 'Job training', 'icon': 'JobTraining.svg'}]
[{'name': 'Free transport', 'icon': 'Transportation.svg'}, {'name': 'Work from home', 'icon': 'WFH.svg'}, {'name': 'Team outings', 'icon': 'TeamOuting.svg'}, {'name': 'Soft skill training', 'icon': 'SoftSkillsTraining.svg'}]

This above input should be filtered so that the column will only have a single list with all the values of keys "name" collected from different dictionaries in the list.

Desired Output column:

Facilities
['Work from home']
['Gymnasium', 'Cafeteria', 'Work from home']
['Free food','Team outings','Education assistance']
['Soft skill training','Job training']
['Free transport', 'Work from home','Team outings','Soft skill training']

Solution

  • Suppose you have this DataFrame:

    df = pd.DataFrame({'Facilities':[
    [{'name': 'Work from home', 'icon': 'WFH.svg'}],
    [{'name': 'Gymnasium', 'icon': 'Gym.svg'}, {'name': 'Cafeteria', 'icon': 'Cafeteria.svg'}, {'name': 'Work from home', 'icon': 'WFH.svg'}],
    [{'name': 'Free food', 'icon': 'FreeFood.svg'}, {'name': 'Team outings', 'icon': 'TeamOuting.svg'}, {'name': 'Education assistance', 'icon': 'Education.svg'}],
    [{'name': 'Soft skill training', 'icon': 'SoftSkillsTraining.svg'}, {'name': 'Job training', 'icon': 'JobTraining.svg'}],
    [{'name': 'Free transport', 'icon': 'Transportation.svg'}, {'name': 'Work from home', 'icon': 'WFH.svg'}, {'name': 'Team outings', 'icon': 'TeamOuting.svg'}, {'name': 'Soft skill training', 'icon': 'SoftSkillsTraining.svg'}],
        ]})
    
    print(df)
    
                                              Facilities
    0    [{'name': 'Work from home', 'icon': 'WFH.svg'}]
    1  [{'name': 'Gymnasium', 'icon': 'Gym.svg'}, {'n...
    2  [{'name': 'Free food', 'icon': 'FreeFood.svg'}...
    3  [{'name': 'Soft skill training', 'icon': 'Soft...
    4  [{'name': 'Free transport', 'icon': 'Transport...
    

    Then:

    df['Facilities'] = df['Facilities'].apply(lambda x: [d['name'] for d in x])
    print(df)
    

    Prints:

                                              Facilities
    0                                   [Work from home]
    1             [Gymnasium, Cafeteria, Work from home]
    2    [Free food, Team outings, Education assistance]
    3                [Soft skill training, Job training]
    4  [Free transport, Work from home, Team outings,...