Search code examples
pythonpandasdataframedictionarydata-cleaning

Mapping dictionary to pandas dataframe with lists


I have a dictionary where I would like to map it to a large pandas dataframe. The issue is the column I would like to use to map is wrapped in double quotes and sometimes there are one or more items in that column.

original

dict_id = {
   'College1': ['1256511'],
   'College2': ['1200582'],
   'College3': ['1256618'], 
   'College10': ['1256621']
}


   id1  id2            college_name
0   01   01    "College1, College2"
1   01   02  "College10, College12"
2   01   03             "College19"

desired

id1 id2    college_name              id_college
01   01     "College1, College2"    1256511, 1200582
01   02     "College10, College12"  1256621
01   03     "College19"

Solution

  • Your data is better formatted imo after explode, but I put it all back to how it was at the end~

    df.college_name = df.college_name.str[1:-1].str.split(', ')
    df = df.explode('college_name')
    
    df2 = pd.DataFrame.from_dict(dict_id, 'index', columns=['id_college'], dtype=str)
    
    df = df.merge(df2, left_on='college_name', right_index=True, how='left')
    
    df = df.fillna('').groupby(['id1', 'id2'], as_index=False).agg(', '.join)
    
    df.college_name = '"' + df.college_name + '"'
    
    print(df)
    

    Output:

      id1 id2            college_name        id_college
    0  01  01    "College1, College2"  1256511, 1200582
    1  01  02  "College10, College12"         1256621,
    2  01  03             "College19"