Search code examples
pandasgroup-bytransform

How to create new columns for unique values on groupby using pandas?


Say I have a pandas dataframe like,

id,date,school,name
1,2019-01-01,U,Doug
1,2021-01-01,U,Chris
1,2022-01-01,U,Chris
1,2023-01-01,U,Chris
1,2022-01-01,K,Liam
3,2024-01-01,M,Nancy
3,2024-01-01,M,Betty

How would I transform it into a format,

id,school,name1,name2
1,U,Doug,Chris
1,K,Liam,''
3,M,Nancy,Betty

I.e. I want to group by IDs and Schools, and create new columns for each unique Name.

Here is the code to get started.

import pandas as pd

d = {
    'id': [1,1,1,1,2,3,3],
    'date': ['2019-01-01', '2021-01-01', '2022-01-01', '2023-01-01', '2022-01-01', '2024-01-01', '2024-01-01'],
    'school': ['U','U','U','U','K','M','M'],
    'name': ['Doug','Chris','Chris','Chris','Liam','Nancy','Betty']
}
df = pd.DataFrame(d)

Solution

  • You could use this code to get the unique names by id and school. Then move each element in each row's list into a separate column and finally rename the columns.

    df_grouped = df.groupby(['id', 'school'])['name'].unique().apply(pd.Series) df_grouped = df_grouped.rename(columns={x: f"name{x+1}" for x in df_grouped.columns}) df_grouped

    Output:

    id school name1 name2
    1 U Doug Chris
    2 K Liam NaN
    3 M Nancy Betty

    Note that our outputs don’t match exactly because there is a slight mismatch in id for school K between your code and the example you provided.