Say I have a pandas dataframe like,
id,date,school,name
1,2019-01-01,U,Doug
1,2021-01-01,U,Chris
1,2022-01-01,U,Chris
1,2023-01-01,U,Chris
1,2022-01-01,K,Liam
3,2024-01-01,M,Nancy
3,2024-01-01,M,Betty
How would I transform it into a format,
id,school,name1,name2
1,U,Doug,Chris
1,K,Liam,''
3,M,Nancy,Betty
I.e. I want to group by IDs and Schools, and create new columns for each unique Name.
Here is the code to get started.
import pandas as pd
d = {
'id': [1,1,1,1,2,3,3],
'date': ['2019-01-01', '2021-01-01', '2022-01-01', '2023-01-01', '2022-01-01', '2024-01-01', '2024-01-01'],
'school': ['U','U','U','U','K','M','M'],
'name': ['Doug','Chris','Chris','Chris','Liam','Nancy','Betty']
}
df = pd.DataFrame(d)
You could use this code to get the unique names by id and school. Then move each element in each row's list into a separate column and finally rename the columns.
df_grouped = df.groupby(['id', 'school'])['name'].unique().apply(pd.Series) df_grouped = df_grouped.rename(columns={x: f"name{x+1}" for x in df_grouped.columns}) df_grouped
Output:
id | school | name1 | name2 |
---|---|---|---|
1 | U | Doug | Chris |
2 | K | Liam | NaN |
3 | M | Nancy | Betty |
Note that our outputs don’t match exactly because there is a slight mismatch in id for school K between your code and the example you provided.