I have a simple dataframe
{'ID': {0: 101, 1: 101, 2: 101, 3: 102, 4: 102, 5: 102, 6: 102, 7: 102, 8: 103, 9: 103}, 'Category': {0: 'A', 1: 'B', 2: 'C', 3: 'A', 4: 'A', 5: 'A', 6: 'B', 7: 'B', 8: 'A', 9: 'B'}}
You can see that ID has duplicates and gives me a reshaping error.
I want to convert if from wide to long and keep values in the Category column as a new column (I will have max 5 categories per ID, but some will have less than 5 and should have NaN
. Plus want to rename columns to Role1
, Role2
etc.
Expected output
ID Role1 Role2 Role3 Role4 Role5
101 A B C NaN NaN
102 A A A B B
103 A B NaN NaN NaN
You can assign an enumeration within each ID and query that before pivoting:
N = 5
(df.assign(role_enum=df.groupby('ID').cumcount()+1)
.query('role_enum<=@N')
.pivot(index='ID', columns='role_enum', values='Category')
.add_prefix('Role').reset_index() # book-keeping
)
Output:
role_enum ID Role1 Role2 Role3 Role4 Role5
0 101 A B C NaN NaN
1 102 A A A B B
2 103 A B NaN NaN NaN