I have the following problem :
df
Key1 Key2 Value1 Value2 FixedValue
A A 12 32 15
A A 40 25 15
A A 13 12 15
A A 80 100 15
B A 0 1 20
B A 0 12 20
A B 50 50 40
B B 7 8 30
What I want is to create a new Dataframe, with only one line for each (Key1, Key2) couple, but creating new columns to keep the different values taken by Value1 and Value2 (see Output Example to understand better). FixedValue directly depends to (Key1, Key2) so won't change in time. I'd like to limit to a certain number of new columns created, so my output doesn't explode
Output wanted if I limit number of "new column by Value" to 3 :
Key1 Key2 Value1_1 Value1_2 Value1_3 Value2_1 Value2_2 Value2_3 FixedValue
A A 12 40 13 32 25 12 15
B A 0 0 1 12 20
A B 50 50 40
B B 7 8 30
I don't mind the type of the blank going to non-existant values (they can be NaN, '', ... whatever)
Thanks in advance for your help
Use a pivot
after filtering the number of rows to keep per group:
N = 3 # rows to keep per group
out = (df
.assign(col=lambda d: d.groupby(['Key1', 'Key2']).cumcount().add(1))
.query('col <= @N')
.pivot(index=['Key1', 'Key2', 'FixedValue'],
columns='col', values=['Value1', 'Value2']
)
)
out.columns = out.columns.map(lambda x: f'{x[0]}_{x[1]}')
out = out.reset_index()
Output:
Key1 Key2 FixedValue Value1_1 Value1_2 Value1_3 Value2_1 Value2_2 Value2_3
0 A A 15 12.0 40.0 13.0 32.0 25.0 12.0
1 A B 40 50.0 NaN NaN 50.0 NaN NaN
2 B A 20 0.0 0.0 NaN 1.0 12.0 NaN
3 B B 30 7.0 NaN NaN 8.0 NaN NaN