What Python code will get me from Original DataFrame to Final DataFrame? I am struggling to find any pandas functions for this type of a pivot.
Original DataFrame:
test1 test2 class value1
0 1 5 type type1
1 1 5 type type3
2 2 6 type type1
3 3 7 type type2
4 1 5 lat 40
5 1 5 lat 20
6 2 6 lat 50
7 3 7 lat 60
8 1 5 lon 1
9 1 5 lon 2
10 2 6 lon 2
11 3 7 lon 3
Final DataFrame:
test1 test2 type lat lon
0 1 5 type1 40 1
1 2 6 type1 50 2
2 3 7 type2 60 3
3 1 5 type3 20 2
I tired using the pandas pivot function, but get the following error.
df_pivoted = df_orig.pivot(
index=['test1', 'test2'],
columns='class',
values='value1')
ValueError: Index contains duplicate entries, cannot reshape
De-duplicate the rows before pivot
:
cols = ['test1', 'test2']
out = (
df.assign(n=df.groupby(cols+['class']).cumcount())
.pivot(index=['n']+cols, columns='class', values='value1')
.reset_index().rename_axis(columns=None).drop(columns='n')
)
Output:
test1 test2 lat lon type
0 1 5 40 1 type1
1 2 6 50 2 type1
2 3 7 60 3 type2
3 1 5 20 2 type3
This is not at all a simple pivot
, there is no direct relationship between test1/test2 and class/value, you must handle both parts separately and join
:
cols = ['test1', 'test2']
g = df_orig.groupby('class', sort=False)
out = (df_orig[cols]
.assign(n=g.ngroup())
.drop_duplicates('n').set_index('n')
.join((df_orig.drop(columns=cols).assign(n=g.cumcount())
.pivot(index='n', columns='class', values='value1')
))
.reset_index(drop=True)
)
print(out)
Output:
test1 test2 lat lon type
0 1 5 40 1 type1
1 2 6 50 2 type1
2 3 7 60 3 type2
For a simple pivot
your input should look like:
test1 test2 class value1
0 1 5 type type1
1 2 6 type type1
2 3 7 type type2
3 1 5 lat 40
4 2 6 lat 50
5 3 7 lat 60
6 1 5 lon 1
7 2 6 lon 2
8 3 7 lon 3
And you could use:
df.pivot(index=['test1', 'test2'], columns='class', values='value1').reset_index()