I have a dataframe with shape 500x200 and I'd like to pivot/melt it based on a subset of the columns. Here is an example test dataframe where I have and id
column, three case
columns, and an additional column with data for each id.
pd.DataFrame({'id': [1,2], 'case1': [3,1], 'case2': [3,2], 'case3': [3,2], 'vpd': [2,1]})
id case1 case2 case3 vpd
0 1 3 3 3 2
1 2 1 2 2 1
I'd like to pivot on the case columns only, like so:
pd.DataFrame({'index': ['case1', 'case2', 'case3', 'case1', 'case2', 'case3'], 'id': [1,1,1,2,2,2], 'vpd': [2,2,2,1,1,1],
'case': [3,3,3,1,2,2]}).set_index('index')
id vpd case
index
case1 1 2 3
case2 1 2 3
case3 1 2 3
case1 2 1 1
case2 2 1 2
case3 2 1 2
Where each case column becomes a row in the pivoted dataframe. This sort of seems to get at what I want:
pd.wide_to_long(test_df, "case", i="id", j="case#").reset_index()
id case# vpd case
0 1 1 2 3
1 2 1 1 1
2 1 2 2 3
3 2 2 1 2
4 1 3 2 3
5 2 3 1 2
But not exactly. Any other ideas for how to get to my desired output?
You can use melt
:
>>> (df.melt(['id', 'vpd'], var_name='case#', value_name='case', ignore_index=False)
.set_index('case#'))
id vpd case
case#
case1 1 2 3
case1 2 1 1
case2 1 2 3
case2 2 1 2
case3 1 2 3
case3 2 1 2
To keep order, use stack
:
>>> (df.set_index(['id', 'vpd']).stack()
.rename('case').reset_index(['id', 'vpd']))
id vpd case
case1 1 2 3
case2 1 2 3
case3 1 2 3
case1 2 1 1
case2 2 1 2
case3 2 1 2