Search code examples
pythonpandasdataframepivotmelt

Convert dataframe from wide to long format on multiple columns with differing names


I have a dataframe with shape 500x200 and I'd like to pivot/melt it based on a subset of the columns. Here is an example test dataframe where I have and id column, three case columns, and an additional column with data for each id.

pd.DataFrame({'id': [1,2], 'case1': [3,1], 'case2': [3,2], 'case3': [3,2], 'vpd': [2,1]})

    id  case1 case2 case3 vpd
0   1   3     3     3     2
1   2   1     2     2     1

I'd like to pivot on the case columns only, like so:

pd.DataFrame({'index': ['case1', 'case2', 'case3', 'case1', 'case2', 'case3'], 'id': [1,1,1,2,2,2], 'vpd': [2,2,2,1,1,1],
             'case': [3,3,3,1,2,2]}).set_index('index')
        id vpd case
index           
case1   1   2   3
case2   1   2   3
case3   1   2   3
case1   2   1   1
case2   2   1   2
case3   2   1   2

Where each case column becomes a row in the pivoted dataframe. This sort of seems to get at what I want:

pd.wide_to_long(test_df, "case", i="id", j="case#").reset_index()
    id  case#   vpd case
0   1   1       2   3
1   2   1       1   1
2   1   2       2   3
3   2   2       1   2
4   1   3       2   3
5   2   3       1   2

But not exactly. Any other ideas for how to get to my desired output?


Solution

  • You can use melt:

    >>> (df.melt(['id', 'vpd'], var_name='case#', value_name='case', ignore_index=False)
           .set_index('case#'))
    
           id  vpd  case
    case#               
    case1   1    2     3
    case1   2    1     1
    case2   1    2     3
    case2   2    1     2
    case3   1    2     3
    case3   2    1     2
    

    To keep order, use stack:

    >>> (df.set_index(['id', 'vpd']).stack()
           .rename('case').reset_index(['id', 'vpd']))
    
           id  vpd  case
    case1   1    2     3
    case2   1    2     3
    case3   1    2     3
    case1   2    1     1
    case2   2    1     2
    case3   2    1     2