Search code examples
pandaspandas-melt

Pandas - recursively melting many columns while making column names indices


I have a very wide dataframe where the first column is an index and all subsequent column names I want to become indices too in the melted output. The dataframes have lots of columns so I need a recursive way to do the below on N columns. (As a newbie, this is over my head).

My dataframe is a beginner version of this. I am assuming it is Melt with some complex scope logic, but I am drowning in my confusion right now.

import pandas as pd

df = pd.DataFrame({"GIS": [1, 2, 3], "X0": [100, 200, 300], "X1": [50, 51, 52], "X2": [71, 72, 73], "Xn": [100, 150, 210]})
df = df.set_index(["GIS"])

It produces a df that looks like this: enter image description here

I don't know how to get there programmatically, especially on really wide dataframes, but ultimately I want it to produce a df that looks like this:

new_df = pd.DataFrame({"GIS": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3], "Variable": ["X0", "X1", "X2", "Xn", "X0", "X1", "X2", "Xn", "X0", "X1", "X2", "Xn"], "Value": [100, 50, 71, 100, 200, 51, 72, 150, 300, 52, 73, 200]})
new_df = new_df.set_index(["GIS", "Variable"])

Which should produce a df structured like this. enter image description here

Your suggestions for a bigger are greatly appreciated.


Solution

  • Use DataFrame.rename_axis, DataFrame.stack and Series.to_frame:

    new_df = df.rename_axis('Variable', axis=1).stack().to_frame('Value')
    print (new_df)
                  Value
    GIS Variable       
    1   X0          100
        X1           50
        X2           71
        Xn          100
    2   X0          200
        X1           51
        X2           72
        Xn          150
    3   X0          300
        X1           52
        X2           73
        Xn          210
    

    Or DataFrame.melt with DataFrame.sort_index and DataFrame.set_index with append=True for MultiIndex:

    new_df = (df.melt(ignore_index=False, var_name='Variable', value_name='Value')
                .sort_index()
                .set_index('Variable', append=True))
    print (new_df)
                  Value
    GIS Variable       
    1   X0          100
        X1           50
        X2           71
        Xn          100
    2   X0          200
        X1           51
        X2           72
        Xn          150
    3   X0          300
        X1           52
        X2           73
        Xn          210