I have a very wide dataframe where the first column is an index and all subsequent column names I want to become indices too in the melted output. The dataframes have lots of columns so I need a recursive way to do the below on N columns. (As a newbie, this is over my head).
My dataframe is a beginner version of this. I am assuming it is Melt with some complex scope logic, but I am drowning in my confusion right now.
import pandas as pd
df = pd.DataFrame({"GIS": [1, 2, 3], "X0": [100, 200, 300], "X1": [50, 51, 52], "X2": [71, 72, 73], "Xn": [100, 150, 210]})
df = df.set_index(["GIS"])
It produces a df that looks like this:
I don't know how to get there programmatically, especially on really wide dataframes, but ultimately I want it to produce a df that looks like this:
new_df = pd.DataFrame({"GIS": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3], "Variable": ["X0", "X1", "X2", "Xn", "X0", "X1", "X2", "Xn", "X0", "X1", "X2", "Xn"], "Value": [100, 50, 71, 100, 200, 51, 72, 150, 300, 52, 73, 200]})
new_df = new_df.set_index(["GIS", "Variable"])
Which should produce a df structured like this.
Your suggestions for a bigger are greatly appreciated.
Use DataFrame.rename_axis
, DataFrame.stack
and Series.to_frame
:
new_df = df.rename_axis('Variable', axis=1).stack().to_frame('Value')
print (new_df)
Value
GIS Variable
1 X0 100
X1 50
X2 71
Xn 100
2 X0 200
X1 51
X2 72
Xn 150
3 X0 300
X1 52
X2 73
Xn 210
Or DataFrame.melt
with DataFrame.sort_index
and DataFrame.set_index
with append=True
for MultiIndex
:
new_df = (df.melt(ignore_index=False, var_name='Variable', value_name='Value')
.sort_index()
.set_index('Variable', append=True))
print (new_df)
Value
GIS Variable
1 X0 100
X1 50
X2 71
Xn 100
2 X0 200
X1 51
X2 72
Xn 150
3 X0 300
X1 52
X2 73
Xn 210