I have a pandas dataframe in the rough format
print(df)
Time GroupA GroupB Value1 Value2
0 100.0 1.0 1.0 18.0 0.0
1 100.0 1.0 2.0 16.0 0.0
2 100.0 2.0 1.0 18.0 0.0
3 100.0 2.0 2.0 10.0 0.0
where Time
is a count variable / timestamp, GroupA
and GroupB
are categories, and Value1
and Value2
are numerical quantities. This code snippet creates a mockup dataframe:
import numpy as np
values = np.zeros(shape=(4,5))
values[:,0] = 100
values[:,1] = [1]*2 + [2]*2
values[:,2] = [1,2]*2
values[:,3] = np.random.randint(low=10,high=20,size=(4))
df = pd.DataFrame(values,columns=['Time','GroupA','GroupB','Value1','Value2'])
After loading in some data, I want to calculate and fill in values of Value2
. As it happens (since, incidentally, Value2
is a time series function of Value1
within each existing (GroupA
, GroupB
) pair), I found it easiest to calculate these values by first pivoting my data into the form:
df_pivot = df.pivot_table(index='Time',columns=['GroupA','GroupB'],values=['Value1','Value2'], fill_value=0.0)
Then after some unrelated code I have filled in values
print(df_pivot)
Value1 Value2
GroupA 1.0 2.0 1.0 2.0
GroupB 1.0 2.0 1.0 2.0 1.0 2.0 1.0 2.0
Time
100.0 13 16 16 10 27 20 28 20
Now I want to "unpivot" this back to the original format of df
. I could do this manually by looping over df
, looking up the value in df_pivot
, and filling it, but I'd prefer to use built-in functions. Try as I might using variations of df.melt
, I cannot perform this inversion, because of problems with df_pivot
's hierarchical columns. My best attempt is
dfm = df_pivot.reset_index().melt(id_vars="Time")
dfm.columns.values[1] = "HACK"
dfm = dfm.pivot_table(index=["Time","GroupA","GroupB"],columns="HACK",values="value").reset_index()
which produces the data frame
print(dfm)
HACK Time GroupA GroupB Value1 Value2
0 100.0 1.0 1.0 13 27
1 100.0 1.0 2.0 16 20
2 100.0 2.0 1.0 16 28
3 100.0 2.0 2.0 10 20
This works, but doesn't strike me as be best solution, or very portable (why does melt produce a "NaN" column name? why do I have a manually find the index of this column and rename it? why do I have to pivot to undo a pivot?) Experimenting and looking through documentation and examples for an alternative, I'm at a loss, though. The melt
function has a col_level
argument that looks like it should help, but any valid value I use for this just leads to data loss (losing the "Time", "GroupA", or "GroupB" data).
I think stack
is more straightforward
df_pivot.stack([1,2]).reset_index()
Out[8]:
Time GroupA GroupB Value1 Value2
0 100.0 1.0 1.0 13 0
1 100.0 1.0 2.0 13 0
2 100.0 2.0 1.0 12 0
3 100.0 2.0 2.0 11 0