Search code examples
pythonpandaspivotmelt

Exact inverse of pandas' "pivot" operation


I have a pandas dataframe in the rough format

print(df)
    Time  GroupA  GroupB  Value1  Value2
0  100.0     1.0     1.0    18.0     0.0
1  100.0     1.0     2.0    16.0     0.0
2  100.0     2.0     1.0    18.0     0.0
3  100.0     2.0     2.0    10.0     0.0

where Time is a count variable / timestamp, GroupA and GroupB are categories, and Value1 and Value2 are numerical quantities. This code snippet creates a mockup dataframe:

import numpy as np
values = np.zeros(shape=(4,5))
values[:,0] = 100
values[:,1] = [1]*2 + [2]*2
values[:,2] = [1,2]*2
values[:,3] = np.random.randint(low=10,high=20,size=(4))
df = pd.DataFrame(values,columns=['Time','GroupA','GroupB','Value1','Value2'])

After loading in some data, I want to calculate and fill in values of Value2. As it happens (since, incidentally, Value2 is a time series function of Value1 within each existing (GroupA, GroupB) pair), I found it easiest to calculate these values by first pivoting my data into the form:

df_pivot = df.pivot_table(index='Time',columns=['GroupA','GroupB'],values=['Value1','Value2'], fill_value=0.0)

Then after some unrelated code I have filled in values

print(df_pivot)
       Value1             Value2            
GroupA    1.0     2.0        1.0     2.0    
GroupB    1.0 2.0 1.0 2.0    1.0 2.0 1.0 2.0
Time                                        
100.0      13  16  16  10     27  20  28  20

Now I want to "unpivot" this back to the original format of df. I could do this manually by looping over df, looking up the value in df_pivot, and filling it, but I'd prefer to use built-in functions. Try as I might using variations of df.melt, I cannot perform this inversion, because of problems with df_pivot's hierarchical columns. My best attempt is

dfm = df_pivot.reset_index().melt(id_vars="Time")
dfm.columns.values[1] = "HACK"
dfm = dfm.pivot_table(index=["Time","GroupA","GroupB"],columns="HACK",values="value").reset_index()

which produces the data frame

print(dfm)
HACK   Time  GroupA  GroupB  Value1  Value2
0     100.0     1.0     1.0      13      27
1     100.0     1.0     2.0      16      20
2     100.0     2.0     1.0      16      28
3     100.0     2.0     2.0      10      20

This works, but doesn't strike me as be best solution, or very portable (why does melt produce a "NaN" column name? why do I have a manually find the index of this column and rename it? why do I have to pivot to undo a pivot?) Experimenting and looking through documentation and examples for an alternative, I'm at a loss, though. The melt function has a col_level argument that looks like it should help, but any valid value I use for this just leads to data loss (losing the "Time", "GroupA", or "GroupB" data).


Solution

  • I think stack is more straightforward

    df_pivot.stack([1,2]).reset_index()
    Out[8]: 
        Time  GroupA  GroupB  Value1  Value2
    0  100.0     1.0     1.0      13       0
    1  100.0     1.0     2.0      13       0
    2  100.0     2.0     1.0      12       0
    3  100.0     2.0     2.0      11       0