Search code examples
pythonpandasdataframejupyter-notebookpivot

Pivot a data frame in pandas


So i have a data frame

data = {
    'Type': ['Z', 'x', 'A', 'A', 'A', 'D', 'B', 'B','Z', 'x', 'A', 'A', 'A', 'D', 'B', 'B'],
    'Val': ['012', '11', '123', '567', '101', '22', 'T54', '111','012', '11', '123', '500', '101', '22', 'T54', '111']
}

df = pd.DataFrame(data)

original data frame (df)

I simply want to pivot this data frame maintaining the original order of the columns

so output will look like this

Z x A D B
012 11 123 22 T54
nan nan 567 nan 111
nan nan 101 nan nan
012 11 123 22 T54
nan nan 500 nan 111
nan nan 101 nan nan

note that the order of the columns should be maintained and whichever columns are repeating its value will keep getting added to new rows.

I have tried this code but unfortunately not getting the desired output

df['Type'] = pd.Categorical(df['Type'], categories=df['Type'].unique(), ordered=True)
df['Row'] = df.groupby('Type').cumcount()
df = df.pivot(index='Row', columns='Type', values='Val')
df = df.reset_index(drop=True)
df.columns.name = None

my output


Solution

  • You need to use a double index, once to determine when the values restart to the beginning (taking advantage of your Categorical and of shift+cumsum), and once with the groupby.cumcount:

    df['Type'] = pd.Categorical(df['Type'], categories=df['Type'].unique(),
                                ordered=True)
    
    out = (df
       .assign(idx=lambda d: d['Type'].lt(d['Type'].shift()).cumsum(),
               idx2=lambda d: d.groupby(['idx', 'Type']).cumcount(),
              )
       .pivot(index=['idx', 'idx2'], columns='Type', values='Val')
    )
    

    Output:

    Type        Z    x    A    D    B
    idx idx2                         
    0   0     012   11  123   22  T54
        1     NaN  NaN  567  NaN  111
        2     NaN  NaN  101  NaN  NaN
    1   0     012   11  123   22  T54
        1     NaN  NaN  500  NaN  111
        2     NaN  NaN  101  NaN  NaN