Search code examples
pandaspivotone-hot-encoding

Pandas: get dummies and keep other column as values


I have a dataframe like this:

Sequence Message
1 A
2 B
3 C
4 D
1 A
2 C
3 D
4 E
5 F
1 A
2 C
3 D

And I'm trying to pivot it to have table with maximum Sequence length as columns and corresponding possible Message as values.

1 2 3 4 5 6 7
A B C D
A C D E F
A C D

I'm trying to use pd.get_dummies() for Sequence , but can't figure out how to keep Message values.

Appreciate any help.


Solution

  • Use a pivot with df['Sequence'].diff().le(0).cumsum() as index (new row when the Sequence restarts to an earlier number):

    (df.assign(index=df['Sequence'].diff().le(0).cumsum())
       .pivot(index='index', columns='Sequence', values='Message')
       .rename_axis(index=None, columns=None) # optional
    )
    

    Output:

       1  2  3    4    5
    0  A  B  C    D  NaN
    1  A  C  D    E    F
    2  A  C  D  NaN  NaN