Search code examples
pythonpandaspivot-table

Pandas Pivot / Rows to Columns Join


I will always be working with < 1000 lines of data, not too concerned with data performance

The transformation will would never exceed OP#5

Please advise the best way to do this in Pandas or pure python.

I attempted to do this with dataframe.pivot_table. The issue lies in the index repeats itself, and then it will simply create rows for each CAT rather than simply sticking the 1st CAT based on the index into OP#1 and then the 2nd CAT based on the duplicate index into OP#2, etc.

Data before transform

ID MISC CAT TIME
id1 123 CAT A VALUE
id1 123 CAT B VALUE
id1 123 CAT C VALUE
id1 123 CAT C VALUE
id1 123 CAT E VALUE
id2 145 CAT A VALUE
id2 145 CAT B VALUE
id3 101 CAT D VALUE
id3 101 CAT E VALUE
id4 500 CAT D VALUE
id5 600 CAT E VALUE

Data after transform.

ID MISC OP#1 TIME OP#2 TIME OP#3 TIME OP#4 TIME OP#5 TIME
id1 123 CAT A VALUE CAT B VALUE CAT C VALUE CAT D VALUE CAT E VALUE
id2 145 CAT A VALUE CAT B VALUE
id3 101 CAT E VALUE CAT E VALUE
id4 500 CAT D VALUE
id5 600 CAT E VALUE

Solution

  • Code

    out = (df.assign(cc=df.groupby('ID').cumcount().add(1))
           .pivot(index=['ID', 'MISC'], columns='cc')
           .sort_index(axis=1, level=1)
           .pipe(lambda x: x.set_axis(
               x.columns.map(lambda x: f'OP#{x[1]}' if x[0] == 'CAT' else 'TIME'), 
               axis=1))
           .reset_index()
    )
    

    out

    enter image description here