Search code examples
pythondatabasepandasdataframedatabase-normalization

Take multiple columns and put them to the same index with pandas


I have a bunch of ugly customer data that I am trying to normalise. It basically looks like this:

Customer   Order1   Order2   Order3 ... OrderN

  John      This     That     The   ...  Other
 Shelly    Thing1   Thing2   Thing3 ... ThingN
   .         .        .        .          .
   .         .        .        .          .

So I want to change it to look like:

Customer   Order

 John      This
 John      That
 John      The
 Shelly    Thing1
 Shelly    Thing2

And so on.

I am not sure how to do this, though.

Any help would be great!


Solution

  • Precisely one stack and two reset_index calls.

    df
      Customer  Order1  Order2  Order3  OrderN
    0     John    This    That     The   Other
    1   Shelly  Thing1  Thing2  Thing3  ThingN
    

    (df.set_index('Customer')
       .stack()
       .reset_index(level=1, drop=True)
       .reset_index(name='Order')
    )
    
      Customer   Order
    0     John    This
    1     John    That
    2     John     The
    3     John   Other
    4   Shelly  Thing1
    5   Shelly  Thing2
    6   Shelly  Thing3
    7   Shelly  ThingN