Search code examples
pythonpandasdataframegroup-bypivot

How to make pandas dataframe long to wide


Here is the input dataframe. Case 1:

df = pd.DataFrame({'order': {0: '1',
  1: '1',
  2: '2',
  3: '2',
  4: '3',
  5: '3'},
 'start': {0: pd.Timestamp('2023-04-01 04:00:00+0000', tz='UTC'),
  1: pd.Timestamp('2023-05-01 04:00:00+0000', tz='UTC'),
  2: pd.Timestamp('2023-04-01 04:00:00+0000', tz='UTC'),
  3: pd.Timestamp('2023-05-01 04:00:00+0000', tz='UTC'),
  4: pd.Timestamp('2023-04-01 04:00:00+0000', tz='UTC'),
  5: pd.Timestamp('2023-05-01 04:00:00+0000', tz='UTC')},
 'end': {0: pd.Timestamp('2023-05-01 04:00:00+0000', tz='UTC'),
  1: pd.Timestamp('2023-06-01 04:00:00+0000', tz='UTC'),
  2: pd.Timestamp('2023-05-01 04:00:00+0000', tz='UTC'),
  3: pd.Timestamp('2023-06-01 04:00:00+0000', tz='UTC'),
  4: pd.Timestamp('2023-05-01 04:00:00+0000', tz='UTC'),
  5: pd.Timestamp('2023-06-01 04:00:00+0000', tz='UTC')},
 'quant': {0: 10, 1: 10, 2: 20, 3: 30, 4: 40, 5: 50},
 'price': {0: 44, 1: 44, 2: 5, 3: 6, 4: 8, 5: 8}})

input df image - >input df image

My requirement is to wide this dataframe based on order , So my expected output is expected df

Case 2:

`
  df = pd.DataFrame({'order': {0: '1',
  1: '1',
  2: '2',
  3: '2',
  4: '3',
  5: '3',
  6: '3',
  7: '3'},
 'start': {0: pd.Timestamp('2023-04-01 04:00:00+0000', tz='UTC'),
  1: pd.Timestamp('2023-05-01 04:00:00+0000', tz='UTC'),
  2: pd.Timestamp('2023-04-01 04:00:00+0000', tz='UTC'),
  3: pd.Timestamp('2023-05-01 04:00:00+0000', tz='UTC'),
  4: pd.Timestamp('2023-04-01 04:00:00+0000', tz='UTC'),
  5: pd.Timestamp('2023-05-01 04:00:00+0000', tz='UTC'),
  6: pd.Timestamp('2023-03-01 04:00:00+0000', tz='UTC'),
  7: pd.Timestamp('2023-02-01 04:00:00+0000', tz='UTC')},
 'end': {0: pd.Timestamp('2023-05-01 04:00:00+0000', tz='UTC'),
  1: pd.Timestamp('2023-06-01 04:00:00+0000', tz='UTC'),
  2: pd.Timestamp('2023-05-01 04:00:00+0000', tz='UTC'),
  3: pd.Timestamp('2023-06-01 04:00:00+0000', tz='UTC'),
  4: pd.Timestamp('2023-05-01 04:00:00+0000', tz='UTC'),
  5: pd.Timestamp('2023-06-01 04:00:00+0000', tz='UTC'),
  6: pd.Timestamp('2023-04-01 04:00:00+0000', tz='UTC'),
  7: pd.Timestamp('2023-03-01 04:00:00+0000', tz='UTC')},
 'quant': {0: 10, 1: 10, 2: 20, 3: 30, 4: 40, 5: 50, 6:10, 7:10},
 'price': {0: 44, 1: 44, 2: 5, 3: 6, 4: 8, 5: 8, 6:9, 7:8}})`

In this case we have two more rows for order 3. So resulting dataframe quant & price should be Nan for order 1 & 2 for extra rows of order 3.

input frame -> input df image

expected output is, expected output

Can anyone please help me with this ?


Solution

  • This is a pivot, you can then reorder the levels:

    out = (df.pivot(index=['start', 'end'], columns='order')
             .sort_index(axis=1, level=1, sort_remaining=False)
             .swaplevel(axis=1)
          )
    

    Output:

    order                                                   1           2           3      
                                                        quant price quant price quant price
    start                     end                                                          
    2023-04-01 04:00:00+00:00 2023-05-01 04:00:00+00:00    10    44    20     5    40     8
    2023-05-01 04:00:00+00:00 2023-06-01 04:00:00+00:00    10    44    30     6    50     8