Search code examples
pandasdataframemelt

collapse pandas dataframe rows based on index column


I have a dataframe that contains information that is linked by an ID column. The rows are sequential with the odd rows containing a "start-point" and the even rows containing an "end" point. My goal is to collapse the data from these into a single row with columns for "start" and "end" following each other. The rows do have a "packet ID" that would link them if the sequential nature of the dataframe is not consistent.

example:

    df:
        0   1           2           3       4   5
    0   hs6 106956570   106956648   ID_A1   60  -
    1   hs1 153649721   153649769   ID_A1   60  -
    2   hs1 865130744   865130819   ID_A2   0   -
    3   hs7 21882206    21882237    ID_A2   0   -
    4   hs1 74230744    74230819    ID_A3   0   +
    5   hs8 92041314    92041508    ID_A3   0   + 

The resulting dataframe that I am trying to achieve is:

    new_df
        0   1           2           3   4           5
    0   hs6 106956570   106956648   hs1 153649721   153649769
    1   hs1 865130744   865130819   hs7 21882206    21882237
    2   hs1 74230744    74230819    hs8 92041314    92041508

with each row containing the information on both the start and the end-point.

I have tried to pass the IDs in to an array and use a for loop to pull the information out of the original dataframe into a new dataframe but this has not worked. I was looking at the melt documentation which would suggest that pd.melt(df, id_vars=[3], value_vars=[0,1,2]) may work but I cannot see how to get the corresponding row in to positions new_df[3,4,5].

I think that it may be something really simple that I am missing but any suggestions would be appreciated.


Solution

  • You can try this:

    df_out = df.set_index([df.index%2, df.index//2])[df.columns[:3]]\
               .unstack(0).sort_index(level=1, axis=1)
    df_out.columns = np.arange(len(df_out.columns))
    df_out
    

    Output:

         0          1          2    3          4          5
    0  hs6  106956570  106956648  hs1  153649721  153649769
    1  hs1  865130744  865130819  hs7   21882206   21882237
    2  hs1   74230744   74230819  hs8   92041314   92041508