Tags: python, python-3.x, pandas, dataframe, group-by

Combine rows if consecutive index values exist


I am trying to combine the strings in column text_info into a single row, but only where the values in the index column are consecutive. My data looks very similar to the table below:

index text_info
0.0 word 1
NAN NAN
3.0 word2
0.0 word3
1.0 word4
2.0 word5
4.0 word6

I would like to combine the text of the rows with consecutive index values 0, 1 and 2 into one row, so the result looks like this:

index text_info
0.0 word 1
NAN NAN
3.0 word2
0.0 word3, word4, word5
4.0 word6

The full data contains many similar runs of consecutive index values.

I tried several solutions, including the answer in this question, but none of them worked. I also tried multi = df.groupby('index', dropna=False)["text_info"].sum(), but that grouped rows by index value across the whole table and returned them in sorted order, instead of merging only the consecutive runs.

Is there a way to do this?


Solution

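  • Try this. The key diff().ne(1).cumsum() starts a new group whenever the index does not increase by exactly 1, so only rows in a consecutive run share a group. pd.to_numeric() is used in case your NAN values are strings and not np.nan values.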

    # a new group starts wherever the index does not increase by exactly 1
    groups = pd.to_numeric(df['index'], errors='coerce').diff().ne(1).cumsum()
    (df.astype(str)
       .groupby(groups, as_index=False)
       .agg({'index': 'first', 'text_info': ', '.join}))
    

    Output:

      index            text_info
    0   0.0               word 1
    1   nan                  nan
    2   3.0                word2
    3   0.0  word3, word4, word5
    4   4.0                word6
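
To see why the grouping key works, it may help to build it one step at a time. The sketch below rebuilds the sample table from the question, assuming the NAN row holds real missing values, and prints each intermediate result. The names idx, starts_new_run, run_id and result are illustrative only. It also differs slightly from the snippet above: it joins with map(str, ...) and calls reset_index(drop=True) instead of relying on astype(str) and as_index=False, which is just one way to keep the behaviour the same across pandas versions.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; real NaN is assumed for the "NAN" row.
df = pd.DataFrame({
    "index": [0.0, np.nan, 3.0, 0.0, 1.0, 2.0, 4.0],
    "text_info": ["word 1", np.nan, "word2", "word3", "word4", "word5", "word6"],
})

# Coerce to numbers so that string placeholders such as "NAN" would become NaN.
idx = pd.to_numeric(df["index"], errors="coerce")

# True wherever a row does NOT continue the previous one (step != 1).
# The first row and any NaN step also compare as "not equal to 1".
starts_new_run = idx.diff().ne(1)

# A running count of run starts gives every consecutive run its own label.
run_id = starts_new_run.cumsum().rename("run_id")
print(run_id.tolist())  # [1, 2, 3, 4, 4, 4, 5]

result = (
    df.groupby(run_id)
      .agg({
          "index": "first",                                   # first index of each run
          "text_info": lambda s: ", ".join(map(str, s)),      # NaN becomes 'nan'
      })
      .reset_index(drop=True)
)
print(result)
```

Rows 3, 4 and 5 share label 4 because each one is exactly 1 greater than the row before it, so they are the only rows merged. Row 6 (index 4.0) follows 2.0 with a step of 2 and therefore starts a new run.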