python python-3.x pandas dataframe group-by

Combine rows if consecutive index values exist

I am trying to combine the strings in column text_info into a single row, but only where the values in the index column are consecutive. My data looks very similar to the table below:

index	text_info
0.0	word 1
NAN	NAN
3.0	word2
0.0	word3
1.0	word4
2.0	word5
4.0	word6

I would like to combine the text from the rows with consecutive index values 0, 1 and 2 into a single row, like this:

index	text_info
0.0	word 1
NAN	NAN
3.0	word2
0.0	word3, word4, word5
4.0	word6

The real data contains more runs of consecutive index values like this one.

I tried several solutions, including the answer in this question, but none of them worked. I also tried multi = df.groupby('index',dropna=False)["text_info"].sum(), but that grouped the rows by index value and returned them in sorted order instead of merging only the consecutive runs.

Is there a way to do this?

Solution

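Try this. The grouping key starts a new group whenever the index does not increase by exactly 1 from the previous row (diff().ne(1).cumsum()), so each run of consecutive values gets its own label. pd.to_numeric() with errors='coerce' is used in case your NAN values are strings and not np.nan values. Note that astype(str) turns missing values into the string 'nan', which is why they appear that way in the output.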

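import pandas as pd
# Numeric copy of the index column; non-numeric entries such as 'NAN' become NaN.
idx = pd.to_numeric(df['index'], errors='coerce')
(df.astype(str)
   .groupby(idx.diff().ne(1).cumsum(), as_index=False)  # new group unless the step is exactly 1
   .agg({'index': 'first', 'text_info': ', '.join}))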

Output:

  index            text_info
0   0.0               word 1
1   nan                  nan
2   3.0                word2
3   0.0  word3, word4, word5
4   4.0                word6
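
To see why the grouping key works, here is a minimal, self-contained sketch that rebuilds the sample data from the question and exposes the intermediate run labels. The names run_id and join_text are made up for this illustration, and it assumes the missing values are real NaN rather than the string 'NAN'. Unlike the one-liner above, it does not call astype(str), so missing values stay NaN instead of becoming the string 'nan'.

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumption: NAN means a real missing value).
df = pd.DataFrame({
    "index": [0.0, np.nan, 3.0, 0.0, 1.0, 2.0, 4.0],
    "text_info": ["word 1", np.nan, "word2", "word3", "word4", "word5", "word6"],
})

# Coerce to numbers so that string placeholders would also become NaN.
idx = pd.to_numeric(df["index"], errors="coerce")

# A row starts a new run unless it is exactly 1 greater than the previous row.
# NaN differences compare as "not equal to 1", so they also start a new run.
starts_run = idx.diff().ne(1)

# Cumulative sum of the run starts gives one label per run.
run_id = starts_run.cumsum().rename("run_id")  # hypothetical helper name
# run_id is 1, 2, 3, 4, 4, 4, 5 for the sample data


def join_text(values):
    """Join the non-missing strings of one run; keep NaN if the run is empty."""
    present = values.dropna()
    return ", ".join(present.astype(str)) if len(present) else np.nan


result = (
    df.groupby(run_id)
      .agg({"index": "first", "text_info": join_text})
      .reset_index(drop=True)
)
print(result)
```

Only rows 3 to 5 share a label, so they are the only ones merged. The two rows with index 0.0 end up in different runs because they are not adjacent, which is the behaviour a plain groupby('index') cannot give you.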