I am trying to combine the strings in the column text_info into a single row, but only where the values in index are consecutive. My data looks very similar to the table below:
index | text_info |
---|---|
0.0 | word 1 |
NAN | NAN |
3.0 | word2 |
0.0 | word3 |
1.0 | word4 |
2.0 | word5 |
4.0 | word6 |
I would like to combine the text of the rows whose index values are 0, 1 and 2 into one row, so that the result looks like this:
index | text_info |
---|---|
0.0 | word 1 |
NAN | NAN |
3.0 | word2 |
0.0 | word3, word4, word5 |
4.0 | word6 |
The data contains many similar runs of consecutive index values.
I tried multiple solutions, including the answer in this question, but none of them worked. I also tried multi = df.groupby('index', dropna=False)["text_info"].sum()
but that grouped the rows by index value and returned them in sorted order, instead of merging only the rows whose indexes follow one another.
Is there a way to do this?
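Try this. pd.to_numeric() is used in case your NAN values are strings and not np.nan values. The grouping key, diff().ne(1).cumsum(), starts a new group whenever the index does not increase by exactly 1 from the previous row, so each run of consecutive indexes ends up in one group.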
(df.astype(str)
   .groupby(pd.to_numeric(df['index'], errors='coerce').diff().ne(1).cumsum(), as_index=False)
   .agg({'index': 'first', 'text_info': ', '.join}))
Output:
index text_info
0 0.0 word 1
1 nan nan
2 3.0 word2
3 0.0 word3, word4, word5
4 4.0 word6
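To see why the grouping key works, it may help to build it step by step. The following is a minimal sketch, not the exact code above: it assumes the NAN placeholders are literal strings (the case pd.to_numeric() guards against), the sample DataFrame is reconstructed from the question's table, and the names idx, starts_new_run and run_id are made up for illustration. It also uses reset_index(drop=True) instead of as_index=False, which should give the same layout.

```python
import pandas as pd

# Sample data reconstructed from the question; "NAN" is assumed to be a literal string.
df = pd.DataFrame({
    "index": ["0.0", "NAN", "3.0", "0.0", "1.0", "2.0", "4.0"],
    "text_info": ["word 1", "NAN", "word2", "word3", "word4", "word5", "word6"],
})

# Convert to numbers; anything that cannot be parsed becomes NaN.
idx = pd.to_numeric(df["index"], errors="coerce")

# True wherever the row does not continue the previous one (step != 1).
# A NaN step compares as "not equal to 1", so it starts a new run as well.
starts_new_run = idx.diff().ne(1)

# A running count of run starts gives every run its own label.
run_id = starts_new_run.cumsum().rename("run_id")
print(run_id.tolist())  # [1, 2, 3, 4, 4, 4, 5]

# Aggregate each run: keep its first index and join its text.
result = (
    df.astype(str)
      .groupby(run_id)
      .agg({"index": "first", "text_info": ", ".join})
      .reset_index(drop=True)
)
print(result)
```

Rows 3, 4 and 5 share the label 4, which is why only word3, word4 and word5 are joined, while the first 0.0 row stays on its own.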