pandas: increment based on a condition in another column


I have a dataframe with a single column, like the following (a minimal example).

import pandas as pd

dataframe = pd.DataFrame({'text': ['##weather', 'how is today?', 'we go out', '##rain',
                     'my day is rainy', 'I am not feeling well', 'rainy blues',
                     '##flower', 'the blue flower', 'she likes red',
                     'this flower is nice']})

I would like to add a second column called 'id' whose value increments every time a row contains '##'. My desired output would be:

                     text   id
0               ##weather  100
1           how is today?  100
2               we go out  100
3                  ##rain  101
4         my day is rainy  101
5   I am not feeling well  101
6             rainy blues  101
7                ##flower  102
8         the blue flower  102
9           she likes red  102
10    this flower is nice  102

So far I have tried the following, which does not return the output I want.

dataframe['id']= 100
dataframe.loc[dataframe['text'].str.contains('## intent:'), 'id'] += 1

Solution

  • You can build a running group key with cumsum on the '##' matches, then number the groups with groupby and ngroup:

    m = dataframe['text'].str.contains('##').cumsum()
    
    dataframe['id'] = dataframe.groupby(m).ngroup() + 100
    
    print(dataframe)
    
                         text   id
    0               ##weather  100
    1           how is today?  100
    2               we go out  100
    3                  ##rain  101
    4         my day is rainy  101
    5   I am not feeling well  101
    6             rainy blues  101
    7                ##flower  102
    8         the blue flower  102
    9           she likes red  102
    10    this flower is nice  102
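
    To see why this works, it may help to look at the intermediate steps. The sketch below is only an illustration: it assumes 'rainy blues' is a single row (as in the desired output), and the names is_header, block and id_direct are made up for this example. It also shows a shortcut that skips groupby, which should give the same ids only when the very first row is a '##' row.

```python
import pandas as pd

# Same sample data as in the question, with 'rainy blues' as a single row.
dataframe = pd.DataFrame({'text': [
    '##weather', 'how is today?', 'we go out',
    '##rain', 'my day is rainy', 'I am not feeling well', 'rainy blues',
    '##flower', 'the blue flower', 'she likes red', 'this flower is nice',
]})

# True on every row containing '##'; regex=False matches it as a literal substring.
is_header = dataframe['text'].str.contains('##', regex=False)

# Running count of '##' rows seen so far: 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3
block = is_header.cumsum()

# Variant from the answer: number the groups from 0, then shift to start at 100.
dataframe['id'] = dataframe.groupby(block).ngroup() + 100

# Shortcut (assumes the first row is a '##' row): offset the running count directly.
dataframe['id_direct'] = block + 99

print(dataframe)
```

    If some rows come before the first '##' row, the two variants differ: ngroup still starts at 100 for those leading rows, while the direct offset would give them 99.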