
Categorize a numerical series with Python


I'm trying to figure out how to assign a category based on an increasing enumeration column. Here is an example of my DataFrame:

import pandas as pd

df = pd.DataFrame({'A':[1,1,1,1,1,1,2,2,3,3,3,3,3],'B':[1,2,3,12,13,14,1,2,5,6,7,8,50]})

This produces:

df
Out[9]: 
    A   B
0   1   1
1   1   2
2   1   3
3   1  12
4   1  13
5   1  14
6   2   1
7   2   2
8   3   5
9   3   6
10  3   7
11  3   8
12  3  50

Column B holds an increasing numerical series, but sometimes the series is interrupted: it either jumps ahead to other numbers or starts again from the beginning. My desired output is:

Out[11]: 
    A   B  C
0   1   1  1
1   1   2  1
2   1   3  1
3   1  12  2
4   1  13  2
5   1  14  2
6   2   1  3
7   2   2  3
8   3   5  4
9   3   6  4
10  3   7  4
11  3   8  4
12  3  50  5
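The rule can be written as a plain loop. This is a minimal sketch, assuming a new category begins whenever B is not exactly one more than the previous value (either a jump ahead or a restart); the helper name `label_runs` is hypothetical. Under that reading, row 8 (B=5, right after B=2) opens category 4, because 5 is not 2 + 1.

```python
import pandas as pd

def label_runs(values):
    """Hypothetical helper: assign 1-based run labels to a sequence.

    A new run starts at the first element and whenever the current
    value is not exactly the previous value + 1.
    """
    labels = []
    current = 0
    prev = None
    for v in values:
        # Break in the +1 sequence (or very first element): start a new category
        if prev is None or v != prev + 1:
            current += 1
        labels.append(current)
        prev = v
    return labels

df = pd.DataFrame({'A': [1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3],
                   'B': [1, 2, 3, 12, 13, 14, 1, 2, 5, 6, 7, 8, 50]})
df['C'] = label_runs(df['B'])
print(df['C'].tolist())  # [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 5]
```

A Python loop is easy to reason about but slow on large frames; a vectorized pandas expression computes the same labels without iterating row by row.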

I'd appreciate any suggestions, because I can't find a clever way to do it. Thanks!


Solution

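  • Is this what you need? A new group starts wherever B does not increase by exactly 1 from the previous row, so a cumulative sum over those break points gives the labels: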

    df.B.diff().ne(1).cumsum()
    Out[463]: 
    0     1
    1     1
    2     1
    3     2
    4     2
    5     2
    6     3
    7     3
    8     4
    9     4
    10    4
    11    4
    12    5
    Name: B, dtype: int32
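The one-liner works in three steps: `diff()` takes the difference from the previous row (NaN for the first row), `ne(1)` marks every row where that difference is not 1 (including the first row, since NaN != 1 is True), and `cumsum()` counts those True marks, so the label increases by one exactly at each break. The sketch below prints the intermediate values and stores the result in column C; the names `step` and `new_run` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3],
                   'B': [1, 2, 3, 12, 13, 14, 1, 2, 5, 6, 7, 8, 50]})

# Step 1: difference from the previous row; row 0 has no predecessor -> NaN
step = df['B'].diff()

# Step 2: True where the +1 sequence is broken (NaN != 1, so row 0 is True)
new_run = step.ne(1)

# Step 3: running count of breaks = category label
df['C'] = new_run.cumsum()

# Show the intermediate columns next to the result
print(pd.concat([df, step.rename('step'), new_run.rename('new_run')], axis=1))
```

Because only a step of exactly +1 continues a group, a repeated value (e.g. 5, 5) or any decrease also starts a new category.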