Search code examples
pythonpython-3.xpandastime-seriesenumerate

Enumerating dataframe based on a column


I am dealing with a time-series dataframe that looks like this except with more than thousands of rows. I want to make a new column that enumerates the blocks of rows with the same values of 'sign'. i.e. 0th row would be 0, 1st row to 23rd row would be 1, 24th row to 30th row would be 2 etc...(the chronological order is important) What's the most pythonic way to accomplish this? Thank you in advance

    Date       sign
0   2011-01-27  1
1   2011-01-28  -1
2   2011-01-31  -1
3   2011-02-01  -1
4   2011-02-02  -1
5   2011-02-07  -1
6   2011-02-08  -1
7   2011-02-09  -1
8   2011-02-10  -1
9   2011-02-11  -1
10  2011-02-14  -1
11  2011-02-15  -1
12  2011-02-16  -1
13  2011-02-17  -1
14  2011-02-18  -1
15  2011-02-21  -1
16  2011-02-22  -1
17  2011-02-23  -1
18  2011-02-24  -1
19  2011-02-25  -1
20  2011-02-28  -1
21  2011-03-01  -1
22  2011-03-02  -1
23  2011-03-03  -1
24  2011-03-04  1
25  2011-03-07  1
26  2011-03-08  1
27  2011-03-09  1
28  2011-03-10  1
29  2011-03-11  1
30  2011-03-14  1
31  2011-03-15  -1
32  2011-03-16  -1
33  2011-03-17  -1
34  2011-03-18  -1
35  2011-03-21  -1
36  2011-03-22  -1
37  2011-03-23  -1
38  2011-03-24  -1
39  2011-03-25  -1
40  2011-03-28  -1
41  2011-03-29  1
42  2011-03-30  1

Solution

  • You can get the cumsum of where the sign changes, obtained using diff:

    df['new_column'] = (df.sign.diff()!=0).cumsum()-1
    
    >>> df
              Date  sign  new_column
    0   2011-01-27     1      0
    1   2011-01-28    -1      1
    2   2011-01-31    -1      1
    3   2011-02-01    -1      1
    4   2011-02-02    -1      1
    5   2011-02-07    -1      1
    6   2011-02-08    -1      1
    7   2011-02-09    -1      1
    8   2011-02-10    -1      1
    9   2011-02-11    -1      1
    10  2011-02-14    -1      1
    11  2011-02-15    -1      1
    12  2011-02-16    -1      1
    13  2011-02-17    -1      1
    14  2011-02-18    -1      1
    15  2011-02-21    -1      1
    16  2011-02-22    -1      1
    17  2011-02-23    -1      1
    18  2011-02-24    -1      1
    19  2011-02-25    -1      1
    20  2011-02-28    -1      1
    21  2011-03-01    -1      1
    22  2011-03-02    -1      1
    23  2011-03-03    -1      1
    24  2011-03-04     1      2
    25  2011-03-07     1      2
    26  2011-03-08     1      2
    27  2011-03-09     1      2
    28  2011-03-10     1      2
    29  2011-03-11     1      2
    30  2011-03-14     1      2
    31  2011-03-15    -1      3
    32  2011-03-16    -1      3
    33  2011-03-17    -1      3
    34  2011-03-18    -1      3
    35  2011-03-21    -1      3
    36  2011-03-22    -1      3
    37  2011-03-23    -1      3
    38  2011-03-24    -1      3
    39  2011-03-25    -1      3
    40  2011-03-28    -1      3
    41  2011-03-29     1      4
    42  2011-03-30     1      4