I am dealing with a time-series dataframe that looks like this except with more than thousands of rows. I want to make a new column that enumerates the blocks of rows with the same values of 'sign'. i.e. 0th row would be 0, 1st row to 23rd row would be 1, 24th row to 30th row would be 2 etc...(the chronological order is important) What's the most pythonic way to accomplish this? Thank you in advance
Date sign
0 2011-01-27 1
1 2011-01-28 -1
2 2011-01-31 -1
3 2011-02-01 -1
4 2011-02-02 -1
5 2011-02-07 -1
6 2011-02-08 -1
7 2011-02-09 -1
8 2011-02-10 -1
9 2011-02-11 -1
10 2011-02-14 -1
11 2011-02-15 -1
12 2011-02-16 -1
13 2011-02-17 -1
14 2011-02-18 -1
15 2011-02-21 -1
16 2011-02-22 -1
17 2011-02-23 -1
18 2011-02-24 -1
19 2011-02-25 -1
20 2011-02-28 -1
21 2011-03-01 -1
22 2011-03-02 -1
23 2011-03-03 -1
24 2011-03-04 1
25 2011-03-07 1
26 2011-03-08 1
27 2011-03-09 1
28 2011-03-10 1
29 2011-03-11 1
30 2011-03-14 1
31 2011-03-15 -1
32 2011-03-16 -1
33 2011-03-17 -1
34 2011-03-18 -1
35 2011-03-21 -1
36 2011-03-22 -1
37 2011-03-23 -1
38 2011-03-24 -1
39 2011-03-25 -1
40 2011-03-28 -1
41 2011-03-29 1
42 2011-03-30 1
You can get the cumsum
of where the sign changes, obtained using diff
:
df['new_column'] = (df.sign.diff()!=0).cumsum()-1
>>> df
Date sign new_column
0 2011-01-27 1 0
1 2011-01-28 -1 1
2 2011-01-31 -1 1
3 2011-02-01 -1 1
4 2011-02-02 -1 1
5 2011-02-07 -1 1
6 2011-02-08 -1 1
7 2011-02-09 -1 1
8 2011-02-10 -1 1
9 2011-02-11 -1 1
10 2011-02-14 -1 1
11 2011-02-15 -1 1
12 2011-02-16 -1 1
13 2011-02-17 -1 1
14 2011-02-18 -1 1
15 2011-02-21 -1 1
16 2011-02-22 -1 1
17 2011-02-23 -1 1
18 2011-02-24 -1 1
19 2011-02-25 -1 1
20 2011-02-28 -1 1
21 2011-03-01 -1 1
22 2011-03-02 -1 1
23 2011-03-03 -1 1
24 2011-03-04 1 2
25 2011-03-07 1 2
26 2011-03-08 1 2
27 2011-03-09 1 2
28 2011-03-10 1 2
29 2011-03-11 1 2
30 2011-03-14 1 2
31 2011-03-15 -1 3
32 2011-03-16 -1 3
33 2011-03-17 -1 3
34 2011-03-18 -1 3
35 2011-03-21 -1 3
36 2011-03-22 -1 3
37 2011-03-23 -1 3
38 2011-03-24 -1 3
39 2011-03-25 -1 3
40 2011-03-28 -1 3
41 2011-03-29 1 4
42 2011-03-30 1 4