Search code examples
python-3.xpandasdataframetime-seriescycle

How can I create a day-number for a 4-year cycle in a pandas dataframe?


I have timeseries data with datetimeindex, price and the cycle phase. My dataframe looks like this:

Dateindex Price CyclePhase Day
1928-01-03 00:00:00 71.04 0 1
1928-01-04 00:00:00 70.88 0 2
1928-01-05 00:00:00 70.2 0 3
1928-01-06 00:00:00 70.64 0 4
1929-05-09 00:00:00 104.08 1 400
1929-05-10 00:00:00 105.36 1 401
1929-05-11 00:00:00 104.96 1 402
1929-05-13 00:00:00 102.56 1 403
1930-11-08 00:00:00 63.56 2 844
1930-11-10 00:00:00 62.16 2 845
1930-11-11 00:00:00 63.16 2 846
1931-12-29 00:00:00 31.84 3 1185
1931-12-30 00:00:00 32.4 3 1186
1931-12-31 00:00:00 32.48 3 1187
1932-01-02 00:00:00 31.28 0 1
1932-01-04 00:00:00 30.24 0 2
1932-01-05 00:00:00 30.2 0 3

I would like to create the column "Day" where it is counting up for each row in the dataframe until the cycle restarts. (when CyclePhase changes from 3 to 0 again).

How can I do that best in python?


Solution

  • Create grouper to flag the rows where cycle changes the calculate cumcount to assign row numbers per group

    s = df['CyclePhase'].diff().lt(0).cumsum()
    df['Day'] = s.groupby(s).cumcount() + 1
    

                  Dateindex   Price  CyclePhase  Day
    0   1928-01-03 00:00:00   71.04           0    1
    1   1928-01-04 00:00:00   70.88           0    2
    2   1928-01-05 00:00:00   70.20           0    3
    3   1928-01-06 00:00:00   70.64           0    4
    4   1929-05-09 00:00:00  104.08           1    5
    5   1929-05-10 00:00:00  105.36           1    6
    6   1929-05-11 00:00:00  104.96           1    7
    7   1929-05-13 00:00:00  102.56           1    8
    8   1930-11-08 00:00:00   63.56           2    9
    9   1930-11-10 00:00:00   62.16           2   10
    10  1930-11-11 00:00:00   63.16           2   11
    11  1931-12-29 00:00:00   31.84           3   12
    12  1931-12-30 00:00:00   32.40           3   13
    13  1931-12-31 00:00:00   32.48           3   14
    14  1932-01-02 00:00:00   31.28           0    1
    15  1932-01-04 00:00:00   30.24           0    2
    16  1932-01-05 00:00:00   30.20           0    3