Search code examples
pythonpandasstringdataframelabel

How to create a column of incrementing string labels based on values in a column?


I have a dataframe with three timeseries data columns. I want to add labels to the dataframe based on the binary values in one of the columns that increments. Below is a demonstration of the output that I desire ('Rate labels' built based on 'Rate pct change'). I made this using excel but want to do it using python.

            Rate pct change      Sharpe Ratio PCA   long_only_ew    Rate_labels
28/06/2019        -1               0.000000             0.024448    neg rates 1
01/07/2019         1               0.000000             0.030487    pos rates 1
02/07/2019        -1               0.000000             0.036835    neg rates 2
03/07/2019        -1               0.000000             0.054662    neg rates 2
05/07/2019        -1               0.000000             0.055340    neg rates 2
08/07/2019         1               0.000000             0.050585    pos rates 2
09/07/2019         1               0.000000             0.059735    pos rates 2
10/07/2019         1               0.000000             0.064335    pos rates 2
11/07/2019         1               0.000000             0.066124    pos rates 2
12/07/2019         1              -0.002202             0.072657    pos rates 2
15/07/2019        -1               0.003897             0.074136    neg rates 3
16/07/2019        -1               0.003278             0.071436    neg rates 3
17/07/2019         1               0.012141             0.072065    pos rates 3
18/07/2019         1               0.007214             0.074099    pos rates 3
19/07/2019         1               0.006617             0.073397    pos rates 3
22/07/2019         1               0.009760             0.078266    pos rates 3
23/07/2019         1               0.003645             0.075539    pos rates 3
24/07/2019         1               0.016116             0.085452    pos rates 3
25/07/2019         1               0.007491             0.075281    pos rates 3
26/07/2019         1               0.016323             0.090989    pos rates 3
29/07/2019         1               0.011050             0.077088    pos rates 3
30/07/2019         1               0.011531             0.073027    pos rates 3

Rate pct change (interest rates) was derived based on whether (rates were > 0 == 1) and (rates were <0 == -1)

How would I go about creating 'Rate_labels' using python?


Solution

  • Try using a combination of shift and cumsum:

    df['Rate_labels'] = df['Rate pct change'].lt(0).map({True:'neg', False:'pos'}) + ' rates ' + (df['Rate pct change'].shift(1).ne(df['Rate pct change']) & df['Rate pct change'].eq(-1)).cumsum().astype(str)
    

    Output:

    >>> df
                Rate pct change  Sharpe Ratio PCA  long_only_ew  Rate_labels
    28/06/2019               -1          0.000000      0.024448  neg rates 1
    01/07/2019                1          0.000000      0.030487  pos rates 1
    02/07/2019               -1          0.000000      0.036835  neg rates 2
    03/07/2019               -1          0.000000      0.054662  neg rates 2
    05/07/2019               -1          0.000000      0.055340  neg rates 2
    08/07/2019                1          0.000000      0.050585  pos rates 2
    09/07/2019                1          0.000000      0.059735  pos rates 2
    10/07/2019                1          0.000000      0.064335  pos rates 2
    11/07/2019                1          0.000000      0.066124  pos rates 2
    12/07/2019                1         -0.002202      0.072657  pos rates 2
    15/07/2019               -1          0.003897      0.074136  neg rates 3
    16/07/2019               -1          0.003278      0.071436  neg rates 3
    17/07/2019                1          0.012141      0.072065  pos rates 3
    18/07/2019                1          0.007214      0.074099  pos rates 3
    19/07/2019                1          0.006617      0.073397  pos rates 3
    22/07/2019                1          0.009760      0.078266  pos rates 3
    23/07/2019                1          0.003645      0.075539  pos rates 3
    24/07/2019                1          0.016116      0.085452  pos rates 3
    25/07/2019                1          0.007491      0.075281  pos rates 3
    26/07/2019                1          0.016323      0.090989  pos rates 3
    29/07/2019                1          0.011050      0.077088  pos rates 3
    30/07/2019                1          0.011531      0.073027  pos rates 3