I have a dataframe with three timeseries data columns. I want to add labels to the dataframe based on the binary values in one of the columns that increments. Below is a demonstration of the output that I desire ('Rate labels' built based on 'Rate pct change'). I made this using excel but want to do it using python.
Rate pct change Sharpe Ratio PCA long_only_ew Rate_labels
28/06/2019 -1 0.000000 0.024448 neg rates 1
01/07/2019 1 0.000000 0.030487 pos rates 1
02/07/2019 -1 0.000000 0.036835 neg rates 2
03/07/2019 -1 0.000000 0.054662 neg rates 2
05/07/2019 -1 0.000000 0.055340 neg rates 2
08/07/2019 1 0.000000 0.050585 pos rates 2
09/07/2019 1 0.000000 0.059735 pos rates 2
10/07/2019 1 0.000000 0.064335 pos rates 2
11/07/2019 1 0.000000 0.066124 pos rates 2
12/07/2019 1 -0.002202 0.072657 pos rates 2
15/07/2019 -1 0.003897 0.074136 neg rates 3
16/07/2019 -1 0.003278 0.071436 neg rates 3
17/07/2019 1 0.012141 0.072065 pos rates 3
18/07/2019 1 0.007214 0.074099 pos rates 3
19/07/2019 1 0.006617 0.073397 pos rates 3
22/07/2019 1 0.009760 0.078266 pos rates 3
23/07/2019 1 0.003645 0.075539 pos rates 3
24/07/2019 1 0.016116 0.085452 pos rates 3
25/07/2019 1 0.007491 0.075281 pos rates 3
26/07/2019 1 0.016323 0.090989 pos rates 3
29/07/2019 1 0.011050 0.077088 pos rates 3
30/07/2019 1 0.011531 0.073027 pos rates 3
Rate pct change (interest rates) was derived based on whether (rates were > 0 == 1) and (rates were <0 == -1)
How would I go about creating 'Rate_labels' using python?
Try using a combination of shift
and cumsum
:
df['Rate_labels'] = df['Rate pct change'].lt(0).map({True:'neg', False:'pos'}) + ' rates ' + (df['Rate pct change'].shift(1).ne(df['Rate pct change']) & df['Rate pct change'].eq(-1)).cumsum().astype(str)
Output:
>>> df
Rate pct change Sharpe Ratio PCA long_only_ew Rate_labels
28/06/2019 -1 0.000000 0.024448 neg rates 1
01/07/2019 1 0.000000 0.030487 pos rates 1
02/07/2019 -1 0.000000 0.036835 neg rates 2
03/07/2019 -1 0.000000 0.054662 neg rates 2
05/07/2019 -1 0.000000 0.055340 neg rates 2
08/07/2019 1 0.000000 0.050585 pos rates 2
09/07/2019 1 0.000000 0.059735 pos rates 2
10/07/2019 1 0.000000 0.064335 pos rates 2
11/07/2019 1 0.000000 0.066124 pos rates 2
12/07/2019 1 -0.002202 0.072657 pos rates 2
15/07/2019 -1 0.003897 0.074136 neg rates 3
16/07/2019 -1 0.003278 0.071436 neg rates 3
17/07/2019 1 0.012141 0.072065 pos rates 3
18/07/2019 1 0.007214 0.074099 pos rates 3
19/07/2019 1 0.006617 0.073397 pos rates 3
22/07/2019 1 0.009760 0.078266 pos rates 3
23/07/2019 1 0.003645 0.075539 pos rates 3
24/07/2019 1 0.016116 0.085452 pos rates 3
25/07/2019 1 0.007491 0.075281 pos rates 3
26/07/2019 1 0.016323 0.090989 pos rates 3
29/07/2019 1 0.011050 0.077088 pos rates 3
30/07/2019 1 0.011531 0.073027 pos rates 3