Lets say I have a dataframe. The first column is positive numbers, call it PRICE.
The values in the second column, call it MORELESS, can only be 1, or -1 and are calculated based on whether the value of PRICE in the current row is more than (1) or less than (-1) the value of price in the previous row.
This part I can code OK.
Now, the third column, called CHANGE is more difficult.
CHANGE doesnt necessarily have a value in every row, and is calculated only when MORELESS changes (from 1 to -1 or -1 to 1). It is the percentage change in VALUE between the current row and the last row where a CHANGE happened (ie wherever MORELESS changes 1/-1, -1/1). Heres an example I did by hand in excel:
INDEX | PRICE | MORELESS | CHANGE |
---|---|---|---|
1 | 148,25 | ||
2 | 143,53 | -1 | -3,18 |
3 | 139,94 | -1 | |
4 | 139,00 | -1 | |
5 | 140,31 | 1 | -2,24 |
6 | 146,25 | 1 | |
7 | 145,81 | -1 | 3,92 |
8 | 144,59 | -1 | |
9 | 144,47 | -1 | |
10 | 146,53 | 1 | 0,49 |
11 | 145,34 | -1 | -0,81 |
12 | 145,31 | -1 | |
13 | 146,97 | 1 | 1,12 |
14 | 145,50 | -1 | -1,00 |
15 | 145,66 | 1 | 0,11 |
16 | 140,52 | -1 | -3,53 |
17 | 141,00 | 1 | 0,34 |
18 | 141,84 | 1 | |
19 | 139,44 | -1 | -1,11 |
20 | 135,81 | -1 | |
21 | 139,75 | 1 | 0,22 |
22 | 141,28 | 1 |
Examples: CHANGE in row 11 uses PRICE in row 10: 100*(145.34-146.53)/146.53 CHANGE in row 21 uses PRICE in row 19: 100*(139.75-139.44)/139.44
What would be the best way to code the CHANGE column?
Thanks
Made a 'cs'
column by which you can group.
To get it, I compared the values of the 'MORELESS'
column with its shifted value. Transformed
to type int, giving 1, where True. Then calculated the cumulative sum
.
In split
I get the index
of the next after row where 'cs' == 1
to use the slice when grouping. In 'aaa'
I get the index of the ferst row with the previous 'cs'.
import pandas as pd
import numpy as np
df['cs'] = (df['MORELESS'] != df['MORELESS'].shift()).astype(int).cumsum()
split = df[df['cs'] == 1].index[-1] + 1
df['test'] = np.nan
def my_func(x):
aaa = df[df['cs'] == x['cs'].values[0] - 1].index[0]
df.loc[x.index[0], 'test'] = 100 * ((df.loc[x.index[0], 'PRICE'] - df.loc[aaa, 'PRICE']) / df.loc[aaa, 'PRICE'])
df[split:].groupby('cs').apply(my_func)
print(df)
Output
INDEX PRICE MORELESS CHANGE cs test
0 1 148.25 NaN NaN 1 NaN
1 2 143.53 -1.0 -3.18 2 -3.183811
2 3 139.94 -1.0 NaN 2 NaN
3 4 139.00 -1.0 NaN 2 NaN
4 5 140.31 1.0 -2.24 3 -2.243433
5 6 146.25 1.0 NaN 3 NaN
6 7 145.81 -1.0 3.92 4 3.919892
7 8 144.59 -1.0 NaN 4 NaN
8 9 144.47 -1.0 NaN 4 NaN
9 10 146.53 1.0 0.49 5 0.493793
10 11 145.34 -1.0 -0.81 6 -0.812120
11 12 145.31 -1.0 NaN 6 NaN
12 13 146.97 1.0 1.12 7 1.121508
13 14 145.50 -1.0 -1.00 8 -1.000204
14 15 145.66 1.0 0.11 9 0.109966
15 16 140.52 -1.0 -3.53 10 -3.528766
16 17 141.00 1.0 0.34 11 0.341588
17 18 141.84 1.0 NaN 11 NaN
18 19 139.44 -1.0 -1.11 12 -1.106383
19 20 135.81 -1.0 NaN 12 NaN
20 21 139.75 1.0 0.22 13 0.222318
21 22 141.28 1.0 NaN 13 NaN