python pandas dataframe percentage calculated-columns

How to use changes in values in one dataframe column to calculate another

Let's say I have a dataframe. The first column holds positive numbers; call it PRICE.

The values in the second column, call it MORELESS, can only be 1 or -1, and are calculated based on whether the value of PRICE in the current row is more than (1) or less than (-1) the value of PRICE in the previous row.

This part I can code OK.
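For context, here is a minimal sketch of that first step. The asker's own code is not shown, so the use of `diff()` and `np.sign` is an assumption, and the sample holds only the first six PRICE values from the table below. Equal consecutive prices would give 0, a case the question says does not occur.

```python
import numpy as np
import pandas as pd

# First six PRICE values from the question's table
df = pd.DataFrame({"PRICE": [148.25, 143.53, 139.94, 139.00, 140.31, 146.25]})

# diff() is PRICE minus the previous row's PRICE; np.sign maps it to 1 or -1.
# The first row has no previous row, so its MORELESS stays NaN.
df["MORELESS"] = np.sign(df["PRICE"].diff())
print(df)
```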

Now, the third column, called CHANGE, is more difficult.

CHANGE doesn't necessarily have a value in every row, and is calculated only when MORELESS changes (from 1 to -1 or from -1 to 1). It is the percentage change in PRICE between the current row and the last row where a CHANGE happened (i.e. the previous row where MORELESS switched between 1 and -1). Here's an example I did by hand in Excel:

INDEX	PRICE	MORELESS	CHANGE
1	148,25
2	143,53	-1	-3,18
3	139,94	-1
4	139,00	-1
5	140,31	1	-2,24
6	146,25	1
7	145,81	-1	3,92
8	144,59	-1
9	144,47	-1
10	146,53	1	0,49
11	145,34	-1	-0,81
12	145,31	-1
13	146,97	1	1,12
14	145,50	-1	-1,00
15	145,66	1	0,11
16	140,52	-1	-3,53
17	141,00	1	0,34
18	141,84	1
19	139,44	-1	-1,11
20	135,81	-1
21	139,75	1	0,22
22	141,28	1

Examples:
CHANGE in row 11 uses PRICE in row 10: 100*(145.34-146.53)/146.53
CHANGE in row 21 uses PRICE in row 19: 100*(139.75-139.44)/139.44
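The arithmetic in these two examples can be checked with a few lines of Python. The helper name `pct_change_between` is hypothetical and exists only for this illustration.

```python
def pct_change_between(current, reference):
    """Percentage change of `current` relative to `reference`."""
    return 100 * (current - reference) / reference


# Row 11 compared with row 10, and row 21 compared with row 19
row_11 = pct_change_between(145.34, 146.53)
row_21 = pct_change_between(139.75, 139.44)

print(round(row_11, 2), round(row_21, 2))  # -0.81 0.22
```

Both results agree with the CHANGE values entered by hand in the table.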

What would be the best way to code the CHANGE column?

Thanks

Solution

I made a 'cs' column to group by. To get it, I compare the 'MORELESS' column with its shifted copy and convert the result to int, which gives 1 where the comparison is True (that is, where MORELESS changes). Then I take the cumulative sum.

In 'split' I get the index of the row right after the last row where 'cs' == 1, and use it to slice the dataframe before grouping, because the first group has no previous group to compare with. In 'aaa' I get the index of the first row of the previous 'cs' group.
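To see what the 'cs' key looks like, here is a small sketch on the first seven MORELESS values from the question. The standalone names `moreless`, `changed` and `cs` are used only for this illustration.

```python
import numpy as np
import pandas as pd

# First seven MORELESS values from the question (row 1 has no value)
moreless = pd.Series([np.nan, -1, -1, -1, 1, 1, -1])

# True wherever a value differs from the one above it.
# NaN never equals NaN, so the first row also counts as a change.
changed = moreless != moreless.shift()

# The running total of the True flags gives each run of equal values its own id
cs = changed.astype(int).cumsum()
print(cs.tolist())  # [1, 2, 2, 2, 3, 3, 4]
```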

import pandas as pd
import numpy as np


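# Assumes df already holds the question's table (INDEX, PRICE, MORELESS, CHANGE)
# with a default 0..n-1 integer index.

# Group id: increases by 1 every time MORELESS differs from the previous row.
df['cs'] = (df['MORELESS'] != df['MORELESS'].shift()).astype(int).cumsum()
# Position of the first row after group 1, which has no earlier group to compare with.
split = df[df['cs'] == 1].index[-1] + 1
df['test'] = np.nan


def my_func(x):
    # aaa: index of the first row of the previous group (the last row where MORELESS changed).
    aaa = df[df['cs'] == x['cs'].values[0] - 1].index[0]
    # Write the percentage change into the first row of the current group only.
    df.loc[x.index[0], 'test'] = 100 * ((df.loc[x.index[0], 'PRICE'] - df.loc[aaa, 'PRICE']) / df.loc[aaa, 'PRICE'])


# my_func fills df['test'] as a side effect; the value returned by apply is not used.
df[split:].groupby('cs').apply(my_func)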

print(df)

Output

    INDEX   PRICE  MORELESS  CHANGE  cs      test
0       1  148.25       NaN     NaN   1       NaN
1       2  143.53      -1.0   -3.18   2 -3.183811
2       3  139.94      -1.0     NaN   2       NaN
3       4  139.00      -1.0     NaN   2       NaN
4       5  140.31       1.0   -2.24   3 -2.243433
5       6  146.25       1.0     NaN   3       NaN
6       7  145.81      -1.0    3.92   4  3.919892
7       8  144.59      -1.0     NaN   4       NaN
8       9  144.47      -1.0     NaN   4       NaN
9      10  146.53       1.0    0.49   5  0.493793
10     11  145.34      -1.0   -0.81   6 -0.812120
11     12  145.31      -1.0     NaN   6       NaN
12     13  146.97       1.0    1.12   7  1.121508
13     14  145.50      -1.0   -1.00   8 -1.000204
14     15  145.66       1.0    0.11   9  0.109966
15     16  140.52      -1.0   -3.53  10 -3.528766
16     17  141.00       1.0    0.34  11  0.341588
17     18  141.84       1.0     NaN  11       NaN
18     19  139.44      -1.0   -1.11  12 -1.106383
19     20  135.81      -1.0     NaN  12       NaN
20     21  139.75       1.0    0.22  13  0.222318
21     22  141.28       1.0     NaN  13       NaN
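As a possible alternative to the groupby approach above, the same numbers can be obtained without a helper function. This is only a sketch, not the method used in the answer. It rebuilds PRICE from the question's table, and it assumes MORELESS can be derived with `np.sign` of the row-to-row difference. The names `is_change`, `change_prices` and `reference` are illustrative.

```python
import numpy as np
import pandas as pd

# PRICE values from the question's table (rows 1 to 22)
prices = [148.25, 143.53, 139.94, 139.00, 140.31, 146.25, 145.81, 144.59,
          144.47, 146.53, 145.34, 145.31, 146.97, 145.50, 145.66, 140.52,
          141.00, 141.84, 139.44, 135.81, 139.75, 141.28]
df = pd.DataFrame({"PRICE": prices})
df["MORELESS"] = np.sign(df["PRICE"].diff())  # assumed way of deriving MORELESS

# True on the first row and wherever MORELESS differs from the previous row
is_change = df["MORELESS"] != df["MORELESS"].shift()

# Keep only the rows where a change happened, so shift() yields the PRICE
# at the previous change row, which is the reference the question asks for
change_prices = df.loc[is_change, "PRICE"]
reference = change_prices.shift()

# Assignment aligns on the index, so rows without a change stay NaN
df["CHANGE"] = 100 * (change_prices - reference) / reference
print(df.round(2))
```

Because the calculation runs only on the filtered rows, there is no need for a 'cs' group id or a 'split' offset. The first row ends up as NaN because it has no earlier row to compare with.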