Search code examples
pythonpandasdataframepercentagecalculated-columns

How to use changes in values in one dataframe column to calculate another


Lets say I have a dataframe. The first column is positive numbers, call it PRICE.

The values in the second column, call it MORELESS, can only be 1, or -1 and are calculated based on whether the value of PRICE in the current row is more than (1) or less than (-1) the value of price in the previous row.

This part I can code OK.

Now, the third column, called CHANGE is more difficult.

CHANGE doesnt necessarily have a value in every row, and is calculated only when MORELESS changes (from 1 to -1 or -1 to 1). It is the percentage change in VALUE between the current row and the last row where a CHANGE happened (ie wherever MORELESS changes 1/-1, -1/1). Heres an example I did by hand in excel:

INDEX PRICE MORELESS CHANGE
1 148,25
2 143,53 -1 -3,18
3 139,94 -1
4 139,00 -1
5 140,31 1 -2,24
6 146,25 1
7 145,81 -1 3,92
8 144,59 -1
9 144,47 -1
10 146,53 1 0,49
11 145,34 -1 -0,81
12 145,31 -1
13 146,97 1 1,12
14 145,50 -1 -1,00
15 145,66 1 0,11
16 140,52 -1 -3,53
17 141,00 1 0,34
18 141,84 1
19 139,44 -1 -1,11
20 135,81 -1
21 139,75 1 0,22
22 141,28 1

Examples: CHANGE in row 11 uses PRICE in row 10: 100*(145.34-146.53)/146.53 CHANGE in row 21 uses PRICE in row 19: 100*(139.75-139.44)/139.44

What would be the best way to code the CHANGE column?

Thanks


Solution

  • Made a 'cs' column by which you can group. To get it, I compared the values of the 'MORELESS' column with its shifted value. Transformed to type int, giving 1, where True. Then calculated the cumulative sum.

    In split I get the index of the next after row where 'cs' == 1 to use the slice when grouping. In 'aaa' I get the index of the ferst row with the previous 'cs'.

    import pandas as pd
    import numpy as np
    
    
    df['cs'] = (df['MORELESS'] != df['MORELESS'].shift()).astype(int).cumsum()
    split = df[df['cs'] == 1].index[-1] + 1
    df['test'] = np.nan
    
    
    def my_func(x):
        aaa = df[df['cs'] == x['cs'].values[0] - 1].index[0]
        df.loc[x.index[0], 'test'] = 100 * ((df.loc[x.index[0], 'PRICE'] - df.loc[aaa, 'PRICE']) / df.loc[aaa, 'PRICE'])
    
    
    df[split:].groupby('cs').apply(my_func)
    
    print(df)
    

    Output

        INDEX   PRICE  MORELESS  CHANGE  cs      test
    0       1  148.25       NaN     NaN   1       NaN
    1       2  143.53      -1.0   -3.18   2 -3.183811
    2       3  139.94      -1.0     NaN   2       NaN
    3       4  139.00      -1.0     NaN   2       NaN
    4       5  140.31       1.0   -2.24   3 -2.243433
    5       6  146.25       1.0     NaN   3       NaN
    6       7  145.81      -1.0    3.92   4  3.919892
    7       8  144.59      -1.0     NaN   4       NaN
    8       9  144.47      -1.0     NaN   4       NaN
    9      10  146.53       1.0    0.49   5  0.493793
    10     11  145.34      -1.0   -0.81   6 -0.812120
    11     12  145.31      -1.0     NaN   6       NaN
    12     13  146.97       1.0    1.12   7  1.121508
    13     14  145.50      -1.0   -1.00   8 -1.000204
    14     15  145.66       1.0    0.11   9  0.109966
    15     16  140.52      -1.0   -3.53  10 -3.528766
    16     17  141.00       1.0    0.34  11  0.341588
    17     18  141.84       1.0     NaN  11       NaN
    18     19  139.44      -1.0   -1.11  12 -1.106383
    19     20  135.81      -1.0     NaN  12       NaN
    20     21  139.75       1.0    0.22  13  0.222318
    21     22  141.28       1.0     NaN  13       NaN