Search code examples
pythonpandasdataframepandas-groupbycumsum

Dataframe cell to be locked and used for a running balance calculation (follow up question)


(This is a follow up question to my previous question which was answered correctly).

Say I have the following dataframe

import pandas as pd
df = pd.DataFrame()

df['E'] = ('SIT','SCLOSE', 'SHODL', 'SHODL', 'SHODL', 'SHODL', 'SHODL', 'SHODL','SHODL','SCLOSE_BUY','BCLOSE_SELL', 'BHODL', 'BHODL', 'BHODL', 'BHODL', 'BHODL', 'BHODL','BUY','SIT','SIT')

df['F'] = (0.00,1.00,10.00, 5.00,6.00,-6.00, 6.00, 2.00,10.00,10.00,-8.00,33.00,-15.00,6.00,-1.00,5.00,10.00,0.00,0.00,0.00)

df.loc[19, 'G'] = 100.0000

With column G starting at 100 the same rules apply per my previous question whereby if a BUY or SELL occurs on col E the corresponding balance on column G is locked and continually used as the base amount to calculate a running balance with col F being the % increase/decrease for each row in the running balance until a BCLOSE or SCLOSE on col E is shown.

I have explained the rules in the previous question however new to this question is if a SCLOSE_BUY is shown the SELL is closed and a BUY is opened and vise versa for a BCLOSE_SELL. A BCLOSE, SCLOSE, SCLOSE_BUY or BCLOSE_SELL row all become the final row for the running balance calculation and will be used as the base when a BUY or SELL is shown next

FYI the a successful response to my previous question was made by Andy L. as follows however this response cannot handle the new scenario when a BCLOSE_SELL and a SCLOSE_BUY occur after one another

df1 = df[::-1]
s = df1.B.isin(['BCLOSE','SCLOSE']).shift(fill_value=False).cumsum()
grps = df1.groupby(s)
init_val= 100
l = []
for _, grp in grps:
    s = grp.C * 0.01 * init_val
    s.iloc[0] = init_val
    s = s.cumsum()
    init_val = s.iloc[-1]
    l.append(s)

The above answer does not address the problem I encounter in real life whereby instead of occuring a BCLOSE I instead receive a BCLOSE_SELL which basically turns the BUY into a SELL (ie. I close the BUY and open a SELL) which becomes the base amount for the ongoing rows.

If the rows continued as SHODL's I am able to adjust the code so that the running balance is correctly calculated however if I subsequently receive a SCLOSE_BUY (as seen in row 9 of my dataframe) I need to make this row close the SELL and reopen a BUY and this row will also be the new base amount for my running balance.

I understand this all sounds confusing as such the below column added to my above dataframe is what the result should look like.

df['G'] = (191.62,191.62,190.19,175.89,168.74,160.16,168.74,160.16,157.3,143,130,138,105,120,114,115,110,100,100,100)

Solution

  • I have a well-documented answer on a similar question posted here, however let me tweak it a little bit so that it can be applicable to the question you have just asked. Essentially, all you need to do is to add two new breakpoints at BCLOSE_SELL and SCLOSE_BUY in the following way:

    df.index[df[type_col].isin(['BCLOSE', 'SCLOSE', 'BCLOSE_SELL', 'SCLOSE_BUY'])][::-1]
    

    In the above line the type_col is the name of the column that specifies the action (e.g. SHOLD or BCLOSE), or in your case the column E.

    You can find the complete and updated piece of code that works with both of your questions below:

    # basic setup
    type_col = 'E'  # the name of the action type column
    change_col = 'F'  # the name of the delta change column
    res_col = 'G'  # the name of the resulting column
    value = 100  # you can specify any initial value here
    PERCENTAGE_CONST = 100
    
    endpoints = [df.first_valid_index(), df.last_valid_index()]
    # occurrences of 'BCLOSE', 'SCLOSE', 'BCLOSE_SELL' and 'SCLOSE_BUY' that break the sequence
    breakpoints = df.index[df[type_col].isin(['BCLOSE','SCLOSE', 'BCLOSE_SELL', 'SCLOSE_BUY'])][::-1]
    # removes the endpoints of the dataframe that do not break the structure
    breakpoints = breakpoints.drop(endpoints, errors='ignore')
    
    for i in range(len(breakpoints) + 1):
    
        prv = breakpoints[i - 1] - 1 if i else -1  # previous or first breakpoint
        try:
            nex = breakpoints[i] - 1  # next breakpoint
        except IndexError:
            nex = None  # last breakpoint
    
        # cumulative sum of values adjusted for the percentage change appended to the resulting column
        res = value + (df[change_col][prv: nex: -1] * value / PERCENTAGE_CONST).cumsum()[::-1]
        df.loc[res.index, res_col] = res
    
        # saving the value that will be the basis for percentage calculations
        # for the next breakpoint
        value = res.iloc[0]
    
    

    The produced output is line with your expected result:

    >>> df
                  E     F       G
    0           SIT   0.0  191.62
    1        SCLOSE   1.0  191.62
    2         SHODL  10.0  190.19
    3         SHODL   5.0  175.89
    4         SHODL   6.0  168.74
    5         SHODL  -6.0  160.16
    6         SHODL   6.0  168.74
    7         SHODL   2.0  160.16
    8         SHODL  10.0  157.30
    9    SCLOSE_BUY  10.0  143.00
    10  BCLOSE_SELL  -8.0  130.00
    11        BHODL  33.0  138.00
    12        BHODL -15.0  105.00
    13        BHODL   6.0  120.00
    14        BHODL  -1.0  114.00
    15        BHODL   5.0  115.00
    16        BHODL  10.0  110.00
    17          BUY   0.0  100.00
    18          SIT   0.0  100.00
    19          SIT   0.0  100.00