Search code examples
pythonpandasnumpyoptimizationsum

create a column that is the sum of previous X rows where x is a parm given by a different column row


Im trying to create a column where i sum the previous x rows of a column by a parm given in a different column row.

I have a solution but its really slow so i was wondering if anyone could help do this alot faster.

| time                     |    price   |parm |
|--------------------------|------------|-----|
|2020-11-04 00:00:00+00:00 |    1.17600 |   1 |
|2020-11-04 00:01:00+00:00 |    1.17503 |   2 |
|2020-11-04 00:02:00+00:00 |    1.17341 |   3 |
|2020-11-04 00:03:00+00:00 |    1.17352 |   2 |
|2020-11-04 00:04:00+00:00 |    1.17422 |   3 |

and the slow slow code


    @jit
    def rolling_sum(x,w):
        return np.convolve(x,np.ones(w,dtype=int),'valid')
    
    @jit
    def rol(x,y):
        for i in range(len(x)):
            res[i] = rolling_sum(x, y[i])[0]
        return res
    
    dfa = df[:500000]
    res = np.empty(len(dfa))
    
    r = rol(dfa.l_x.values, abs(dfa.mb).values+1)
    r


Solution

  • Maybe something like this could work. I have made up an example with to_be_summed being the column of the value that should be summed up and looback holding the number of rows to be looked back

    df = pd.DataFrame({"to_be_summed": range(10), "lookback":[0,1,2,3,2,1,4,2,1,2]})
    summed =  df.to_be_summed.cumsum()
    result = [summed[i] - summed[max(0,i - lookback - 1)] for i, lookback in enumerate(df.lookback)]
    

    What I did here is to first do a cumsum over the column that should be summed up. Now, for the i-th entry I can take the entry of this cumsum, and subtract the one i + 1 steps back. Note that this include the i-th value in the sum. If you don't want to inlcude it, you just have to change from summed[i] to summed[i - 1]. Also note that this part max(0,i - lookback - 1) will prevent you from accidentally looking back too many rows.