
Vectorized calculation of new timeseries in pandas dataframe


I have a pandas dataframe and I am trying to estimate a new timeseries V(t) based on the values of an existing timeseries B(t). I have written a minimal reproducible example to generate a sample dataframe as follows:

import pandas as pd
import numpy as np

lenb = 5000
lenv = 200
l    = 5

B = pd.DataFrame({'a': np.arange(0, lenb, 1), 'b': np.arange(0, lenb, 1)},
                 index=pd.date_range('2022-01-01', periods=lenb, freq='2s'))

I want to calculate V(t) for all times 't' in the timeseries B as:

V(t) = (B(t-2*l) + 4*B(t-l) + 6*B(t) + 4*B(t+l) + B(t+2*l)) / 16

How can I perform this calculation in a vectorized manner in pandas? Let's say that l = 5.

Would this be the correct way to do it?

def V_t(B, l):
    V = (B.shift(-2*l) + 4*B.shift(-l) + 6*B + 4*B.shift(l) + B.shift(2*l)) / 16
    return V

Solution

  • I would have done it as you suggested in your latest edit. Here is an alternative that avoids typing out every shift call when the list of factors/multipliers is arbitrarily long:

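    import numpy as np
    import pandas as pd
    
    def V_t(B, l):
        # Weights from the formula: 1, 4, 6, 4, 1 (they sum to 16)
        X = [1, 4, 6, 4, 1]
        # Row offsets passed to shift(), one per weight
        Y = [-2*l, -l, 0, l, 2*l]
        # Stack the weighted, shifted frames and sum them element-wise
        return pd.DataFrame(np.add.reduce([x*B.shift(y) for x, y in zip(X, Y)])/16,
                            index=B.index, columns=B.columns)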
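As a sanity check, here is a minimal sketch that compares the explicit shift formula with a weighted-sum version on the sample data from the question. The names `V_explicit` and `V_weighted` are hypothetical, chosen only for this illustration, and the sketch uses Python's built-in `sum` instead of `np.add.reduce`, so it stays within pandas.

```python
import numpy as np
import pandas as pd

lenb = 5000
l = 5

# Sample data, built the same way as in the question.
B = pd.DataFrame({'a': np.arange(0, lenb, 1), 'b': np.arange(0, lenb, 1)},
                 index=pd.date_range('2022-01-01', periods=lenb, freq='2s'))

def V_explicit(B, l):
    # B.shift(k) puts the value from k rows earlier on each row, i.e. B(t - k rows).
    return (B.shift(2*l) + 4*B.shift(l) + 6*B + 4*B.shift(-l) + B.shift(-2*l)) / 16

def V_weighted(B, l, weights=(1, 4, 6, 4, 1)):
    # Hypothetical helper: one row offset per weight, centred on the current row.
    offsets = [-2*l, -l, 0, l, 2*l]
    total = sum(w * B.shift(k) for w, k in zip(weights, offsets))
    return total / sum(weights)

V1 = V_explicit(B, l)
V2 = V_weighted(B, l)

# Both versions should agree, including the NaN rows at the edges.
pd.testing.assert_frame_equal(V1, V2)
```

Two things are worth keeping in mind. The first and last `2*l` rows come out as NaN, because the shifted frames have no data there. Also, `shift` with an integer counts rows, not time, so with the 2-second index above `l = 5` corresponds to an offset of 10 seconds; that matches the formula only if `l` is meant as a number of samples.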