The cells highlighted in Green are the calculations that need to be done in pandas dataframe without the application for for loop.
i tried the below code. But it is giving wrong values in column "Val".
The Obtained output from the code is
DataFrame: df Par Val 0 50 50.0 1 60 57.0 2 70 67.0 3 80 77.0 4 90 87.0 5 100 97.0
Need help in getting the values as shown in image without loop application. [![enter image description here][2]][2]
[![enter image description here][2]][2]
df = pd.DataFrame()
df['Par'] = [50,60,70,80,90,100]
k = df['Par'].shift(1).fillna(df['Par'][0]).astype(float)
df['Val'] = (df['Par']*0.7) + (k*0.3)
print("DataFrame: df\n",df)`
Recursive calculations are not vectorisable, for improve performance is used numba:
from numba import jit
@jit(nopython=True)
def f(a):
d = np.empty(a.shape)
d[0] = a[0]
for i in range(1, a.shape[0]):
d[i] = d[i-1] * 0.3 + a[i] * 0.7
return d
df['Val'] = f(df['Par'].to_numpy())
print (df)
Par Val
0 50 50.0000
1 60 57.0000
2 70 66.1000
3 80 75.8300
4 90 85.7490
5 100 95.7247
Difference for performance for 1k rows:
from numba import jit
import itertools
np.random.seed(2022)
df = pd.DataFrame({'Par': np.random.randint(100, size=1000)})
In [64]: %%timeit
...:
...: df['Val1'] = f(df['Par'].to_numpy())
...:
...: import itertools
...:
...: df.loc[0,"Val"] = df.loc[0,"Par"]
...: for _ in itertools.repeat(None, len(df)):
...: df["Val"] = df["Val"].fillna((df["Par"]*0.7)+(df["Val"].shift(1)*(0.3)))
...:
1.05 s ± 193 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [65]: %%timeit
...: @jit(nopython=True)
...: def f(a):
...: d = np.empty(a.shape)
...: d[0] = a[0]
...: for i in range(1, a.shape[0]):
...: d[i] = d[i-1] * 0.3 + a[i] * 0.7
...: return d
...:
...: df['Val1'] = f(df['Par'].to_numpy())
...:
121 ms ± 3.23 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Test for 100krows:
np.random.seed(2022)
df = pd.DataFrame({'Par': np.random.randint(100, size=100000)})
In [70]: %%timeit
...:
...: df['Val1'] = f(df['Par'].to_numpy())
...:
...: import itertools
...:
...: df.loc[0,"Val"] = df.loc[0,"Par"]
...: for _ in itertools.repeat(None, len(df)):
...: df["Val"] = df["Val"].fillna((df["Par"]*0.7)+(df["Val"].shift(1)*(0.3)))
...:
4min 47s ± 5.39 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [71]: %%timeit
...: @jit(nopython=True)
...: def f(a):
...: d = np.empty(a.shape)
...: d[0] = a[0]
...: for i in range(1, a.shape[0]):
...: d[i] = d[i-1] * 0.3 + a[i] * 0.7
...: return d
...:
...: df['Val1'] = f(df['Par'].to_numpy())
...:
...:
129 ms ± 11.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)