I have a dataframe like this:
time_node rate_node
0 30 1.67
1 60 1.82
2 90 2.13
3 180 2.53
4 270 2.68
5 360 2.71
I want to use window to achieve such calculations:Starting from the second line, which is the line with index equal to 1, (time_node[index] * rate_node[index] - time_node[index - 1] * rate_node[index - 1])/(time[index] - time[index- 1]).
For instance,the second line should be calculated like this: (60 * 1.82 - 30 * 1.67)/(60 - 30); the third row: (90 * 2.13 - 60 * 1.82)/(90 - 60),and add a column of calculation results.
My code is as follows:
nodes = [30, 60, 90, 180, 270, 360]
rate = [1.67000, 1.82000, 2.13000, 2.53000, 2.68000, 2.71000]
df = pd.DataFrame({'time_node': nodes, 'rate_node': rate})
def forward_rate(rate, time):
denominator = rate[1] * time[1] - rate[0] * time[0]
numerator = time[1] - time[0]
return denominator/numertor
df.rolling(2).apply(forward_rate, {'rate': 'rate_node', 'time':
'time_node'})
I want to use rolling.apply to implement calculations,but it's raise an ERROR,I've been reading the documentation for a long time, but I still don't know where the code is wrong. Please help me.
I guess you're looking for something like this:
df['...'] = (df['rate_node']*df['time_node']).diff() / df['time_node'].diff()
with the following output on test data:
0 NaN
1 1.97
2 2.75
3 2.93
4 2.98
5 2.80
dtype: float64
Note, that rolling windows will loop over pairs along each column separately when aggregating data with the apply
method. The second positional variable {'rate': 'rate_node', 'time': 'time_node'}
will be interpreted as the raw
parameter, which should be boolean. That's why you're probably seeing ValueError: raw parameter must be True or False
.
See the description of Rolling.apply
One of the solutions in the general case can be producing data with an outer for-loop. For example:
name = 'new column'
df[name] = float('nan')
for index, window in zip(df.index, df.rolling(2)):
if len(window) == 2:
rate = window['rate_node'].values
time = window['time_node'].values
df.loc[index, name] = forward_rate(rate, time)
where forward_rate
is the original function. Note, that we need to pass a numpy ndarray as an arguments of forwar_rate
in order to make it work, because otherwise the original data indexes of passed data will be confused with their local positional indexing.
As alternative, we can try sliding windows from numpy:
from numpy.lib.stride_tricks import sliding_window_view
data = sliding_window_view(df[['rate_node', 'time_node']], window_shape=2, axis=0)
rate = data[:,0,:].T
time = data[:,1,:].T
df.loc[df.index[1:], 'new_column'] = forward_rate(rate, time)
where forward_rate
is the original function.
If Numba is installed, then we can use rolling
with method='table'
and apply
with engine='numba'
in order to perform the windowing operation over all columns at once. In this case, the aggregating function should work with 2D arrays and return the answer which can be broadcasted along the columns. At the end we get a new DataFrame with the same structure as the original one. So in case if the function returns a number, we have to restrict the answer to any column of the result. For example:
df['something_new'] = (
df[['rate_node','time_node']] # get columns in the right order
.rolling(2, method='table') # pass method='table' to work with all columns at once
.apply(lambda x: (x[1,1]*x[1,0]-x[0,1]*x[0,0])/(x[1,1]-x[0,1]),
raw=True, engine='numba') # x is indexed as a numpy.ndarray when engine='numba'
).iloc[:, 0] # use only one of the returned columns