How to calculate rolling / moving average using python + NumPy / SciPy? discusses the situation when the observations are equally spaced, i.e., the index is equivalent to an integer range.
In my case, the observations come at arbitrary times and the interval between them can be an arbitrary float. E.g.,
import pandas as pd
import numpy as np
df = pd.DataFrame({"y":np.random.uniform(size=100)}, index=np.random.uniform(size=100)).sort_index()
I want to add a column yavg
to df
whose value at a give index value x0
is
sum(df.y[x]*f(x0-x) for x in df.index) / sum(f(x0-x) for x in df.index)
for a given function f
, e.g.,
def f(x):
return np.exp(-x*x)
How do I do this with a minimal effort (preferably in pure numpy
)?
I think you can do something like this:
index_np_arr = df.index.values
weighted_sum = np.sum(df['y'].values[:, np.newaxis] * f(index_np_arr - index_np_arr[:, np.newaxis]), axis=0)
entire_sum = np.sum(f(index_np_arr[:, np.newaxis] - index_np_arr), axis=0)
df['yavg'] = pd.Series(weighted_sum/entire_sum, index=df.index)
Basically:
index_np_arr
is the np.array
of all possible x0
values;entire_sum
would, get the denominator for all values in the index by repeating the vector n times, where n is the number of indexes and then subtracting for each x0
. Finally it would sum it all up;weighted_sum
would do almost the same thing except before we sum we would multiply by the y vector.Complete code:
import pandas as pd
import numpy as np
def f(x):
return np.exp(-x*x)
df = pd.DataFrame({"y":np.random.uniform(size=100)}, index=np.random.uniform(size=100)).sort_index()
index_np_arr = df.index.values
weighted_sum = np.sum(df['y'].values[:, np.newaxis] * f(index_np_arr - index_np_arr[:, np.newaxis]), axis=0)
entire_sum = np.sum(f(index_np_arr[:, np.newaxis] - index_np_arr), axis=0)
df['yavg'] = pd.Series(weighted_sum/entire_sum, index=df.index)
Note: This code does have a high memory usage because you will create an array of shape (n, n)
for computing the sums using vectorized functions, but is probably faster than iterating over all values of x
.