I have a dataframe with multiple columns. For each column, I want to return an indicator (1 or -1), changing if the column's rolling mean resets when the current row value is a defined multiple of the rolling mean value. An added complication is that I don't want the reset to occur until the rolling window has reached a minimum length, following the previous reset.
Here's an example with multiple 2, minimum length 4, and a starting indicator value of 1
Starting dataframe:
df = pd.DataFrame(
{
"A": [0.1, 0.1, 0.15, 0.1, 0.1, 0.7, 0.1, 0.1, 0.5, 1, 0.1, 0.1],
"B": [0.1, 0.1, 0.4, 0.1, 0.8, 0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.1],
},
index=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
)
print(df)
A B
0 0.10 0.1
1 0.10 0.1
2 0.15 0.4
3 0.10 0.1
4 0.10 0.8
5 0.70 0.1
6 0.10 0.1
7 0.10 0.1
8 0.50 0.1
9 1.00 0.1
10 0.10 0.9
11 0.10 0.1
I would like to obtain:
A B
0 1 1
1 1 1
2 1 1
3 1 1
4 1 -1
5 -1 -1
6 -1 -1
7 -1 -1
8 -1 -1
9 1 -1
10 1 1
11 1 1
Looking at column A for example:
I have looked at various posts, the closest being the below, but none seems to really address that problem fully:
You can try:
def rolling_mean(col, start=1, multiple=2, min_count=4):
curr = start
num_obs = 0
acc = 0
for v in col:
acc += v
num_obs += 1
if num_obs < min_count:
yield curr
continue
m = acc / num_obs
if v >= multiple * m:
curr *= -1
num_obs = 0
acc = 0
yield curr
df["new_A"] = list(rolling_mean(df["A"]))
df["new_B"] = list(rolling_mean(df["B"]))
print(df)
Prints:
A B new_A new_B
0 0.10 0.1 1 1
1 0.10 0.1 1 1
2 0.15 0.4 1 1
3 0.10 0.1 1 1
4 0.10 0.8 1 -1
5 0.70 0.1 -1 -1
6 0.10 0.1 -1 -1
7 0.10 0.1 -1 -1
8 0.50 0.1 -1 -1
9 1.00 0.1 1 -1
10 0.10 0.9 1 1
11 0.10 0.1 1 1
You can speed-up the computation using numba:
from numba import njit
@njit
def rolling_mean_numba(col, start=1, multiple=2, min_count=4):
curr = start
num_obs = 0
acc = 0
out = np.empty_like(col, dtype=np.int8)
for i, v in enumerate(col):
acc += v
num_obs += 1
if num_obs < min_count:
out[i] = curr
continue
m = acc / num_obs
if v >= multiple * m:
curr *= -1
num_obs = 0
acc = 0
out[i] = curr
return out
df["new_A"] = rolling_mean_numba(df["A"].values)
df["new_B"] = rolling_mean_numba(df["B"].values)
print(df)