Search code examples
pythonpandasnumpyrolling-computation

Pandas/numpy: rolling mean reset when row value is a multiple of it, with minimum size


I have a dataframe with multiple columns. For each column, I want to return an indicator (1 or -1), changing if the column's rolling mean resets when the current row value is a defined multiple of the rolling mean value. An added complication is that I don't want the reset to occur until the rolling window has reached a minimum length, following the previous reset.

Here's an example with multiple 2, minimum length 4, and a starting indicator value of 1

Starting dataframe:

df = pd.DataFrame(
    {
        "A": [0.1, 0.1, 0.15, 0.1, 0.1, 0.7, 0.1, 0.1, 0.5, 1, 0.1, 0.1],
        "B": [0.1, 0.1, 0.4, 0.1, 0.8, 0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.1],
    },
    index=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
)
print(df)

       A    B
0   0.10  0.1
1   0.10  0.1
2   0.15  0.4
3   0.10  0.1
4   0.10  0.8
5   0.70  0.1
6   0.10  0.1
7   0.10  0.1
8   0.50  0.1
9   1.00  0.1
10  0.10  0.9
11  0.10  0.1

I would like to obtain:

    A  B
0   1  1
1   1  1
2   1  1
3   1  1
4   1 -1
5  -1 -1
6  -1 -1
7  -1 -1
8  -1 -1
9   1 -1
10  1  1
11  1  1

Looking at column A for example:

  • first reset happens on row 5, as 0.7 >= 2 * mean(rows 0 to 5), and we have observed 6 rows
  • nothing happens on row 8 as the new rolling mean only has 3 observations
  • next reset happens on row 9, as 1 >= 2 * mean(rows 6 to 9), and we have observed 4 rows

I have looked at various posts, the closest being the below, but none seems to really address that problem fully:


Solution

  • You can try:

    def rolling_mean(col, start=1, multiple=2, min_count=4):
        curr = start
        num_obs = 0
        acc = 0
    
        for v in col:
            acc += v
            num_obs += 1
    
            if num_obs < min_count:
                yield curr
                continue
    
            m = acc / num_obs
            if v >= multiple * m:
                curr *= -1
                num_obs = 0
                acc = 0
    
            yield curr
    
    
    df["new_A"] = list(rolling_mean(df["A"]))
    df["new_B"] = list(rolling_mean(df["B"]))
    
    print(df)
    

    Prints:

           A    B  new_A  new_B
    0   0.10  0.1      1      1
    1   0.10  0.1      1      1
    2   0.15  0.4      1      1
    3   0.10  0.1      1      1
    4   0.10  0.8      1     -1
    5   0.70  0.1     -1     -1
    6   0.10  0.1     -1     -1
    7   0.10  0.1     -1     -1
    8   0.50  0.1     -1     -1
    9   1.00  0.1      1     -1
    10  0.10  0.9      1      1
    11  0.10  0.1      1      1
    

    You can speed-up the computation using :

    from numba import njit
    
    
    @njit
    def rolling_mean_numba(col, start=1, multiple=2, min_count=4):
        curr = start
        num_obs = 0
        acc = 0
    
        out = np.empty_like(col, dtype=np.int8)
    
        for i, v in enumerate(col):
            acc += v
            num_obs += 1
    
            if num_obs < min_count:
                out[i] = curr
                continue
    
            m = acc / num_obs
            if v >= multiple * m:
                curr *= -1
                num_obs = 0
                acc = 0
    
            out[i] = curr
    
        return out
    
    df["new_A"] = rolling_mean_numba(df["A"].values)
    df["new_B"] = rolling_mean_numba(df["B"].values)
    
    print(df)