Search code examples
pythonpandascategoriesmeanrolling-computation

How can I create a rolling mean on a category instead of a float?


I have a column in my data frame with a category. I know how to do a rolling mean on a column with float values:

df['rolling_mean'] = df.categorycolumn.rolling(10).mean()

But I do not have numbers in this column but letters, for example:

A
A
A
A
D
D
D
D
D
S
D
D
D
D
D

And I want to get rid of the S, so I want to try to make a rolling average so it will be less sensitive for anomalies. Does anyone know how to do this?

Greetings and enjoy the sun (if there is any)!

p.s. Actually I am not looking for the mean but for the mode, the most common category value over a certain window.

So my data frame looks like this:

id category
1 A
2 A
3 A
4 A
5 A
6 A
7 S
8 S
9 A
10 A
11 A
12 A
13 A
14 A
15 A
17 A
18 A
19 A
20 A

And I wont it to find the mode of a rolling window like 10, so the full column would become only A (and no S)


Solution

  • Maybe you can use pd.factorize:

    # Detection
    s = pd.Series(pd.factorize(df['col1'])[0])
    m = s.sub(s.rolling(10, center=True, min_periods=1).median()).abs().ge(1)
    print(df[m])
    
    # Output
      col1
    9    S
    

    Update

    If you are looking for mode, try:

    value, letter = pd.factorize(df['category'])
    df['newcat'] = (pd.Series(value).rolling(10, center=True, min_periods=1)
                      .apply(lambda x: x.mode()).map(pd.Series(letter)))
    print(df)
    
    # Output
        id category newcat
    0    1        A      A
    1    2        A      A
    2    3        A      A
    3    4        A      A
    4    5        A      A
    5    6        A      A
    6    7        S      A  # HERE
    7    8        S      A  # HERE
    8    9        A      A
    9   10        A      A
    10  11        A      A
    11  12        A      A
    12  13        A      A
    13  14        A      A
    14  15        A      A
    15  17        A      A
    16  18        A      A
    17  19        A      A
    18  20        A      A