Search code examples
pythonpandasmoving-average

moving average value exceeds the range of original column values


I have a dataframe like as shown below

score,region,supplier
27,AP,ABC
35,AP,ABC
29,ANZ,DEF
35,ANZ,DEF
30,KOREA,GHI
34,KOREA,GHI

I was working on feature engineering activities and found out that we have a quarterly score generated for each supplier based on their performance and this score ranges from 0 to 35.

Meaning, a supplier gets this score, 4 times every year (Q1,Q2,Q3 and Q4) based on their performance. 0 indicates poor performance and 35 indicates best performance.

What I did was I computed the moving average of these scores for a supplier based on their history (past data) till current time point.

However, when I do that I realize that the mov_avg_score goes beyond the range of 35.

Meaning, the mov_avg_score column contains values like 37.23, 52 etc?

Is this even possible? How can a derived score (computed using moving average(mean)) exceed its range of original score? shouldn't the moving average score be also on the same range?

In case, if you are interested to look at my code, please find below

df['mov_avg_score'] = df.groupby(['supplier','region'])['score'].expanding().mean().shift().values

Solution

  •  df['rolling_mean']=df.groupby(['supplier','region'],as_index=False,)['score'].rolling(2, min_periods=1).mean()['score']
        
    
       score region supplier  rolling_mean
    0     27     AP      ABC          27.0
    1     35     AP      ABC          31.0
    2     29    ANZ      DEF          29.0
    3     35    ANZ      DEF          32.0
    4     30  KOREA      GHI          30.0
    5     34  KOREA      GHI          32.0
    

    Following your comment, use size of largest group as window

    s=df.groupby(['supplier','region'])['score'].size().max()#compute window size
    

    #rolling mean

    df['rolling_mean']=df.groupby(['supplier','region'],as_index=False,)['score'].rolling(s,min_periods=1).mean()['score']