
Moving average in eviews and Python


I wanted to ask about moving averages in time-series trend analysis. When we compute a moving average in EViews, we write something like the code below:

series mov_avc = @movavc(data, n)

In Python, however, we would write something like this:

data["mov_avc"] = data["value"].rolling(window=n).mean()  # "value" is a placeholder for the column being averaged

When computing a simple moving average in EViews, we lose not only the first few observations but also the LAST few; in Python, we lose only the first ones.

Why is that?


Solution

  • If I understood your question correctly, you want to know why a moving average with window size n in Python doesn't lose the last few points.

    Looking at the pandas DataFrame.rolling() docs, you will find this note:

    By default, the result is set to the right edge of the window. This can be changed to the center of the window by setting center=True.

    This means that, by default, the rolling window is not centred on the row it computes the average for: each average is stored at the last row of its window.
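To make "right edge" concrete, here is a minimal hand-written sketch of a right-aligned (trailing) simple moving average, checked against pandas' default. The function name trailing_moving_average is hypothetical, not part of pandas; it assumes a 1-D numeric input and an integer window n.

```python
import numpy as np
import pandas as pd


def trailing_moving_average(values, n):
    """Hypothetical helper: right-aligned simple moving average.

    Position i holds the mean of values[i - n + 1], ..., values[i],
    so the first n - 1 positions have no full window and stay NaN.
    """
    values = np.asarray(values, dtype=float)
    out = np.full(values.shape, np.nan)
    for i in range(n - 1, len(values)):
        out[i] = values[i - n + 1 : i + 1].mean()
    return out


s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=list("ABCDE"))
manual = trailing_moving_average(s.to_numpy(), 3)

# pandas' default (center=False) gives the same right-aligned result
assert np.allclose(manual, s.rolling(window=3).mean().to_numpy(), equal_nan=True)
```

Nothing is lost at the end because the last row always has a full window behind it.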

    Let's take a look at how this works with an example.

    We have a simple DataFrame:

    In [1]: import numpy as np
       ...: import pandas as pd

    In [2]: ones_matrix = np.ones((5,1))
       ...: ones_matrix[:,0] = np.array([i+1 for i in range(ones_matrix.shape[0])])
       ...: index = [chr(ord('A')+i) for i in range(ones_matrix.shape[0])]
       ...: df = pd.DataFrame(data = ones_matrix,columns=['Value'],index=index)
       ...: df
    Out[2]:
       Value
    A    1.0
    B    2.0
    C    3.0
    D    4.0
    E    5.0
    

    Now let's apply a rolling window of size 3. (Note that I explicitly passed center=False, but that is already the default when calling df.rolling().)

    In [3]: rolled_df = df.rolling(window=3,center=False).mean()
       ...: rolled_df
    Out[3]:
       Value
    A    NaN
    B    NaN
    C    2.0
    D    3.0
    E    4.0
    

    The first two rows are NaN, while the last rows keep a value. Take the row with index C: its value after rolling is 2, whereas before it was 3. This means the new value at C is the average of the rows with indexes {A, B, C}, whose values are {1, 2, 3}: (1 + 2 + 3) / 3 = 2.

    So the window was not centred on index C when the average for that position was computed; it was centred on index B, and the result was written at the window's right edge, C.
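Since each trailing average really belongs to the middle of its window, moving it back by n // 2 rows should place it on the row the window is centred on. The sketch below checks this against pandas' center=True option; it assumes an odd window (here n = 3), where the middle is a single row. Even windows are not covered by this check.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=list("ABCDE"))
n = 3  # odd window, so the window's middle is a single row

trailing = s.rolling(window=n).mean()              # result stored at the window's right edge
centred = s.rolling(window=n, center=True).mean()  # result stored at the window's middle

# Shift each trailing average back by n // 2 rows (here: from C to B, D to C, ...)
shifted = trailing.shift(-(n // 2))

pd.testing.assert_series_equal(shifted, centred)
```

The shift pushes a NaN in at the end (row E), which is exactly the observation a centred average loses.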

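    You can change that by setting center=True, which gives the expected behaviour. This is also what EViews' @movavc does (it computes a centred moving average), which is why it loses observations at both ends: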

    In [4]: centred_rolled_df = df.rolling(window=3,center=True).mean()
       ...: centred_rolled_df
    Out[4]:
       Value
    A    NaN
    B    2.0
    C    3.0
    D    4.0
    E    NaN
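Both alignments discard the same number of observations in total, n - 1; centring only splits them between the two ends, which is the pattern described in the question for EViews' @movavc. The helper lost_at_each_end below is hypothetical (not a pandas function), assumes the result has at least one non-NaN value, and the check assumes odd windows with pandas' default min_periods (equal to the window size).

```python
import numpy as np
import pandas as pd


def lost_at_each_end(result):
    """Hypothetical helper: count leading and trailing NaNs in a rolling result."""
    isna = result.isna().to_numpy()
    leading = int(np.argmin(isna))         # position of the first non-NaN value
    trailing = int(np.argmin(isna[::-1]))  # same, counted from the end
    return leading, trailing


s = pd.Series(np.arange(20, dtype=float))
for n in (3, 5, 7):
    print(n,
          lost_at_each_end(s.rolling(window=n).mean()),
          lost_at_each_end(s.rolling(window=n, center=True).mean()))
# 3 (2, 0) (1, 1)
# 5 (4, 0) (2, 2)
# 7 (6, 0) (3, 3)
```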