Search code examples
pythonpandasseabornmoving-averagedata-science-experience

How to plot moving average by groupby in python?


I have a question related to plotting the moving average by doing group by. I have taken the dataset from Kaggle https://www.kaggle.com/code/kp4920/s-p-500-stock-data-time-series-analysis/comments. I have extracted the few rows by applying the below condition.

new_df_A = new_df[(new_df.Name == 'A')]
new_df_A.sort_values(by=['Name', 'Date'])

And I tried to calculate the moving average for 30 days by implementing this code

for cols in new_df_A.columns:
    if cols not in ['Name', 'Date',]:
        new_df_A['ma_'+cols]=new_df_A.groupby('Name').rolling(30)[cols].mean().reset_index(drop=True)

And I got this warning error

/var/folders/6j/0bj57ss10ggbdk87dtdkbgyw0000gn/T/ipykernel_130/1482748670.py:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  new_df_A['ma_'+cols]=new_df_A.groupby('Name').rolling(30)[cols].mean().reset_index(drop=True)
/var/folders/6j/0bj57ss10ggbdk87dtdkbgyw0000gn/T/ipykernel_130/1482748670.py:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  new_df_A['ma_'+cols]=new_df_A.groupby('Name').rolling(30)[cols].mean().reset_index(drop=True)
/var/folders/6j/0bj57ss10ggbdk87dtdkbgyw0000gn/T/ipykernel_130/1482748670.py:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

And When I tried to plot the figure, it is blank. Can somebody help me with this?

Thank You


Solution

  • To get a moving average for time series data, the period of interest is specified differently: for 30 days, use '30D'. And since it is column-wise, we use loc to specify the column. Since it is already a single issue, groupby is not necessary. To create the graph, I used pandas visualization, which is the simplest way to do it.

    df_A = new_df_A.copy()
    df_A['Date'] = pd.to_datetime(df_A['Date'])
    df_A.set_index('Date', inplace=True)
    
    for cols in df_A.columns:
        if cols not in ['Name', 'Date',]:
            df_A['ma_'+cols] = df_A.loc[:,cols].rolling('30D').mean()
    
    df_A.iloc[:,6:10].plot()
    

    enter image description here

    import seaborn as sns
    sns.set(rc={'figure.figsize':(20,8)})
    
    for cols in df_A.columns:
        if cols not in ['Name', 'Date', 'Open', 'High', 'Close']:
            sns.lineplot(x=df_A.index, y=df_A[cols])
    
    #plt.show()
    

    enter image description here