
Python StatsModels: ValueError: Expected frequency D. Got M


I am using statsmodels.graphics to draw a month_plot from time-series data in a Kaggle dataset. I resampled the hourly data to daily means, which I believed the plot required. However, I get an error saying the expected frequency is D but the actual frequency is M, whereas my data is already at daily (D) frequency.

import pandas as pd
from statsmodels.graphics.tsaplots import month_plot
import matplotlib.pyplot as plt

df = pd.read_csv('/kaggle/input/hourly-energy-consumption/DOM_hourly.csv')
df.set_index('Datetime', inplace=True, drop=True)
df.index = pd.to_datetime(df.index, format='%Y-%m-%d %H:%M:%S')
# drop duplicated index
df = df[~df.index.duplicated(keep='first')]

# convert df to daily mean frequency dataframe
ddf = df.resample(rule='24H', kind='interval').mean().to_period('d')

# print example dataframe ddf
#
#                 DOM_MW
# Datetime  
# 2005-05-01    7812.347826
# 2005-05-02    8608.083333
# ...         ...
# 2017-12-30    14079.125000
# 2017-12-31    15872.833333

# Monthly plot from the Daily frequency data
plt.figure(figsize=(14,4))
month_plot(ddf)
plt.show()

Present output: As shown above, ddf clearly has daily frequency. Nevertheless, I get the following error, which seems to say that my data is monthly (M) while daily (D) is expected.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-675f2911920c> in <module>
      7 
      8 plt.figure(figsize=(14,4))
----> 9 month_plot(ddf)
     10 plt.show()

ValueError: Expected frequency D. Got M

Solution

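  • You are passing daily data to month_plot, which requires data with a monthly frequency. The error message is worded confusingly: month_plot is what expects M, and your data is what has D. To use month_plot, resample your data to monthly frequency first: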

    >>> help(month_plot)
    ...
        x : array_like
            Seasonal data to plot. If dates is None, x must be a pandas object
            with a PeriodIndex or DatetimeIndex with a **monthly frequency**.
    ...
    
    month_plot(ddf.resample('M').mean())
    plt.show()
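
    To check the frequencies yourself before plotting, here is a minimal sketch on synthetic data. The hourly values are made up and only stand in for DOM_hourly.csv, and plot_monthly is a hypothetical helper name. The monthly step uses groupby on the month of each daily period instead of resample('M'); this is a swapped-in way to compute the same monthly means that avoids depending on resample alias spellings, which differ between pandas versions.

```python
import pandas as pd

# Synthetic hourly values standing in for DOM_hourly.csv (made-up numbers).
index = pd.date_range("2005-05-01", "2006-12-31 23:00", freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"DOM_MW": [float(i % 24) for i in range(len(index))]}, index=index)

# Daily means with a daily PeriodIndex, as in the question.
ddf = df.resample("D").mean().to_period("D")
print(ddf.index.freqstr)  # daily frequency, which month_plot rejects

# Monthly means: relabel each day with its month, then average per month.
mdf = ddf.groupby(ddf.index.asfreq("M")).mean()
print(mdf.index.freqstr)  # monthly frequency, which month_plot accepts


def plot_monthly(frame):
    """Draw the seasonal month plot; needs statsmodels and matplotlib."""
    import matplotlib.pyplot as plt
    from statsmodels.graphics.tsaplots import month_plot

    month_plot(frame["DOM_MW"])
    plt.show()
```

    Calling plot_monthly(mdf) should then draw the plot, since the index it receives is monthly.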
    
