I am using statsmodels.graphics to draw a month_plot from time series data in a Kaggle dataset. I have converted the data to daily mean values, as I understood the plot to require. However, I get an error saying that the expected data frequency is D but the actual data frequency is M, whereas my data is already D.
import pandas as pd
from statsmodels.graphics.tsaplots import month_plot
import matplotlib.pyplot as plt
df = pd.read_csv('/kaggle/input/hourly-energy-consumption/DOM_hourly.csv')
df.set_index('Datetime', inplace=True, drop=True)
df.index = pd.to_datetime(df.index, format='%Y-%m-%d %H:%M:%S')
# drop duplicated index
df = df[~df.index.duplicated(keep='first')]
# convert df to a dataframe of daily means with a daily PeriodIndex
ddf = df.resample('D').mean().to_period('D')
# print(ddf) gives:
# Datetime
# 2005-05-01 7812.347826
# 2005-05-02 8608.083333
# ... ...
# 2017-12-30 14079.125000
# 2017-12-31 15872.833333
# Monthly plot from the daily-frequency data
plt.figure(figsize=(14,4))
month_plot(ddf)
plt.show()
As you can see above, my ddf clearly has a daily frequency. But I get the following strange error, which says that my ddf data is actually M (monthly) while D (daily) is expected:
ValueError Traceback (most recent call last)
<ipython-input-7-675f2911920c> in <module>
8 plt.figure(figsize=(14,4))
----> 9 month_plot(ddf)
10 plt.show()
ValueError: Expected frequency D. Got M
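A quick way to see which frequency an object really carries is to inspect its index's freqstr. The sketch below is only an illustration under assumptions: it replaces the Kaggle CSV with synthetic hourly data (the DOM_MW column name mirrors the dataset, but the values are random) and builds a daily-mean object of the same kind as ddf.

```python
import numpy as np
import pandas as pd

# Synthetic hourly data (assumption): stands in for DOM_hourly.csv, values are random
hourly_index = pd.date_range("2005-05-01", "2005-08-31 23:00", freq="h")
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {"DOM_MW": rng.normal(10_000, 500, len(hourly_index))},
    index=hourly_index,
)

# Daily means with a daily PeriodIndex, the same kind of object as ddf above
ddf = df.resample("D").mean().to_period("D")

# freqstr is the frequency string attached to the index
print(ddf.index.freqstr)  # D
print(len(ddf))           # 123 (May 1 to Aug 31)
```

So the daily object reports D, which is the frequency of the data, not the one month_plot asks for.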
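You are trying to use month_plot to plot daily data. month_plot only works with monthly data, so you have to resample your data to a monthly frequency first. (The error message reads the other way round: D is the frequency of your data, and M is the frequency month_plot needs.) The docstring states the requirement: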
>>> help(month_plot)
x : array_like
Seasonal data to plot. If dates is None, x must be a pandas object
with a PeriodIndex or DatetimeIndex with a **monthly frequency**.