I am new to time series analysis and want to check my data for seasonality and trend. I tried both STL and seasonal_decompose from the statsmodels.tsa.seasonal module.
My question is about an input parameter that is called period in both cases. It is described as "Periodicity of the sequence" for STL and "Period of the series" for seasonal_decompose.
However, from what I understood, they are different.
Based on this answer, the STL period parameter is defined as the expected seasonality of my series; for daily data, for example (which is my case), it would be 365.
For seasonal_decompose, however, I understand it to be the total number of samples regardless of the time resolution. For example, if I have samples taken every hour, it would be 24 for my example case.
This was my conclusion based on the error I got when using seasonal_decompose with period=365 on a time series of 6 days:
ValueError: x must have 2 complete cycles requires 730 observations. x only has 238 observation(s)
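The numbers in that message are consistent with a simple rule: two complete cycles of length period are required, so period=365 needs 2 × 365 = 730 observations. Here is a minimal sketch of that arithmetic; min_observations is a hypothetical helper written for illustration and is not part of statsmodels.

```python
def min_observations(period: int, cycles: int = 2) -> int:
    """Observations needed to cover `cycles` complete cycles of length `period`.

    Hypothetical helper: it only mirrors the arithmetic in the error message.
    """
    return cycles * period


n_obs = 238  # number of observations reported in the error message

for period in (365, 24):
    needed = min_observations(period)
    status = "enough" if n_obs >= needed else "too few"
    print(f"period={period}: needs {needed}, have {n_obs} -> {status}")
```

With 238 observations, period=365 falls short of the 730 required, while period=24 would only need 48.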
Are they indeed different? Did I understand both cases correctly? And if I did, would this imply that seasonal_decompose cannot work for unevenly spaced samples? In my case the samples are taken at somewhat random dates and times, so the STL parameter adapts much better. Is there a workaround for seasonal_decompose on unevenly spaced samples?
The more I read, the less I understand. This code matches the sampling frequency to the period parameter. From the code docstring:
Annual maps to 1, quarterly maps to 4, monthly to 12, weekly to 52.
So it seems to map the sampling frequency freq to an integer that means samples per year. So far so good, until we see that it does:
elif freq == "D":
    return 7
elif freq == "H":
    return 24
So for a day it maps to a week frequency and for hours it maps to a day!
Please give me a hand here! I am completely lost!
OK, I think I finally understood. period could be defined as:
Expected samples in a full cycle / repetition of the seasonality component.
Basically you can just look at your time-series and see the time it takes to repeat itself, and then get the number of samples within that timeframe.
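That definition can be sketched as a small calculation: divide the length of one seasonal cycle by the sampling interval. The helper name samples_per_cycle below is hypothetical, and the sketch assumes evenly spaced samples and a cycle that is a whole number of sampling intervals.

```python
from datetime import timedelta


def samples_per_cycle(cycle: timedelta, sampling_interval: timedelta) -> int:
    """Number of samples in one full seasonal cycle (hypothetical helper).

    Assumes evenly spaced samples; raises if the cycle is not a whole
    number of sampling intervals.
    """
    ratio = cycle / sampling_interval  # timedelta / timedelta gives a float
    if abs(ratio - round(ratio)) > 1e-9:
        raise ValueError("cycle is not a whole number of sampling intervals")
    return round(ratio)


# Hourly samples with a pattern that repeats every day
print(samples_per_cycle(timedelta(days=1), timedelta(hours=1)))
# Daily samples with a pattern that repeats every week
print(samples_per_cycle(timedelta(weeks=1), timedelta(days=1)))
# Daily samples with a pattern that repeats every (non-leap) year
print(samples_per_cycle(timedelta(days=365), timedelta(days=1)))
```

The resulting integer is what would be passed as period, provided the seasonality you read off the plot really matches the chosen cycle length.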
As for the function that casts freq to period, it just "assumes" that hourly data will repeat its sequence daily, that daily data will repeat itself weekly, and that anything coarser than that has a yearly seasonality.
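To make that assumption concrete, here is a simplified stand-in for such a mapping. It is not the library's code: default_period and DEFAULT_PERIODS are hypothetical names, and the single-letter aliases for annual, quarterly, monthly and weekly data are assumed; only "D" and "H" appear in the snippet quoted above.

```python
# Simplified stand-in for the freq -> period mapping described above.
# Values follow the quoted docstring and snippet; the aliases "A", "Q",
# "M" and "W" are assumptions made for this sketch.
DEFAULT_PERIODS = {
    "A": 1,   # annual data: one sample per yearly cycle
    "Q": 4,   # quarterly data: 4 samples per year
    "M": 12,  # monthly data: 12 samples per year
    "W": 52,  # weekly data: 52 samples per year
    "D": 7,   # daily data: assumed to repeat weekly
    "H": 24,  # hourly data: assumed to repeat daily
}


def default_period(freq: str) -> int:
    """Return the assumed seasonal period for a sampling-frequency alias."""
    try:
        return DEFAULT_PERIODS[freq.upper()]
    except KeyError:
        raise ValueError(f"no default period for frequency {freq!r}") from None


for alias in ("M", "D", "H"):
    print(alias, "->", default_period(alias))
```

These are only defaults. If your series has a different seasonality, for example daily data with a yearly cycle, you would pass period explicitly instead of relying on the inferred value.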