time-series statsmodels

Understanding the period parameter in statsmodels.tsa.seasonal


I am new to time series analysis and want to check my data for seasonality and trend. I tried both STL and seasonal_decompose from the statsmodels.tsa.seasonal module. My question is about an input parameter that is called period in both cases and is described as:

Periodicity of the sequence

for STL case and

Period of the series

for seasonal_decompose.

However, from what I understood, they are different. Based on this answer, the STL period parameter is defined as the expected seasonality of my series; for a daily periodicity, for example (which is my case), it would be 365.

For seasonal_decompose, however, I understand it to be the total number of samples, regardless of the time resolution. For example, if I have samples taken every hour, it would be 24 in my case.

This was my conclusion based on the error I got when using seasonal_decompose with period=365 on a time series of 6 days:

ValueError: x must have 2 complete cycles requires 730 observations. x only has 238 observation(s)

Are they indeed different? Did I understand both cases correctly? And if I did, would this imply that seasonal_decompose cannot work for unevenly spaced samples? In my case the samples are taken at somewhat random dates and times, so the STL parameter adapts much better. Is there a workaround for seasonal_decompose on unevenly spaced samples?


The more I read the less I understand. This code matches the sampling frequency to the period parameter. From the code docstring:

Annual maps to 1, quarterly maps to 4, monthly to 12, weekly to 52.

So it seems to be a map from the sampling frequency freq to an integer that means samples per year. So far so good, until we see that it does:

elif freq == "D":
    return 7
elif freq == "H":
    return 24

So for a day it maps to a week frequency and for hours it maps to a day!

Please give me a hand here! I am completely lost!


Solution

  • OK, I think I finally understood. period could be defined as:

    Expected samples in a full cycle / repetition of the seasonality component.

    Basically, you can look at your time series, see how long it takes to repeat itself, and then count the number of samples within that time frame.

    As for the function that casts freq to period, it just "assumes" that if you have an hourly sampling frequency the series will repeat its sequence daily, that if you have a daily sampling frequency it will repeat itself weekly, and that if the sampling interval is longer than that it will have a yearly seasonality.