python, statistics, data-science, statsmodels

Seasonality is always 7 when running seasonal_decompose(). Why is that?


I have been running seasonal_decompose() from statsmodels on about 20 completely different datasets. Is it standard that the seasonality is 7 when looking at a dataset with daily frequency?

Here is a picture as an example of one dataset decomp. I zoomed in on the seasonality so that you can see that it is again 7 days:

[Image: seasonal_decompose() output, zoomed in on the seasonal component]

Why is it always 7 days though? I wouldn't expect it to be always 7 days and the datasets are all different from each other, so by now I think that either this is total coincidence or this is because of seasonal_decompose().

But as I read the statsmodels documentation for seasonal_decompose(), it uses LOESS to figure out the seasonality. If I look at the formula, it should be able to find different frequencies of the seasonality. I just need to verify that I am not wrong here: is it pure coincidence that all of my datasets produce a 7-day seasonal period?


Solution

  • First of all, seasonal_decompose has nothing to do with LOESS. For a LOESS-based decomposition, use statsmodels.tsa.seasonal.STL instead. seasonal_decompose does not infer the periodicity from the data values in any way. You have only two options:

    1. State the periodicity explicitly with the period argument.
    2. Do not state the periodicity, leaving the period argument at None. In this case you have to pass a pandas Series or DataFrame with a datetime index to seasonal_decompose, and the periodicity is derived from the frequency label of that index; otherwise a ValueError is raised. It first fetches the frequency label with pfreq = getattr(getattr(x, "index", None), "inferred_freq", None) (in your case the label is 'D', meaning daily), then converts it to a periodicity with statsmodels.tsa.tsatools.freq_to_period (in your case 'D' is converted to 7, which is then used as the periodicity, hence the results you get).