numpy scipy statistics time-series

Polyfit for a seasonality model


I am trying to analyze the seasonality of the returns of a stock (though it could be any kind of time series): [image: scatter plot of average weekly returns]

On the x axis we have the weeks and on the y axis the historical average return during each week. To clarify, each dot represents the average return (y axis) of the stock during each of the 52 weeks (x axis); the average is taken over the last 20 years. I'm trying to use a polynomial model to denoise the data and get a smoother signal.

I know I can get polynomial coefficients with numpy.polyfit:

numpy.polyfit(weeks, returns, deg)

The problem is that, in the example above, the signal I get for week 52 (red circle) is completely different from the signal I get for the following week (green circle, which is week 1 of the following year). I'm trying to avoid this kind of jump from the signal of the last week of December to the signal of the first week of January. Is there a way to force polyfit to find coefficients that produce the same function value for two given input x values (in my case, 1 and 52)?

Otherwise, is there anything I could do with the data to mitigate this problem? One thing I tried is adding "fake weeks" before the first one (weeks -9 to 0, which have the same y values as weeks 43 to 52) and after the last one (weeks 53 to 62, which have the same y values as weeks 1 to 10). This seems to help but doesn't completely fix the problem. Any ideas? Thanks


Solution

  • This is not a job for polyfit. Fundamentally your data represent a periodic process. One approach is to apply a real FFT, and then optionally limit the bandwidth. This will produce a spectral sequence that "knows" that Jan 1 and Dec 31+1 are the same thing. With a somewhat high bandwidth:

    import matplotlib.pyplot as plt
    import numpy as np
    
    
    ave_return = np.array([
        0.29549823, -0.04327911, -0.28475728,  0.24133149,  0.29175083,
        0.05927994, -0.19481259,  0.0682162 ,  0.12219757,  0.2537674 ,
        0.24648395,  0.15455555,  0.27520195, -0.01664706, -0.47437987,
       -0.01138717, -0.02216335,  0.0930811 ,  0.61556973,  0.30738668,
        0.30734683,  0.21362355,  0.13790445, -0.15041544, -0.37567391,
       -0.06940527, -0.12529933, -0.26046757, -0.34338869, -0.3451905 ,
       -0.02994229, -0.04620011, -0.03362213,  0.16813838,  0.20072505,
       -0.22111894, -0.23910233, -0.29322923, -0.06443125, -0.07527673,
       -0.25189341, -0.16183438, -0.07362219, -0.09708203,  0.00569532,
        0.23257541,  0.07938912,  0.03610597, -0.23765742, -0.32248603,
        0.04504569, -0.01805558,  0.03534886,
    ])
    
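    spectrum = np.fft.rfft(ave_return)
    # Zero every bin from index 30 up. The 53 samples here yield only 27 rfft
    # bins, so this cutoff keeps the whole spectrum.
    spectrum[30:] = 0
    
    # Pass n explicitly: for odd-length input the default would return 52 points.
    verified = np.fft.irfft(spectrum, n=len(ave_return))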
    
    plt.scatter(np.arange(len(ave_return)), ave_return)
    plt.plot(verified)
    plt.show()
    

    [Figure: high-bandwidth reconstruction]

    Lowering that bandwidth from 30 to something like 6 makes it more obvious that the periodic sequence starts and ends at the same place: [Figure: low-bandwidth reconstruction]
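
As a rough sketch of the low-bandwidth case, the snippet below wraps the same rfft/irfft steps in a helper and also evaluates the truncated Fourier series at arbitrary week positions, so the wrap-around can be checked numerically rather than by eye. The names `periodic_smooth` and `evaluate_series` and the synthetic `noisy` series are made up for illustration; they are not part of NumPy and not the data from the question.

```python
import numpy as np


def periodic_smooth(values, n_keep):
    """Low-pass a periodic sequence by keeping only the first n_keep rfft bins."""
    values = np.asarray(values, dtype=float)
    spectrum = np.fft.rfft(values)
    spectrum[n_keep:] = 0  # discard everything at or above the cutoff bin
    # Pass n explicitly so odd-length input comes back with the same length.
    return np.fft.irfft(spectrum, n=len(values))


def evaluate_series(values, n_keep, t):
    """Evaluate the truncated Fourier series at arbitrary (non-integer) positions t.

    Assumes n_keep stays below the Nyquist bin, i.e. n_keep <= (len(values) + 1) // 2.
    """
    values = np.asarray(values, dtype=float)
    n = len(values)
    coeffs = np.fft.rfft(values)[:n_keep]
    k = np.arange(len(coeffs))
    t = np.atleast_1d(np.asarray(t, dtype=float))
    # The DC term counts once; every other kept harmonic stands for a conjugate pair.
    weights = np.where(k == 0, 1.0, 2.0)
    phases = np.exp(2j * np.pi * np.outer(t, k) / n)
    return (phases * (weights * coeffs)).real.sum(axis=1) / n


# Synthetic stand-in for 52 weekly averages (illustrative data only).
rng = np.random.default_rng(0)
weeks = np.arange(52)
noisy = 0.3 * np.sin(2 * np.pi * weeks / 52) + 0.1 * rng.standard_normal(52)

smooth = periodic_smooth(noisy, n_keep=6)

# The series has period 52, so position 52 (one past the last week)
# lands back on position 0.
start, end = evaluate_series(noisy, 6, [0.0, 52.0])
print(start, end)
```

To try this on the real series, pass `ave_return` instead of `noisy` and use `len(ave_return)` as the period. Because the smoothed curve is a sum of sinusoids with that period, the match between the end of one year and the start of the next comes from the model itself, with no padding or constraint needed.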