Search code examples
pythonnumpymatplotlibcurve-fitting

How do I fit a sine curve to my data with pylab and numpy?


I am trying to show that economies follow a relatively sinusoidal growth pattern. I am building a python simulation to show that even when we let some degree of randomness take hold, we can still produce something relatively sinusoidal.

I am happy with the data I'm producing, but now I'd like to find some way to get a sine graph that pretty closely matches the data. I know you can do polynomial fit, but can you do sine fit?


Solution

  • You can use the least-square optimization function in scipy to fit any arbitrary function to another. In case of fitting a sin function, the 3 parameters to fit are the offset ('a'), amplitude ('b') and the phase ('c').

    As long as you provide a reasonable first guess of the parameters, the optimization should converge well.Fortunately for a sine function, first estimates of 2 of these are easy: the offset can be estimated by taking the mean of the data and the amplitude via the RMS (3*standard deviation/sqrt(2)).

    Note: as a later edit, frequency fitting has also been added. This does not work very well (can lead to extremely poor fits). Thus, use at your discretion, my advise would be to not use frequency fitting unless frequency error is smaller than a few percent.

    This leads to the following code:

    import numpy as np
    from scipy.optimize import leastsq
    import pylab as plt
    
    N = 1000 # number of data points
    t = np.linspace(0, 4*np.pi, N)
    f = 1.15247 # Optional!! Advised not to use
    data = 3.0*np.sin(f*t+0.001) + 0.5 + np.random.randn(N) # create artificial data with noise
    
    guess_mean = np.mean(data)
    guess_std = 3*np.std(data)/(2**0.5)/(2**0.5)
    guess_phase = 0
    guess_freq = 1
    guess_amp = 1
    
    # we'll use this to plot our first estimate. This might already be good enough for you
    data_first_guess = guess_std*np.sin(t+guess_phase) + guess_mean
    
    # Define the function to optimize, in this case, we want to minimize the difference
    # between the actual data and our "guessed" parameters
    optimize_func = lambda x: x[0]*np.sin(x[1]*t+x[2]) + x[3] - data
    est_amp, est_freq, est_phase, est_mean = leastsq(optimize_func, [guess_amp, guess_freq, guess_phase, guess_mean])[0]
    
    # recreate the fitted curve using the optimized parameters
    data_fit = est_amp*np.sin(est_freq*t+est_phase) + est_mean
    
    # recreate the fitted curve using the optimized parameters
    
    fine_t = np.arange(0,max(t),0.1)
    data_fit=est_amp*np.sin(est_freq*fine_t+est_phase)+est_mean
    
    plt.plot(t, data, '.')
    plt.plot(t, data_first_guess, label='first guess')
    plt.plot(fine_t, data_fit, label='after fitting')
    plt.legend()
    plt.show()
    

    enter image description here

    Edit: I assumed that you know the number of periods in the sine-wave. If you don't, it's somewhat trickier to fit. You can try and guess the number of periods by manual plotting and try and optimize it as your 6th parameter.