Search code examples
pythonnumpyscipycurve-fittingscipy-optimize

Scipy.optimize.curve_fit does not fit


Say I want to fit a sine function using scipy.optimize.curve_fit. I don't know any parameters of the function. To get the frequency, I do Fourier transform and guess all the other parameters - amplitude, phase, and offset. When running my program, I do get a fit but it does not make sense. What is the problem? Any help will be appreciated.

import numpy as np
import matplotlib.pyplot as plt
import scipy as sp

ampl = 1
freq = 24.5
phase = np.pi/2
offset = 0.05
t = np.arange(0,10,0.001)

func = np.sin(2*np.pi*t*freq + phase) + offset

fastfft = np.fft.fft(func)
freq_array = np.fft.fftfreq(len(t),t[0]-t[1])

max_value_index = np.argmax(abs(fastfft))
frequency = abs(freq_array[max_value_index])

def fit(a, f, p, o, t):
    return a * np.sin(2*np.pi*t*f + p) + o

guess = (0.9, frequency, np.pi/4, 0.1)
params, fit = sp.optimize.curve_fit(fit, t, func, p0=guess)

a, f, p, o = params
fitfunc = lambda t: a * np.sin(2*np.pi*t*f + p) + o

plt.plot(t, func, 'r-', t, fitfunc(t), 'b-')

Solution

  • The main problem in your program was a misunderstanding, how scipy.optimize.curve_fit is designed and its assumption of the fit function:

     ydata = f(xdata, *params) + eps
    

    This means that the fit function has to have the array for the x values as the first parameter followed by the function parameters in no particular order and must return an array for the y values. Here is an example, how to do this:

    import numpy as np
    import matplotlib.pyplot as plt
    import scipy.optimize 
    
    #t has to be the first parameter of the fit function 
    def fit(t, a, f, p, o):
        return a * np.sin(2*np.pi*t*f + p) + o
    
    ampl = 1
    freq = 2
    phase = np.pi/2
    offset = 0.5
    t = np.arange(0,10,0.01)
    
    #is the same as fit(t, ampl, freq, phase, offset)
    func = np.sin(2*np.pi*t*freq + phase) + offset
    
    fastfft = np.fft.fft(func)
    freq_array = np.fft.fftfreq(len(t),t[0]-t[1])
    
    max_value_index = np.argmax(abs(fastfft))
    frequency = abs(freq_array[max_value_index])
    
    guess = (0.9, frequency, np.pi/4, 0.1)
    #renamed the covariance matrix
    params, pcov = scipy.optimize.curve_fit(fit, t, func, p0=guess)
    a, f, p, o = params
    
    #calculate the fit plot using the fit function
    plt.plot(t, func, 'r-', t, fit(t, *params), 'b-')
    plt.show()
    

    As you can see, I have also changed the way the fit function for the plot is calculated. You don't need another function - just utilise the fit function with the parameter list, the fit procedure gives you back.
    The other problem was that you called the covariance array fit - overwriting the previously defined function fit. I fixed that as well.
    P.S.: Of course now you only see one curve, because the perfect fit covers your data points.