Search code examples
pandasnumpyscipycurve-fittingscipy-optimize

scipy weird unexpected behavior curve_fit large data set for sin wave


For some reason when I am trying to large amount of data to a sin wave it fails and fits it to a horizontal line. Can somebody explain?

Minimal working code:

import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
# Seed the random number generator for reproducibility
import pandas

np.random.seed(0)

# Here it work as expected
# x_data = np.linspace(-5, 5, num=50)
# y_data = 2.9 * np.sin(1.05 * x_data + 2) + 250 + np.random.normal(size=50)

# With this data it breaks
x_data = np.linspace(0, 2500, num=2500)
y_data = -100 * np.sin(0.01 * x_data + 1) + 250 + np.random.normal(size=2500)

# And plot it

plt.figure(figsize=(6, 4))
plt.scatter(x_data, y_data)


def test_func(x, a, b, c, d):
    return a * np.sin(b * x + c) + d

# Used to fit the correct function
# params, params_covariance = optimize.curve_fit(test_func, x_data, y_data)

# making some guesses
params, params_covariance = optimize.curve_fit(test_func, x_data, y_data,
                                               p0=[-80, 3, 0, 260])

print(params)
plt.figure(figsize=(6, 4))
plt.scatter(x_data, y_data, label='Data')
plt.plot(x_data, test_func(x_data, *params),
         label='Fitted function')

plt.legend(loc='best')

plt.show()

This is the function with a lot of data and fitted incorrectly

This is the function with little data and fitted correctly

Does anybody know, how to fix this issue. Should I use a different fitting method not least square? Or should I reduce the number of data points?


Solution

  • Given your data, you can use the more robust lmfit instead of scipy.

    In particular, you can use SineModel (see here for details).

    SineModel in lmfit is not for "shifted" sine waves, but you can easily deal with the shift doing

    y_data_offset = y_data.mean()
    y_transformed = y_data - y_data_offset
    plt.scatter(x_data, y_transformed)
    plt.axhline(0, color='r')
    

    enter image description here

    Now you can fit to sine wave

    from lmfit.models import SineModel
    
    mod = SineModel()
    
    pars = mod.guess(y_transformed, x=x_data)
    out = mod.fit(y_transformed, pars, x=x_data)
    

    you can inspect results with print(out.fit_report()) and plot results with

    plt.plot(x_data, y_data, lw=7, color='C1')
    plt.plot(x_data, out.best_fit+y_data_offset, color='k')
    #           we add the offset ^^^^^^^^^^^^^
    

    enter image description here

    or with the builtin plot method out.plot_fit(), see here for details.

    Note that in SineModel all parameters "are constrained to be non-negative", so your defined negative amplitude (-100) will be positive (+100) in the parameters fit results. So the phase too won't be 1 but π+1 (PS: they call shift the phase)

    print(out.best_values)
    
    {'amplitude': 99.99631403054289,
     'frequency': 0.010001193681616227,
     'shift': 4.1400215410836605}