For some reason when I am trying to large amount of data to a sin wave it fails and fits it to a horizontal line. Can somebody explain?
Minimal working code:
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
# Seed the random number generator for reproducibility
import pandas
np.random.seed(0)
# Here it work as expected
# x_data = np.linspace(-5, 5, num=50)
# y_data = 2.9 * np.sin(1.05 * x_data + 2) + 250 + np.random.normal(size=50)
# With this data it breaks
x_data = np.linspace(0, 2500, num=2500)
y_data = -100 * np.sin(0.01 * x_data + 1) + 250 + np.random.normal(size=2500)
# And plot it
plt.figure(figsize=(6, 4))
plt.scatter(x_data, y_data)
def test_func(x, a, b, c, d):
return a * np.sin(b * x + c) + d
# Used to fit the correct function
# params, params_covariance = optimize.curve_fit(test_func, x_data, y_data)
# making some guesses
params, params_covariance = optimize.curve_fit(test_func, x_data, y_data,
p0=[-80, 3, 0, 260])
print(params)
plt.figure(figsize=(6, 4))
plt.scatter(x_data, y_data, label='Data')
plt.plot(x_data, test_func(x_data, *params),
label='Fitted function')
plt.legend(loc='best')
plt.show()
Does anybody know, how to fix this issue. Should I use a different fitting method not least square? Or should I reduce the number of data points?
Given your data, you can use the more robust lmfit
instead of scipy
.
In particular, you can use SineModel
(see here for details).
SineModel
in lmfit
is not for "shifted" sine waves, but you can easily deal with the shift doing
y_data_offset = y_data.mean()
y_transformed = y_data - y_data_offset
plt.scatter(x_data, y_transformed)
plt.axhline(0, color='r')
Now you can fit to sine wave
from lmfit.models import SineModel
mod = SineModel()
pars = mod.guess(y_transformed, x=x_data)
out = mod.fit(y_transformed, pars, x=x_data)
you can inspect results with print(out.fit_report())
and plot results with
plt.plot(x_data, y_data, lw=7, color='C1')
plt.plot(x_data, out.best_fit+y_data_offset, color='k')
# we add the offset ^^^^^^^^^^^^^
or with the builtin plot method out.plot_fit()
, see here for details.
Note that in SineModel
all parameters "are constrained to be non-negative", so your defined negative amplitude (-100) will be positive (+100) in the parameters fit results. So the phase too won't be 1 but π+1 (PS: they call shift
the phase)
print(out.best_values)
{'amplitude': 99.99631403054289,
'frequency': 0.010001193681616227,
'shift': 4.1400215410836605}