Tags: python, scipy, curve-fitting

How to fit this data with harmonic functions?


I want to fit my data with a curve that is a sum of sines and cosines. But even the fit with a single sinusoid is completely wrong.

Here is my code, using this data from a NetCDF (.nc) file opened with xarray:

import numpy as np
from scipy.optimize import curve_fit
from matplotlib import pyplot

# values of ds_s_tagesgang['hour'] and ds_s_tagesgang['T2m'] from the .nc file
hour = np.arange(24)
T2m = np.array([-0.60313714, -0.6527668, -0.669063, -0.6045208, -0.46157956,
                -0.36819172, -0.17480397, 0.00283241, 0.15954256, 0.33030534,
                0.43397593, 0.54781055, 0.61995673, 0.59603477, 0.610795,
                0.5800109, 0.4601419, 0.29350758, 0.20055556, 0.03135109,
                -0.15563202, -0.27981472, -0.4027779, -0.4945315], dtype=np.float32)

# define the objective function: a sine with amplitude a, angular frequency b and offset c
def objective(x, a, b, c):
    return a * np.sin(b * x) + c

# choose the input and output variables
x, y = hour, T2m
# curve fit (no p0 is given, so curve_fit starts from a = b = c = 1)
popt, _ = curve_fit(objective, x, y)
# summarize the parameter values
a, b, c = popt
print('y = %.5f * np.sin(%.5f * x) + %.5f' % (a, b, c))
# plot input vs output
pyplot.scatter(x, y)
# define a dense sequence of inputs between the smallest and largest known inputs
x_line = np.linspace(min(x), max(x), 200)
# calculate the output for the range
y_line = objective(x_line, a, b, c)
# create a line plot for the mapping function
pyplot.plot(x_line, y_line, '--', color='red')
pyplot.show()

This is my code, but as you can see in the picture, the fit is completely wrong: the blue dots are my data, and the red line is the "fitted" curve.

[Image: the data (blue dots) and the fitted curve (red dashed line)]


Solution

  • The difficulty might come from the model equation of the form

    y = A * sin(B * x) + C

    which doesn't include a phase shift. The suggested model is:

    y = A * sin(B * x + F) + C

    involving four parameters (A,B,C,F) instead of three.

    Another cause of difficulty is the non-linear regression method used in the software. It is an iterative calculation that requires initial guesses for the parameters. If those guesses are too far from the correct values, the calculation might fail.

    To overcome these difficulties, one can use a non-iterative method, as shown below.

    Note that the model function is written in a different but equivalent form:

    y = a + b * sin(w * x) + c * cos(w * x) = A * sin(B * x + F) + C

    with B = w and

    A^2 = b^2 + c^2

    tan(F) = c/b

    C = a

    [Image: the non-iterative fitting procedure]

    With your data, the numerical calculation gives:

    [Image: numerical results of the calculation with the question's data]

    NOTE: The above method uses a fitting criterion different from the usual least-squares criterion. If a strict least-squares fit is wanted, one has to come back to an (iterative) non-linear regression method. But there is no longer any need to guess initial values, because the values found above make very good initial values.

    REFERENCE: "Régressions et équations intégrales" (in French)

    https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales

    In addition, the result of the non-linear regression:

    [Image: non-linear regression result]
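The answer's first two points (add a phase shift F, and give the iterative solver sensible initial values) can be sketched with SciPy's `curve_fit` as below. This is a minimal sketch, not the answerer's method: the helper name `phase_shifted_sine` is hypothetical, and the initial guesses assume a 24 h period because the data is a diurnal cycle.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hourly values from the question: ds_s_tagesgang['hour'] and ds_s_tagesgang['T2m']
hour = np.arange(24, dtype=float)
T2m = np.array([-0.60313714, -0.6527668, -0.669063, -0.6045208, -0.46157956,
                -0.36819172, -0.17480397, 0.00283241, 0.15954256, 0.33030534,
                0.43397593, 0.54781055, 0.61995673, 0.59603477, 0.610795,
                0.5800109, 0.4601419, 0.29350758, 0.20055556, 0.03135109,
                -0.15563202, -0.27981472, -0.4027779, -0.4945315])


def phase_shifted_sine(x, A, B, F, C):
    """Four-parameter model y = A * sin(B * x + F) + C (hypothetical helper name)."""
    return A * np.sin(B * x + F) + C


# Initial guesses read off the data; the 24 h period is an assumption (diurnal cycle)
A0 = (T2m.max() - T2m.min()) / 2             # half the peak-to-peak range
B0 = 2 * np.pi / 24                           # angular frequency in rad/h
F0 = np.pi / 2 - B0 * hour[np.argmax(T2m)]    # puts the sine's maximum at the warmest hour
C0 = T2m.mean()                               # vertical offset

popt, pcov = curve_fit(phase_shifted_sine, hour, T2m, p0=[A0, B0, F0, C0])
A, B, F, C = popt
rms = np.sqrt(np.mean((phase_shifted_sine(hour, *popt) - T2m) ** 2))
print(f"y = {A:.4f} * sin({B:.4f} * x + {F:.4f}) + {C:.4f}")
print(f"period = {2 * np.pi / B:.2f} h, RMS residual = {rms:.4f}")
```

Without `p0`, `curve_fit` starts every parameter at 1, which for the question's model means b = 1 rad/h, a period of about 6.3 h. That is far from the 24 h cycle in the data, which is why the solver can end up at a poor solution.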
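When the period is known, the equivalent form y = a + b * sin(w * x) + c * cos(w * x) is linear in a, b and c, so it can be fitted without iteration by ordinary linear least squares. This sketch assumes a 24 h period, so it is a simpler alternative and not the integral-equation method in the answer's image, which also estimates w. The hypothetical helper `fit_harmonics` also accepts extra harmonics (k * w), which gives the sum of sines and cosines the question asks for.

```python
import numpy as np

# Hourly values from the question: ds_s_tagesgang['hour'] and ds_s_tagesgang['T2m']
hour = np.arange(24, dtype=float)
T2m = np.array([-0.60313714, -0.6527668, -0.669063, -0.6045208, -0.46157956,
                -0.36819172, -0.17480397, 0.00283241, 0.15954256, 0.33030534,
                0.43397593, 0.54781055, 0.61995673, 0.59603477, 0.610795,
                0.5800109, 0.4601419, 0.29350758, 0.20055556, 0.03135109,
                -0.15563202, -0.27981472, -0.4027779, -0.4945315])

PERIOD = 24.0  # assumption: the diurnal cycle repeats every 24 h


def fit_harmonics(x, y, period=PERIOD, n_harmonics=1):
    """Linear least-squares fit of
    y = a + sum_k [b_k * sin(k*w*x) + c_k * cos(k*w*x)],  w = 2*pi/period.
    Returns the coefficients [a, b_1, c_1, b_2, c_2, ...] and the fitted values."""
    w = 2 * np.pi / period
    columns = [np.ones_like(x)]
    for k in range(1, n_harmonics + 1):
        columns.append(np.sin(k * w * x))
        columns.append(np.cos(k * w * x))
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, design @ coef


# One harmonic: the three-parameter linear form a, b, c
coef, y_fit = fit_harmonics(hour, T2m, n_harmonics=1)
a, b, c = coef
A = np.hypot(b, c)      # A^2 = b^2 + c^2
F = np.arctan2(c, b)    # tan(F) = c/b, quadrant fixed by b = A*cos(F), c = A*sin(F)
C = a
rms1 = np.sqrt(np.mean((y_fit - T2m) ** 2))
print(f"a = {a:.4f}, b = {b:.4f}, c = {c:.4f} -> A = {A:.4f}, F = {F:.4f}, C = {C:.4f}")

# Three harmonics: a sum of sines and cosines, still solved without iteration
coef3, y_fit3 = fit_harmonics(hour, T2m, n_harmonics=3)
rms3 = np.sqrt(np.mean((y_fit3 - T2m) ** 2))
print(f"RMS residual: {rms1:.4f} (1 harmonic), {rms3:.4f} (3 harmonics)")
```

Adding harmonics can only lower the residual, but each one adds two parameters, and with only 24 points a small `n_harmonics` is the safer choice.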
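The NOTE's suggestion (refine with iterative least squares, starting from non-iterative estimates) can be sketched as follows. As an assumption, the starting values come from a linear least-squares fit with w fixed at 2π/24 rather than from the method in the answer's image. `curve_fit` then frees all four parameters, including w. The helper name `phase_shifted_sine` is hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hourly values from the question: ds_s_tagesgang['hour'] and ds_s_tagesgang['T2m']
hour = np.arange(24, dtype=float)
T2m = np.array([-0.60313714, -0.6527668, -0.669063, -0.6045208, -0.46157956,
                -0.36819172, -0.17480397, 0.00283241, 0.15954256, 0.33030534,
                0.43397593, 0.54781055, 0.61995673, 0.59603477, 0.610795,
                0.5800109, 0.4601419, 0.29350758, 0.20055556, 0.03135109,
                -0.15563202, -0.27981472, -0.4027779, -0.4945315])


def phase_shifted_sine(x, A, w, F, C):
    """y = A * sin(w * x + F) + C (hypothetical helper name)."""
    return A * np.sin(w * x + F) + C


# Step 1: non-iterative start, linear least squares with w fixed (assumed 24 h period)
w0 = 2 * np.pi / 24
design = np.column_stack([np.ones_like(hour), np.sin(w0 * hour), np.cos(w0 * hour)])
(a, b, c), *_ = np.linalg.lstsq(design, T2m, rcond=None)
p0 = [np.hypot(b, c), w0, np.arctan2(c, b), a]   # converted to A, w, F, C

# Step 2: iterative non-linear least squares with all four parameters free
popt, pcov = curve_fit(phase_shifted_sine, hour, T2m, p0=p0)
perr = np.sqrt(np.diag(pcov))   # rough 1-sigma uncertainties from the covariance estimate

rms_start = np.sqrt(np.mean((phase_shifted_sine(hour, *p0) - T2m) ** 2))
rms_final = np.sqrt(np.mean((phase_shifted_sine(hour, *popt) - T2m) ** 2))
for name, value, err in zip(["A", "w", "F", "C"], popt, perr):
    print(f"{name} = {value:.4f} +/- {err:.4f}")
print(f"RMS residual: {rms_start:.4f} (start) -> {rms_final:.4f} (least squares)")
```

Because the starting point is already a least-squares fit for the fixed period, the iterative step only has to adjust w and the other parameters slightly, so no manual guessing is needed.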