Search code examples
pythonnumpymatplotlibscipy

Why curve_fit function not fit the data?


I'm trying to fit some data about a pendulum to find the time constant of decay of the pendulum. The curve_fit function doesn't seem to fit the data correctly.

Here is the data I used and my code:

Data link

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.stats import chi2

filepath1 = '1.txt'


t1, x1, y1 = np.genfromtxt(filepath1, dtype=float, delimiter = ' ', skip_header = 2, unpack=True)



rad1 = np.arctan(x1/y1)


#t1, rad1 = t1[:1500], rad1[:1500]

error = 0.05*np.ones(len(t1))

def angle (t, theta_0, tau, T , phi):
    return theta_0 * np.exp(-t/tau) * np.cos(((2*np.pi)/T) * t + phi)

popt1, pcov1 = curve_fit(angle, t1, rad1,sigma = error, p0=(0.45,50,1.3,0.05), absolute_sigma=True, method='lm')


theta_0, tau, T , phi = popt1

print(tau,T)
plt.plot(t1,rad1)
plt.plot(t1,angle(t1,theta_0, tau, T , phi))

with a long signal

When I decrease the number of data that needs to be fit, somehow the cruve fit function fitted the data very good.

with a short signal


Solution

  • TL; DR

    The equation you are using for fitting is an approximation for small angles (<5°) where we can accept sin(x) = x. Simply select data where angles are respecting this constraint returns a valid fit.

    MCVE

    import numpy as np
    import pandas as pd
    from scipy import optimize
    from sklearn.metrics import r2_score
    

    We load and transform your data:

    data = pd.read_csv("https://pastebin.com/raw/Yszin3Zy", header=1, sep=" ")
    data = data.drop("Unnamed: 3", axis=1)
    data = data.set_index("t")
    data["x"] *= np.pi / 180.
    

    We select data where the approximation holds:

    data = data.loc[100:,:]
    

    We regress with the simplified model:

    def model(t, theta_0, tau, T , phi):
        return theta_0 * np.exp(- t / tau) * np.sin(((2 * np.pi) / T) * t + phi)
    
    popt, pcov = optimize.curve_fit(
        model, data.index, data.x, p0=(0.45, 30, 1.3, 0.05)
    )
    

    We infer the regressed curve:

    yhat = model(data.index, *popt)
    r2_score(data.x, yhat)  # 0.9915901662189318
    

    It looks like:

    enter image description here