python scipy curve-fitting scipy-optimize

Why doesn't scipy.optimize.curve_fit fit a dampened sine curve?

Note: This is a follow-up of this question

User Nick ODell gave a great answer to the problem of curve_fit not fitting well to a normal curve --- as it turns out, the objective function with respect to the frequency is not "nice", and there are a lot of local minima that are nowhere near the global minima. To get around this, you should approximate the frequency first using an FFT and pass that in as your initial guess. This works, as it situates the optimizer near the global optima so it won't get "lost" somewhere else.

Now, I have another dataset, this time of a dampened sine curve. I tried using the same FFT trick to get a good curve fit, but all I get is a straight line.

My code is given in this pastebin (as you can see, I found that optimize.basinhopping works kind of well, but I was hoping for a better fit).

The code without the long list of data:

ydata=[float(_) for _ in ydata.strip().splitlines()]
 
 
from scipy.optimize import curve_fit as cf
import numpy as np
from matplotlib import pyplot as plt
from scipy.optimize import basinhopping as bh
xdata=np.arange(0.05,500.01,0.05)
 
def damped_sin_fun(x,a,b,c,d,e):
    return a*(np.exp(-x/e))*np.sin(b*x+c)+d
 
fft = np.fft.fft(ydata - np.array(ydata).mean())
timestep = xdata[1] - xdata[0]
freq = np.fft.fftfreq(len(fft), d=timestep)
largest_component = np.abs(fft).argmax()
phase_guess = np.angle(fft[largest_component]) * freq[largest_component]
initial_frequency_guess = freq[largest_component] * 2 * np.pi
 
p_opt,p_cov=cf(damped_sin_fun,xdata,ydata, p0=[0.05, initial_frequency_guess, np.pi/2 ,np.mean(ydata),2.3])
 
print(initial_frequency_guess)
print(p_opt)
#print(((damped_sin_fun(xdata,*p_opt)-ydata)**2).mean())
optimize=bh(lambda args: ((damped_sin_fun(xdata,*args)-ydata)**2).mean(), [0.05, initial_frequency_guess, np.pi/2 ,np.mean(ydata),2.3]).x
 
 
plt.plot(xdata,damped_sin_fun(xdata,*p_opt))
plt.xlim((0,10))
#plt.plot(xdata,damped_sin_fun(xdata,*optimize))
#plt.plot(xdata,ydata, 'r.-', ms=1)
plt.show()

Solution

I would solve this problem by decomposing Y into three signals: an exponential term, a constant term, and a sine term.

y = exp * sine + constant

Then, I would fit each of the three separately before combining them.

In order to fit the exponential term, you can't just fit it to the signal, because you don't know what the sine parameters should be yet. However, what you can do, is pick an exponential term which causes the standard deviation to be constant along the signal. Once the signal has constant amplitude along its length, the sine wave can be fit using the procedure described in the previous question.

Before I jump in to the code, I want to call special attention to this part:

exp_trend_data = pd.Series(ydata).rolling(window=samples_per_window).std().bfill().values

Explanation: it's not possible to find the standard deviation at every point. What I did instead was to calculate the standard deviation in a rolling window, with a window size of 100. (This is in units of samples, not units of time.) For points before it has 100 samples to take the standard deviation of, it uses a backwards fill. This a somewhat arbitrary parameter, but I found that values between 20 and 1000 for samples_per_window worked fairly well.

Full code for fitting exponential term:

import pandas as pd
import numpy as np
from scipy.optimize import curve_fit
from matplotlib import pyplot as plt


xdata = np.arange(0.05,500.01,0.05)
ydata = np.loadtxt('test579_data.txt')

samples_per_window = 100
exp_trend_data = pd.Series(ydata).rolling(window=samples_per_window).std().bfill().values

def exp_term(x, amplitude1, rate):
    return amplitude1 * np.exp(-x / rate)

beginning_value = exp_trend_data[:100].mean()
end_value = exp_trend_data[-100:].mean()
decrease_ratio = beginning_value / end_value
p0 = [beginning_value, np.max(xdata) / np.log(decrease_ratio)]
p_opt_exp, _ = curve_fit(exp_term, xdata, exp_trend_data, p0=p0)

plt.plot(exp_term(xdata, *p_opt_exp))
plt.plot(exp_trend_data)
plt.yscale('log')

Here is a graph of the exponential term (blue) versus the target (orange.)

(Note: the Y axis here is logarithmic. On a semi-log graph, any exponential term is a straight line. However, the target is not straight. This means that the decay in this problem isn't strictly exponential.)

Next, fit the constant term.

constant_term = ydata.mean()

Next, fit the sine term. By re-arranging the equation y = exp * sine + constant, we can get the sine term by itself.

sine_term_data = (ydata - constant_term) / exp_term(xdata, *p_opt_exp)
plt.plot(sine_term_data)

Plot:

This didn't completely remove the change in standard deviation, but it's close enough.

Next, fit the sine term.

def find_dominant_frequency(xdata, ydata):
    """Find dominiant frequency in units of radians"""
    fft = np.fft.fft(ydata - np.array(ydata).mean())
    timestep = xdata[1] - xdata[0]
    freq = np.fft.fftfreq(len(fft), d=timestep)
    fft = fft[:len(fft) // 2]
    freq = freq[:len(freq) // 2]
    largest_component = np.abs(fft).argmax()
    phase_guess = np.angle(fft[largest_component]) * freq[largest_component]
    initial_frequency_guess = freq[largest_component] * 2 * np.pi
    return initial_frequency_guess, phase_guess


def sine_term(x, amplitude2, freq, phase):
    return amplitude2*np.sin(freq*x+phase)


initial_frequency_guess, phase_guess = find_dominant_frequency(xdata, sine_term_data)
p0 = [np.sqrt(2), initial_frequency_guess, phase_guess]
p_opt_sine, _ = curve_fit(sine_term, xdata, sine_term_data, p0=p0)
plt.plot(xdata, sine_term_data)
plt.plot(xdata, sine_term(xdata, *p_opt_sine))
pos = 300
plt.xlim(pos, pos+20)

Next, all of the terms can be combined together.

def damped_sin_fun(x, amplitude, freq, phase, intercept, rate):
    return amplitude * np.exp(-x / rate) * np.sin(freq * x + phase) + intercept

amplitude1, rate = p_opt_exp
amplitude2, freq_guess, phase_guess = p_opt_sine
p0 = [
    amplitude1 * amplitude2,
    freq_guess,
    phase_guess,
    constant_term,
    rate,
]
plt.plot(xdata, damped_sin_fun(xdata, *p0))
plt.plot(xdata, ydata)

You can run another curve fit to tweak all of the parameters at once. This is pretty similar to the result from the independent solutions, but a little different.

p_opt, _ = curve_fit(damped_sin_fun, xdata, ydata, p0=p0)
print(p0)
print(p_opt.tolist())
plt.plot(xdata, ydata)
plt.plot(xdata, damped_sin_fun(xdata, *p_opt))
pos = 300
plt.xlim(pos, pos+20)

And that gives you a fit.