I have some data in two arrays, which appears to have a break in it. I want my code to figure out where the break is with using piecewise in scipy. Here is what I have:
from scipy import optimize
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
x = np.array([7228,7620,7730,7901,8139,8370,8448,8737,8824,9089,9233,9321,9509,9568,9642,9756,9915,10601,10942], dtype=np.float)
y= np.array([.874,.893,.8905,.8916,.9095,.9142,.9109,.9185,.9169,.9251,.9290,.9304,.9467,.9378,0.9464,0.9508,0.9583,0.9857,0.9975],dtype=np.float)
def piecewise_linear(x, x0, y0, k1, k2):
return np.piecewise(x, [x < x0], [lambda x:k1*x + y0-k1*x0, lambda x:k2*x + y0-k2*x0])
p , e = optimize.curve_fit(piecewise_linear, x, y)
perr = np.sqrt(np.diag(e))
xd = np.linspace(7228, 11000, 3000)
plt.plot(x, y, "o")
plt.plot(xd, piecewise_linear(xd, *p))
My issue is if I run this, I get an error, "OptimizeWarning: Covariance of the parameters could not be estimated category=OptimizeWarning)". Not sure how to get around this? Is there maybe a way to feed initial parameters into this function to help it converge or similar?
Note, I do realize that the other way I could be getting this to work is interpolating and finding the second derivative of my data. I've already done this, but because my data is not evenly spaced/ the y axis data has some error in it I am interested in getting it to work this way as well for statistical purposes. So, to be clear, what I want here are the parameters for the two lines (slope/intercept), and the inflection point. (Ideally I would love to get an error too on these too, but not sure if that's possible with this method.) Thanks in advance!
The code works perfectly fine, only the initial values are causing problems.
By default curve_fit
starts with all parameters set to 1. Thus, x0 starts way out of range of the x
in your data and the optimizer cannot compute a sensible gradient.
This small modification will fix the issue:
# make sure initial x0 and y0 are in range of the data
p0 = [np.mean(x), np.mean(y), 1, 1]
p , e = optimize.curve_fit(piecewise_linear, x, y, p0) # set initial parameter estimates
perr = np.sqrt(np.diag(e))
xd = np.linspace(7228, 11000, 3000)
plt.plot(x, y, "o")
plt.plot(xd, piecewise_linear(xd, *p))
print(p) # [ 9.32099947e+03 9.32965835e-01 2.58225121e-05 4.05400820e-05]
print(np.diag(e)) # [ 4.56978067e+04 5.52060368e-05 3.88418404e-12 7.05010755e-12]