I have multiple sets of four data points, and for each set I want to draw a linear trendline on a log-log plot between each pair of consecutive points (four points, so three trendlines in total). I use curve_fit from scipy.optimize to do this. It works for all of my sets except one:
Most of the time this kind of problem has something to do with the initial guesses (the p0), but after trying several different guesses I still end up with this. I have seen in the literature that plots for similar values look just fine, so I must be missing something.
What am I missing to make this work? The only thing I can think of is that something is still wrong with the guesses. Test code to copy and paste is below.
The code:
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import numpy as np
# Two lists with the information for the line
list_x = [3.139, 2.53, 0.821, 0.27]
list_y = [35.56, 26.82, 10.42, 4.66]
def func_exp(x, a, b):
    return (a * x)**b
# Points
point1_x = list_x[0]
point1_y = list_y[0]
point2_x = list_x[1]
point2_y = list_y[1]
point3_x = list_x[2]
point3_y = list_y[2]
point4_x = list_x[3]
point4_y = list_y[3]
# Lines between points
p0_12 = (point1_x, point2_x)
formula_12, pcov_12 = curve_fit(func_exp, [point1_x, point1_y], [point2_x, point2_y], maxfev=10000, p0=p0_12)
p0_23 = (point2_x, point3_x)
formula_23, pcov_23 = curve_fit(func_exp, [point2_x, point2_y], [point3_x, point3_y], maxfev=10000, p0=p0_23)
p0_34 = (point3_x, point4_x)
formula_34, pcov_34 = curve_fit(func_exp, [point3_x, point3_y], [point4_x, point4_y], maxfev=10000, p0=p0_34)
# Create plot
plot_x_12 = np.linspace(point1_x, point2_x, 1000)
plot_y_12 = (formula_12[0] * plot_x_12)**formula_12[1]
plot_x_23 = np.linspace(point2_x, point3_x, 1000)
plot_y_23 = (formula_23[0] * plot_x_23)**formula_23[1]
plot_x_34 = np.linspace(point3_x, point4_x, 1000)
plot_y_34 = (formula_34[0] * plot_x_34)**formula_34[1]
fig, ax1 = plt.subplots(1, 1, figsize=(10, 5))
ax1.scatter(list_x, list_y, color='black')
ax1.plot(plot_x_12, plot_y_12)
ax1.plot(plot_x_23, plot_y_23)
ax1.plot(plot_x_34, plot_y_34)
ax1.set_xscale('log', base=10)
ax1.set_yscale('log', base=10)
It is worth checking the output of the fitted function against the actual y value. With your current formulation, they don't match at all:
func_exp(point1_x, formula_12[0], formula_12[1]), point1_y
# prints: (2.53, 35.56)
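The mistake is in the call to curve_fit. Its second argument is supposed to be the x values and its third argument the y values. You are instead passing the coordinates of the first point as the second argument and the coordinates of the second point as the third, and likewise for the other segments.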
So if you instead use the below formulation, you get a much cleaner looking plot:
p0_12 = (point1_x, point2_x)
formula_12, pcov_12 = curve_fit(func_exp, [point1_x, point2_x], [point1_y, point2_y], maxfev=10000, p0=p0_12)
p0_23 = (point2_x, point3_x)
formula_23, pcov_23 = curve_fit(func_exp, [point2_x, point3_x], [point2_y, point3_y], maxfev=10000, p0=p0_23)
p0_34 = (point3_x, point4_x)
formula_34, pcov_34 = curve_fit(func_exp, [point3_x, point4_x], [point3_y, point4_y], maxfev=10000, p0=p0_34)
Plot:
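As a cross-check that needs no initial guess: two points determine a straight line on a log-log plot exactly, so the parameters can also be computed in closed form. The sketch below is only an illustration under that assumption, not a replacement for the curve_fit calls above. The helper name power_law_through and the list segments are hypothetical and do not appear in the original code. It uses the fact that (a * x)**b equals c * x**m with m = b and c = a**b.

```python
import numpy as np

list_x = [3.139, 2.53, 0.821, 0.27]
list_y = [35.56, 26.82, 10.42, 4.66]

def power_law_through(x1, y1, x2, y2):
    """Return (c, m) such that y = c * x**m passes through both points.

    Hypothetical helper for illustration; assumes all values are positive
    and x1 != x2, y1 != y2.
    """
    m = np.log(y2 / y1) / np.log(x2 / x1)  # slope of the line on the log-log plot
    c = y1 / x1**m                         # prefactor fixed by the first point
    return c, m

# One (a, b) pair per consecutive pair of points, in the form used by
# func_exp(x, a, b) = (a * x)**b
segments = []
for i in range(len(list_x) - 1):
    c, m = power_law_through(list_x[i], list_y[i], list_x[i + 1], list_y[i + 1])
    a, b = c ** (1.0 / m), m  # since (a * x)**b == a**b * x**b
    segments.append((a, b))
```

Each (a, b) pair could be compared with the corresponding formula_12, formula_23, or formula_34 returned by curve_fit, or passed as p0 if the fit turns out to be sensitive to the starting values.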